Jesper Dramsch , , , read in 2 minutes

Pandas borrows a core concept from SQL:

the Join

But there are so many different types of joining two DataFrames

Let's make this easy and go through Left, Right, Inner and Outer joins

Knowing these is great in interviews in addition to your usual DataFrame shenanigans

Joins always happen on two DataFrames.

This explanation will use: ๐ŸŸฉ๐ŸŸชโฌ›

  • ๐ŸŸฉ: The left DataFrame
  • ๐ŸŸช: The right DataFrame
  • โฌ›: The result

Don't get the word "join" wrong though, you can actually end up with a smaller DataFrame โฌ› than either or ๐ŸŸฉ๐ŸŸช

โฌ…๏ธ The Left Join is selfish

This one takes the complete left DataFrame ๐ŸŸฉ and only checks for overlaps from the right ๐ŸŸช

No ๐ŸŸช from outside of the bounds of ๐ŸŸฉ will make it into โฌ›

โžก๏ธ The Right Join is almost the same as Left

Only take everything in ๐ŸŸช and overlapping ๐ŸŸฉ

โคต๏ธ The Inner Join

This one is tricky.

Almost always โฌ› will be smaller than ๐ŸŸฉ&๐ŸŸช.

For the Inner join, you only look at the parts of ๐ŸŸฉ and ๐ŸŸช that overlap.

Nothing is included in โฌ› that exists outside of this common area.

โ†”๏ธ The Outer Join

Is possibly the simples one.

It is exactly what we would expect from a "join".

Take all of ๐ŸŸฉ and all of ๐ŸŸช and combine it into โฌ›.

All of the data is in our result.

TL;DR

  • Pandas borrows from SQL using Joins
  • Left and Right join maintain the original and whatever overlaps in the other
  • Inner is only the common ground
  • Outer uses all the data in both DataFrames