Jesper Dramsch , , , read in 2 minutes

Pandas borrows a core concept from SQL:

the Join

But there are so many different types of joining two DataFrames

Let's make this easy and go through Left, Right, Inner and Outer joins

Knowing these is great in interviews in addition to your usual DataFrame shenanigans

Joins always happen on two DataFrames.

This explanation will use: 🟩🟪⬛

  • 🟩: The left DataFrame
  • 🟪: The right DataFrame
  • ⬛: The result

Don't get the word "join" wrong though, you can actually end up with a smaller DataFrame ⬛ than either or 🟩🟪

⬅️ The Left Join is selfish

This one takes the complete left DataFrame 🟩 and only checks for overlaps from the right 🟪

No 🟪 from outside of the bounds of 🟩 will make it into ⬛

➡️ The Right Join is almost the same as Left

Only take everything in 🟪 and overlapping 🟩

⤵️ The Inner Join

This one is tricky.

Almost always ⬛ will be smaller than 🟩&🟪.

For the Inner join, you only look at the parts of 🟩 and 🟪 that overlap.

Nothing is included in ⬛ that exists outside of this common area.

↔️ The Outer Join

Is possibly the simples one.

It is exactly what we would expect from a "join".

Take all of 🟩 and all of 🟪 and combine it into ⬛.

All of the data is in our result.

TL;DR

  • Pandas borrows from SQL using Joins
  • Left and Right join maintain the original and whatever overlaps in the other
  • Inner is only the common ground
  • Outer uses all the data in both DataFrames