Pandas borrows a core concept from SQL:
But there are so many different types of joining two DataFrames
Let's make this easy and go through Left, Right, Inner and Outer joins
Knowing these is great in interviews in addition to your usual DataFrame shenanigans
Joins always happen on two DataFrames.
This explanation will use: 🟩🟪⬛
- 🟩: The left DataFrame
- 🟪: The right DataFrame
- ⬛: The result
Don't get the word "join" wrong though, you can actually end up with a smaller DataFrame ⬛ than either or 🟩🟪
⬅️ The Left Join is selfish
This one takes the complete left DataFrame 🟩 and only checks for overlaps from the right 🟪
No 🟪 from outside of the bounds of 🟩 will make it into ⬛
➡️ The Right Join is almost the same as Left
Only take everything in 🟪 and overlapping 🟩
⤵️ The Inner Join
This one is tricky.
Almost always ⬛ will be smaller than 🟩&🟪.
For the Inner join, you only look at the parts of 🟩 and 🟪 that overlap.
Nothing is included in ⬛ that exists outside of this common area.
↔️ The Outer Join
Is possibly the simples one.
It is exactly what we would expect from a "join".
Take all of 🟩 and all of 🟪 and combine it into ⬛.
All of the data is in our result.
- Pandas borrows from SQL using Joins
- Left and Right join maintain the original and whatever overlaps in the other
- Inner is only the common ground
- Outer uses all the data in both DataFrames