pandas get rows which are NOT in other dataframe
Getting Rows in Pandas DataFrame that are NOT in Another DataFrame 😎
So, you have two pandas data frames - df1
and df2
- and you want to find the rows in df1
that are not present in df2
. Well, you've come to the right place! In this blog post, we'll explore common issues and provide you with easy solutions to this problem. Let's dive in! 🚀
The Problem and Example Data
To understand the problem better, let's take a look at an example. We have two data frames, df1
and df2
, where df2
is a subset of df1
. Our goal is to extract the rows from df1
that are not present in df2
.
import pandas as pd
df1 = pd.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12, 13, 14]})
df2 = pd.DataFrame(data = {'col1' : [1, 2, 3], 'col2' : [10, 11, 12]})
print("df1:")
print(df1)
print("\ndf2:")
print(df2)
Here's the output:
df1:
col1 col2
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
df2:
col1 col2
0 1 10
1 2 11
2 3 12
The Solution: Using Pandas' isin()
and Boolean Masking
To get the rows in df1
that are not present in df2
, we can leverage Pandas' powerful isin()
function in combination with Boolean masking. Here's how you can do it:
mask = ~df1.isin(df2).all(axis=1)
result = df1[mask]
print("\nExpected result:")
print(result)
And the expected result will be:
col1 col2
3 4 13
4 5 14
Let's break down the solution:
df1.isin(df2)
compares each element indf1
withdf2
element-wise, resulting in a boolean DataFrame indicating whether the elements are present indf2
.all(axis=1)
checks if all values in a row areTrue
. This operation returns a boolean array withTrue
for rows that are present indf2
andFalse
for rows that are not.~
is the negation operator, flippingTrue
toFalse
and vice versa. We use it to obtain a boolean mask that isTrue
for rows not present indf2
andFalse
for rows that are present.Finally, we apply this boolean mask to
df1
usingdf1[mask]
, which gives us the desired result - the rows that are not present indf2
.
Conclusion and Your Turn! 😉
That's it! You've learned an easy and efficient way to extract rows in Pandas DataFrame that are not present in another DataFrame. Feel free to use this technique in your projects!
Now, it's time for you to put your newfound knowledge into action. Try applying this solution to your own data frames and see how it works. Share your experience and any other useful tips in the comments section below! Let's keep learning together! 🌟