How to filter Pandas dataframe using "in" and "not in" like in SQL



Filtering Pandas DataFrame Using 'in' and 'not in' like in SQL: A Quick Guide 👩💻
So, you're working with a Pandas DataFrame and wondering how to filter it using the 'in' and 'not in' operators, just like you would in SQL. Look no further! In this guide, we'll address this common issue and provide you with easy solutions. 🚀
The Scenario 📚
Before we dive into the solutions, let's set the stage with a scenario. Assume you have a DataFrame called df
with a column named 'country'. You also have a list called countries_to_keep
that contains the countries you want to retain in your filtered DataFrame.
Let's take a look at the sample code provided in the context:
df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']
# pseudo-code:
df[df['country'] not in countries_to_keep]
The Problem 🤔
The pseudo-code provided above might seem like the intuitive way to filter the DataFrame using 'not in' in Python. However, it's not correct syntax for filtering based on multiple values. You'll likely encounter an error if you try to execute the code.
Solution 1: Using the isin()
Method ✅
Pandas provides an elegant solution for filtering DataFrames using the 'in' operator. You can achieve it by using the isin()
method. Here's how you can apply it to your specific scenario:
df[df['country'].isin(countries_to_keep)]
By using the isin()
method in the above code snippet, you can filter the DataFrame based on whether the 'country' values are present in the 'countries_to_keep' list. It will return a filtered DataFrame with only the rows where the 'country' falls within the desired countries list.
Solution 2: Using ~
Operator for 'not in' Filtering 🚫
For filtering based on the 'not in' operator, Pandas allows us to use the tilde (~
) operator. Here's how you can implement it:
df[~df['country'].isin(countries_to_keep)]
In the code snippet above, the ~
operator negates the result of df['country'].isin(countries_to_keep)
. This will return a filtered DataFrame with rows where the 'country' values are not present in the 'countries_to_keep' list.
A Better Way? 😄
In the quoted context, an alternative approach was suggested using DataFrame merging. While this approach might work, it can sometimes feel a bit convoluted and not as straightforward as the previous solutions we discussed.
Conclusion and Call to Action 🎉
You've learned two simple and effective ways to filter a Pandas DataFrame using 'in' and 'not in' operators, just like in SQL. Now you can slice and dice your data based on the values you want to include or exclude.
Try these solutions for yourself and let us know which approach works best in your case. If you have any other tips or tricks to share, or if you encountered any issues while implementing these solutions, drop us a comment below. We'd love to hear from you! 👇
Happy coding! 💻✨