Counting unique values in a column in pandas dataframe like in Qlik?


Counting Unique Values in a Pandas DataFrame like in Qlik
Hey there tech enthusiasts! 👋 Are you struggling to count unique values in a column of a pandas DataFrame, just like in Qlik? Look no further! In this blog post, we will explore a common issue that many developers face and provide easy solutions using Python's pandas library. So, let's dive right in! 💪
The Scenario
Imagine you have a pandas DataFrame like this:
df = pd.DataFrame({
'hID': [101, 102, 103, 101, 102, 104, 105, 101],
'dID': [10, 11, 12, 10, 11, 10, 12, 10],
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
Qlik users are familiar with the count(distinct hID)
function, which would return a count of 5 for unique hID
values in Qlik. But how can you achieve the same result using a Python pandas DataFrame or a numpy array? Let's find out! 💡
Counting Unique Values in pandas
Solution 1: Using the nunique()
Function
The simplest way to count unique values in a pandas DataFrame column is by using the nunique()
function. The nunique()
function returns the number of unique elements in a column.
To count unique hID
values in the DataFrame df
, you can simply do:
unique_hID_count = df['hID'].nunique()
print(unique_hID_count)
Output:
5
Isn't that super easy? You get the same count of 5, just like in Qlik! 🎉
Solution 2: Using the drop_duplicates()
Function
Another approach to counting unique values is by using the drop_duplicates()
function. This function removes duplicate rows from the DataFrame, and you can then count the remaining rows.
To count unique hID
values using this method, follow these steps:
# Drop duplicate rows based on 'hID'
unique_df = df.drop_duplicates(subset='hID')
# Count the number of remaining rows
unique_hID_count = unique_df.shape[0]
print(unique_hID_count)
Output:
5
Amazing, right? This method achieves the same count of 5 as well! 🎉
Solution 3: Utilizing a numpy array
If you prefer working with numpy arrays, you can convert the pandas column to a numpy array and then utilize numpy's unique()
function.
To count unique hID
values using a numpy array, you can try this code:
import numpy as np
unique_hID_count = np.unique(df['hID']).size
print(unique_hID_count)
Output:
5
It's as simple as that! You get the same count of 5 using numpy too! 🎉
Counting Total Values
Now that we've covered counting unique values, let's talk about counting total occurrences of a value, similar to count(hID)
in Qlik.
To count the total occurrences of hID
, you can use the value_counts()
function:
hID_count = df['hID'].value_counts().sum()
print(hID_count)
Output:
8
Voila! You get the desired count of 8, just like in Qlik! 🎉
Call to Action
You've made it to the end! 🙌 We hope this guide helped you understand how to count unique values in a pandas DataFrame, just like in Qlik. If you found it helpful, why not share it with your fellow developers and spread the knowledge? 🔗 Remember, sharing is caring!
Got any more pandas or general tech-related questions? Feel free to leave a comment below, and let's engage in a tech talk! 💬🔥
Take Your Tech Career to the Next Level
Our application tracking tool helps you manage your job search effectively. Stay organized, track your progress, and land your dream tech job faster.
