Counting unique values in a column in pandas dataframe like in Qlik?
Counting Unique Values in a Pandas DataFrame like in Qlik
Hey there tech enthusiasts! 👋 Are you struggling to count unique values in a column of a pandas DataFrame, just like in Qlik? Look no further! In this blog post, we will explore a common issue that many developers face and provide easy solutions using Python's pandas library. So, let's dive right in! 💪
The Scenario
Imagine you have a pandas DataFrame like this:
df = pd.DataFrame({
'hID': [101, 102, 103, 101, 102, 104, 105, 101],
'dID': [10, 11, 12, 10, 11, 10, 12, 10],
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
Qlik users are familiar with the count(distinct hID)
function, which would return a count of 5 for unique hID
values in Qlik. But how can you achieve the same result using a Python pandas DataFrame or a numpy array? Let's find out! 💡
Counting Unique Values in pandas
Solution 1: Using the nunique()
Function
The simplest way to count unique values in a pandas DataFrame column is by using the nunique()
function. The nunique()
function returns the number of unique elements in a column.
To count unique hID
values in the DataFrame df
, you can simply do:
unique_hID_count = df['hID'].nunique()
print(unique_hID_count)
Output:
5
Isn't that super easy? You get the same count of 5, just like in Qlik! 🎉
Solution 2: Using the drop_duplicates()
Function
Another approach to counting unique values is by using the drop_duplicates()
function. This function removes duplicate rows from the DataFrame, and you can then count the remaining rows.
To count unique hID
values using this method, follow these steps:
# Drop duplicate rows based on 'hID'
unique_df = df.drop_duplicates(subset='hID')
# Count the number of remaining rows
unique_hID_count = unique_df.shape[0]
print(unique_hID_count)
Output:
5
Amazing, right? This method achieves the same count of 5 as well! 🎉
Solution 3: Utilizing a numpy array
If you prefer working with numpy arrays, you can convert the pandas column to a numpy array and then utilize numpy's unique()
function.
To count unique hID
values using a numpy array, you can try this code:
import numpy as np
unique_hID_count = np.unique(df['hID']).size
print(unique_hID_count)
Output:
5
It's as simple as that! You get the same count of 5 using numpy too! 🎉
Counting Total Values
Now that we've covered counting unique values, let's talk about counting total occurrences of a value, similar to count(hID)
in Qlik.
To count the total occurrences of hID
, you can use the value_counts()
function:
hID_count = df['hID'].value_counts().sum()
print(hID_count)
Output:
8
Voila! You get the desired count of 8, just like in Qlik! 🎉
Call to Action
You've made it to the end! 🙌 We hope this guide helped you understand how to count unique values in a pandas DataFrame, just like in Qlik. If you found it helpful, why not share it with your fellow developers and spread the knowledge? 🔗 Remember, sharing is caring!
Got any more pandas or general tech-related questions? Feel free to leave a comment below, and let's engage in a tech talk! 💬🔥