Pandas "count(distinct)" equivalent

Matheus Mello
published a few days ago. updated a few hours ago

Pandas 'count(distinct)' equivalent: Easy Solutions for Counting Unique Values

If you're using Pandas as a substitute for databases like Oracle or SQL Server, you may come across the need to count the number of distinct values in a column, just like the count(distinct) function in SQL. In this blog post, we'll explore common issues around this question and provide easy solutions using Pandas.

The Problem

Let's say you have a table loaded in a DataFrame with multiple columns: YEARMONTH, CLIENTCODE, SIZE, and so on. Your goal is to count the number of different clients per month (that is, per YEARMONTH value), just like you would in SQL with the following query:

SELECT YEARMONTH, count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;

The expected result would look something like:

201301    5000
201302    13245

Solution: Using Pandas for Counting Unique Values

To achieve the same result in Pandas, you can follow these steps:

Step 1: Group the DataFrame by YEARMONTH

Before counting the distinct values, you need to group the DataFrame by the YEARMONTH column. This is done with the DataFrame's groupby() method, which returns a GroupBy object rather than a new DataFrame.

grouped_df = df.groupby('YEARMONTH')
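To see what this step produces, here is a minimal sketch on a tiny DataFrame. The column names follow the article, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: column names follow the article, values are invented.
df = pd.DataFrame({
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302],
    "CLIENTCODE": ["A", "A", "B", "A", "C"],
    "SIZE": [10, 20, 30, 40, 50],
})

grouped_df = df.groupby("YEARMONTH")

# groupby() is lazy: nothing is counted yet, rows are only assigned to groups.
print(grouped_df.ngroups)          # number of distinct YEARMONTH values
print(sorted(grouped_df.groups))   # the group keys
```

The GroupBy object only becomes a result once an aggregation such as nunique() is applied to it.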

Step 2: Apply the nunique() function

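Now that you have the DataFrame grouped by YEARMONTH, you can select the CLIENTCODE column and call nunique() on it to count the number of distinct values in each group. Like count(distinct) in SQL, which skips NULLs, nunique() ignores missing values (NaN) by default.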

result = grouped_df['CLIENTCODE'].nunique()

This will give you a Pandas Series with the distinct count of CLIENTCODE per YEARMONTH.
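As a small illustration, the sketch below runs the count on made-up data that includes one missing client code. The sample values are assumptions. The dropna parameter of nunique() controls whether NaN is counted as a value of its own.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing CLIENTCODE in 201302.
df = pd.DataFrame({
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302, 201302],
    "CLIENTCODE": ["A", "A", "B", "A", "C", np.nan],
})

# Default behaviour: NaN is ignored, as NULL is in SQL's count(distinct).
result = df.groupby("YEARMONTH")["CLIENTCODE"].nunique()

# dropna=False treats NaN as one additional distinct value.
with_missing = df.groupby("YEARMONTH")["CLIENTCODE"].nunique(dropna=False)

print(result)
print(with_missing)
```

If your SQL habits make you expect NULLs to be skipped, the default is the one you want.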

Step 3: Reset the index (optional)

By default, the resulting Series will have YEARMONTH as the index. If you prefer to have it as a regular column, call reset_index(), which turns the Series into a DataFrame with YEARMONTH and CLIENTCODE columns.

result = result.reset_index()

Step 4: Rename the columns (optional)

After resetting the index, the count column is still called CLIENTCODE. To give it a label that says what it holds, as the SQL output would, rename it with the rename() method.

result = result.rename(columns={'CLIENTCODE': 'count(distinct CLIENTCODE)'})
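Steps 3 and 4 can also be collapsed into one call, since Series.reset_index() accepts a name argument for the value column. A minimal sketch, with made-up sample data and a hypothetical column name, distinct_clients:

```python
import pandas as pd

# Hypothetical sample data: column names follow the article, values are invented.
df = pd.DataFrame({
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302],
    "CLIENTCODE": ["A", "A", "B", "A", "C"],
})

# name= labels the count column while the index is turned into a column.
result = (
    df.groupby("YEARMONTH")["CLIENTCODE"]
    .nunique()
    .reset_index(name="distinct_clients")
)

print(result)
```

A plain identifier such as distinct_clients is easier to reference later than a name containing parentheses and spaces.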

Putting It All Together

Here's the complete code block that combines all the steps mentioned above:

grouped_df = df.groupby('YEARMONTH')
result = grouped_df['CLIENTCODE'].nunique().reset_index().rename(columns={'CLIENTCODE': 'count(distinct CLIENTCODE)'})

Now you can print or display the result DataFrame to see the desired output.
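If you need the distinct count alongside other aggregates, as you would in a SQL query that selects several of them, named aggregation with agg() is one possible alternative. This is a sketch and not the method described above. The sample data and the output names distinct_clients and total_size are assumptions.

```python
import pandas as pd

# Hypothetical sample data: column names follow the article, values are invented.
df = pd.DataFrame({
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302],
    "CLIENTCODE": ["A", "A", "B", "A", "C"],
    "SIZE": [10, 20, 30, 40, 50],
})

# Each keyword is an output column: (input column, aggregation function).
summary = (
    df.groupby("YEARMONTH")
    .agg(
        distinct_clients=("CLIENTCODE", "nunique"),
        total_size=("SIZE", "sum"),
    )
    .reset_index()
)

print(summary)
```

This roughly mirrors SELECT YEARMONTH, count(distinct CLIENTCODE), sum(SIZE) ... GROUP BY YEARMONTH, and it names the output columns in the same step.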

Get Counting with Pandas!

By following these simple steps, you can easily achieve the equivalent of count(distinct) in Pandas. The name is different, but groupby() combined with nunique() does the same job, so you can keep working with large datasets and making insightful analyses.

If you found this guide helpful or have any other Pandas-related questions, feel free to drop a comment below. Happy coding with Pandas! 😄🐼

