How to iterate over rows in a DataFrame in Pandas
So, you have a pandas dataframe that looks something like this:
   c1   c2
0  10  100
1  11  110
2  12  120
And your goal is to iterate over each row and access the values in each cell by column name. For example, you want to print out the values of c1 and c2 for each row.
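If you want to follow along, here is one way to build the example DataFrame. This is just a sketch: the source only shows the printed frame, so the dictionary-based construction is an assumption about how it was created.

```python
import pandas as pd

# Rebuild the example frame: two integer columns, default 0..2 index.
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})
print(df)
```

The snippets in the rest of this guide assume a df like this one is already defined.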
Let's start by exploring two commonly used techniques for iterating over rows in a DataFrame: df.iterrows() and df.itertuples().
Using df.iterrows()
The df.iterrows() method lets you iterate over the rows of a DataFrame. Each iteration yields a tuple containing the index label of the row and the row itself.
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
In this code snippet, index is the index label of the row, and row is a pandas Series object that contains the values of each cell in that row. You can access the values of specific columns using their names, just like in the example above.
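However, you should not rely on modifying row inside the loop. Depending on the column dtypes, df.iterrows() may return a copy of each row rather than a view, in which case changes to row have no effect on the original DataFrame. Also, because each row is packed into a single Series, values can be upcast to a common dtype (for example, integers become floats when another column holds floats).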
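Since changes made to row are not reliably written back, one option is to collect the computed values in a list and assign them as a new column once the loop is done. Here is a minimal sketch of that idea, assuming the example DataFrame from the top of this guide. The column name c1_doubled is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Collect results outside the frame instead of mutating `row`.
doubled = []
for index, row in df.iterrows():
    doubled.append(row["c1"] * 2)

# Assign after the loop, so the frame is never modified mid-iteration.
df["c1_doubled"] = doubled  # hypothetical column name
print(df)
```

This keeps the loop read-only, which sidesteps the copy-versus-view question entirely.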
Using df.itertuples()
If you're looking for a more efficient way to iterate over rows, you can use the df.itertuples() method. It yields a named tuple for each row, where the column names are accessible as attributes.
for row in df.itertuples(index=False):
    print(row.c1, row.c2)
By setting index=False, we exclude the index from the returned tuples. As with df.iterrows(), you access each cell by column name, but here the names are attributes of the tuple (row.c1) rather than keys (row['c1']).
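For comparison, here is a small sketch of what df.itertuples() does when you leave index at its default, plus the name parameter, which sets the type name of the named tuple. It assumes the example DataFrame from the top of this guide, and "Row" is just an arbitrary name picked for the demo.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Default (index=True): the index label is the first field, named "Index".
for row in df.itertuples():
    print(row.Index, row.c1, row.c2)

# name= controls the namedtuple's type name; "Row" is an arbitrary choice.
first = next(df.itertuples(name="Row"))
print(type(first).__name__, first)
```

One thing to keep in mind: column names that are not valid Python identifiers (a name with a space in it, say) are replaced by positional names, so attribute access works best with simple names like c1 and c2.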
The advantage of using df.itertuples() over df.iterrows() is speed: it does not build a Series object for every row, so it is usually significantly faster on large datasets.
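If you want to see the difference on your own machine, a rough timing sketch like the one below can help. The frame size, the helper function names, and the number of repetitions are all assumptions chosen for a quick demo, and the actual numbers will vary with your hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger frame with the same column names as the example.
n_rows = 5_000
big = pd.DataFrame({"c1": range(n_rows), "c2": range(n_rows)})


def sum_with_iterrows(frame):
    """Sum column c1 by looping with iterrows()."""
    total = 0
    for _, row in frame.iterrows():
        total += row["c1"]
    return total


def sum_with_itertuples(frame):
    """Sum column c1 by looping with itertuples()."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.c1
    return total


# Time three runs of each approach and print the totals in seconds.
t_iterrows = timeit.timeit(lambda: sum_with_iterrows(big), number=3)
t_itertuples = timeit.timeit(lambda: sum_with_itertuples(big), number=3)
print(f"iterrows:   {t_iterrows:.3f} s")
print(f"itertuples: {t_itertuples:.3f} s")
```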
Example Use Case
Let's say you want to calculate the sum of the values in column c1 across all rows. Using df.iterrows(), you could do it like this:
sum_c1 = 0
for index, row in df.iterrows():
    sum_c1 += row['c1']
print("The sum of c1 is:", sum_c1)
And using df.itertuples():
sum_c1 = 0
for row in df.itertuples(index=False):
    sum_c1 += row.c1
print("The sum of c1 is:", sum_c1)
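A quick way to double-check a loop like this is to compare it against the built-in column sum. Note that this swaps in a different technique (a vectorized Series.sum() call rather than row iteration), shown here only as a cross-check, and it assumes the example DataFrame from the top of this guide.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Row-by-row sum, as in the itertuples() example.
sum_c1 = 0
for row in df.itertuples(index=False):
    sum_c1 += row.c1

# Built-in column sum, used here only to verify the loop.
check = df["c1"].sum()
print("Loop:", sum_c1, "Built-in:", check)
```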
Summary
To iterate over rows in a DataFrame and access the values of cells by column name, you can use either df.iterrows() or df.itertuples().
df.iterrows() returns a tuple containing the index of the row and a copy of the row as a Series object.
df.itertuples() returns a named tuple for each row, where the column names are accessible as attributes.
Use df.itertuples() for better performance with large datasets.
I hope this guide helped you in understanding how to iterate over rows in a DataFrame in Pandas! If you have any questions or suggestions, feel free to leave a comment below. Happy coding! 😄🐼