Grouping and Summarizing Data with Pandas (Without Pain)

Grouping and Summarizing Data with Pandas (Without Pain)

Written by: Marlon Colca
Posted on 14 May 2025 - 5 months ago
python pandas analytics

Learn how to group, aggregate, and pivot your data using Pandas. Master `groupby`, `agg`, and `pivot_table` with real examples.


Grouping and Summarizing Data with Pandas (Without Pain) 📊

In this post, we’ll explore how to group and summarize your data with Pandas using methods like:

  • groupby()
  • agg()
  • pivot_table()

These tools help you answer real questions like:

  • What’s the average price per category?
  • Which brand has the most products?
  • How many products are in stock per category?

🧱 Let’s load the data

We’ll use the same dataset (prices_sample.csv):

import pandas as pd

df = pd.read_csv("prices_sample.csv")

🔢 Using groupby() + aggregation

Let’s start simple: what’s the average price per category?

avg_price = df.groupby("category")["price"].mean()
print(avg_price)

You can also get multiple aggregations:

summary = df.groupby("category")["price"].agg(["mean", "min", "max", "count"])
print(summary)

🧠 Custom aggregations with agg()

You can even pass custom functions or lambdas:

df.groupby("brand")["price"].agg(
    average="mean",
    highest="max",
    products="count"
)

You can rename the output columns with named aggregation like above (average=...).


📊 Counting products per category

This is a very common task:

product_counts = df["category"].value_counts()
print(product_counts)

Or using groupby:

df.groupby("category")["product_id"].count()

🔄 Using pivot_table()

Pivot tables give you more flexibility for reshaping data. Let’s say we want the average price per category and availability:

pivot = pd.pivot_table(df, values="price", index="category", columns="in_stock", aggfunc="mean")
print(pivot)

This shows how price changes depending on whether items are in stock or not.


🔁 Reshaping with .unstack() and .melt()

You can convert groupby results into wide format with .unstack():

group = df.groupby(["category", "in_stock"])["price"].mean()
print(group.unstack())

Or go from wide → long format using .melt():

melted = df.melt(id_vars=["category"], value_vars=["price"])
print(melted.head())

🧠 When to use each?

TaskUse
Simple summariesgroupby()
Complex reshapingpivot_table()
Multi-level aggregationsagg()
Format manipulationmelt(), unstack()

✅ Summary

Grouping and aggregating are essential tools in your data toolkit.

Remember:

  • groupby() for grouping rows
  • agg() for multiple summaries
  • pivot_table() for reshaping with aggregations
  • value_counts() for quick category counts

📌 What’s next?

In Part 6, we’ll work with dates and time in Pandas: parsing, grouping by month/week, and time series tricks ⏳.

Perfect if you want to build charts over time or analyze trends.

See you there!


🔜 Coming up next


Grouping and Aggregating Data in Pandas

Grouping and Aggregating Data in Pandas

Learn how to use groupby and aggregation in Pandas to analyze product sales data efficiently.

17 May 2025 - 5 months ago