Exploratory Data Analysis (EDA) with Pandas


Written by: Marlon Colca
Posted on 23 May 2025
python pandas analytics

Learn how to explore and understand your dataset using Pandas, identifying trends, patterns, and potential issues in your data.


🧠 Part 8: Exploratory Data Analysis (EDA) with Pandas

Welcome back! In the previous part, we handled missing data effectively.
Now it's time to dive deeper into our dataset and perform Exploratory Data Analysis (EDA).
EDA helps us uncover hidden patterns, detect anomalies, and test hypotheses using simple Pandas techniques.


🔍 Dataset Setup

We'll use the same dataset as before, but now with missing values handled:

import pandas as pd

df = pd.read_csv("prices_with_missing_data.csv")

# Optionally fill missing values
df['quantity_sold'] = df['quantity_sold'].fillna(0)
df['price'] = df['price'].fillna(df['price'].mean())
df['brand'] = df['brand'].fillna("Unknown")
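If you don't have the CSV at hand, a small stand-in frame is enough to follow along. The sketch below is only an illustration: the column names come from the code in this post, but every value is invented, and the name `sample` is just a placeholder. It also counts missing values before and after filling, which is a quick way to confirm the fills did what we expect.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names match this post, values are invented.
sample = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-08"],
    "product_name": ["Milk 1L", "Bread", "Milk 1L", "Coffee"],
    "brand": ["DairyCo", None, "DairyCo", "BeanHouse"],
    "category": ["Dairy", "Bakery", "Dairy", "Beverages"],
    "price": [1.20, np.nan, 1.25, 6.50],
    "quantity_sold": [10, 5, np.nan, 2],
})

# Count missing values per column before filling, so we know what changed.
missing_before = sample.isna().sum()

# Same fill strategy as in the setup code above.
sample["quantity_sold"] = sample["quantity_sold"].fillna(0)
sample["price"] = sample["price"].fillna(sample["price"].mean())
sample["brand"] = sample["brand"].fillna("Unknown")

missing_after = sample.isna().sum()
print(missing_before)
print(missing_after)
```

Keep in mind that filling `quantity_sold` with 0 and `price` with the mean changes the distributions we are about to explore, so it is worth remembering which rows were filled.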

📈 1. General Overview

print(df.head())
df.info()  # info() prints its own report and returns None, so no print() is needed
print(df.describe())
  • df.info() tells us about data types and non-null counts.
  • df.describe() gives statistical summaries of numeric columns.
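Since `describe()` skips text columns by default, it can help to ask for them explicitly. Here is a minimal sketch on a made-up frame (the name `toy` and all values are assumptions for illustration) that compares the default summary with `include="all"`:

```python
import pandas as pd

# Hypothetical toy data; column names mirror the ones used in this post.
toy = pd.DataFrame({
    "brand": ["A", "B", "A", "C"],
    "price": [1.0, 2.5, 1.0, 4.0],
    "quantity_sold": [10, 3, 7, 1],
})

# info() prints its report directly and returns None.
toy.info()

# By default describe() covers numeric columns only...
numeric_summary = toy.describe()

# ...while include="all" also summarizes text columns (count, unique, top, freq).
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the `include="all"` output, statistics that don't apply to a column (such as the mean of `brand`) show up as NaN.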

🔢 2. Value Counts & Uniqueness

How many brands and categories do we have?

print(df['brand'].value_counts())
print(df['category'].value_counts())
print(df['product_name'].nunique(), "unique products")
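Raw counts are easier to compare once they are turned into shares. A small sketch, using an invented `brands` series rather than the real dataset, shows `value_counts(normalize=True)` next to the plain counts:

```python
import pandas as pd

# Hypothetical brand column for illustration.
brands = pd.Series(["A", "B", "A", "Unknown", "A", "B"], name="brand")

# Absolute counts, most frequent first.
counts = brands.value_counts()

# Relative frequencies (shares that sum to 1).
shares = brands.value_counts(normalize=True)

print(counts)
print(shares.round(2))
print(brands.nunique(), "unique brands")
```

A large share for "Unknown" would be a hint that many brand values were missing before we filled them.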

📊 3. Grouped Aggregations

Let's analyze the average price and total quantity sold per category:

category_summary = df.groupby("category").agg({
    "price": "mean",
    "quantity_sold": "sum"
}).sort_values("quantity_sold", ascending=False)

print(category_summary)
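When you want several statistics per group, named aggregation keeps the output columns readable. The following is a sketch on invented data (the frame `toy` and the output column names are my own choices, not part of the original dataset):

```python
import pandas as pd

# Hypothetical toy data; values are invented.
toy = pd.DataFrame({
    "category": ["Dairy", "Dairy", "Bakery", "Bakery", "Bakery"],
    "price": [1.0, 3.0, 2.0, 2.0, 5.0],
    "quantity_sold": [10, 5, 4, 6, 1],
})

# Named aggregation: output_column=(input_column, function).
summary = (
    toy.groupby("category")
    .agg(
        avg_price=("price", "mean"),
        median_price=("price", "median"),
        total_sold=("quantity_sold", "sum"),
        n_rows=("price", "size"),
    )
    .sort_values("total_sold", ascending=False)
)

print(summary)
```

Showing the median next to the mean is a cheap way to spot categories where a few expensive items pull the average up.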

📅 4. Time-based Insights

We can parse the date column and check sales over time:

df["date"] = pd.to_datetime(df["date"])

# Daily total quantity sold
daily_sales = df.groupby("date")["quantity_sold"].sum()
print(daily_sales.tail())

# Weekly mean of quantity_sold per row (each record counts once; not an average of daily totals)
weekly_avg = df.resample("W", on="date")["quantity_sold"].mean()
print(weekly_avg.tail())
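"Weekly average" can mean two different things, and the difference matters when a day has more than one row. The sketch below uses a tiny invented frame (`toy`, with made-up dates and quantities) to contrast the mean over individual rows with the mean of daily totals:

```python
import pandas as pd

# Hypothetical toy data: 2025-01-06 has two rows, the other days have one.
toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["2025-01-06", "2025-01-06", "2025-01-07", "2025-01-13"]
    ),
    "quantity_sold": [4, 6, 20, 8],
})

# Mean over individual rows within each week.
weekly_row_mean = toy.resample("W", on="date")["quantity_sold"].mean()

# Mean of daily totals within each week: sum per day first, then average.
daily_totals = toy.groupby("date")["quantity_sold"].sum()
weekly_daily_mean = daily_totals.resample("W").mean()

print(weekly_row_mean)
print(weekly_daily_mean)
```

Note that `"W"` labels each week by its closing Sunday, and that days with no rows at all are simply absent from `daily_totals` rather than counted as zero.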

📉 5. Detecting Outliers

Let's find unusually high-priced items:

high_prices = df[df["price"] > df["price"].quantile(0.95)]
print(high_prices[["product_name", "price"]])
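A percentile cutoff always flags roughly the top 5% of rows, whether or not they are truly unusual. As an alternative (not the method used above), here is a sketch of the common 1.5 × IQR rule of thumb on an invented price series:

```python
import pandas as pd

# Hypothetical prices; the last one is deliberately extreme.
prices = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0], name="price")

# Interquartile range (IQR): spread of the middle 50% of values.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Rule of thumb: flag values more than 1.5 * IQR above the third quartile.
upper_fence = q3 + 1.5 * iqr
iqr_outliers = prices[prices > upper_fence]

print("Upper fence:", upper_fence)
print(iqr_outliers)
```

The 1.5 multiplier is a convention, not a law; whether a flagged price is an error or just a premium product still needs a human look.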

🧪 6. Correlations

Do price and quantity_sold correlate?

correlation = df[["price", "quantity_sold"]].corr()
print(correlation)

Hint: A strong negative correlation might suggest price sensitivity.
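By default `corr()` computes the Pearson coefficient, which only measures linear association. If demand drops quickly at first and then flattens, a rank-based measure can tell a clearer story. A minimal sketch with invented numbers (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data: quantity falls as price rises, but not linearly.
toy = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "quantity_sold": [100, 50, 30, 25, 24],
})

# Pearson (the default) measures linear association.
pearson = toy.corr().loc["price", "quantity_sold"]

# Spearman works on ranks, so it captures any monotonic relationship.
spearman = toy.corr(method="spearman").loc["price", "quantity_sold"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

Either way, correlation alone does not show that price changes cause the change in sales; it only tells us the two move together.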


🧼 7. Exporting Your Clean EDA Results

You can save your grouped summaries for reporting:

category_summary.to_csv("category_summary.csv")
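Because the category names live in the index, they are written as the first CSV column. When loading the file again, it helps to tell Pandas which column is the index. This sketch uses an invented `summary` frame and an in-memory buffer in place of a real file:

```python
import io

import pandas as pd

# Hypothetical summary shaped like category_summary above.
summary = pd.DataFrame(
    {"price": [2.0, 3.0], "quantity_sold": [15, 11]},
    index=pd.Index(["Dairy", "Bakery"], name="category"),
)

# Write to an in-memory buffer instead of disk to keep the sketch self-contained.
buffer = io.StringIO()
summary.to_csv(buffer)

# Read it back, telling pandas which column holds the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="category")

print(restored)
```

With a real file you would pass the path (for example "category_summary.csv") instead of the buffer.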

✅ Summary

In this part we:

  • Explored structure and statistics of our dataset.
  • Grouped and aggregated data for insights.
  • Identified outliers and trends.
  • Prepared the ground for future visualizations.

Coming next: Part 9 - Data Visualization with Matplotlib & Pandas!


Stay curious and keep exploring 🧠📊


🔜 Coming up next

Visualization with Pandas

In this post, we'll learn how to create quick and effective visualizations using Pandas (which uses Matplotlib under the hood).

26 May 2025