Exploratory Data Analysis (EDA) with Pandas

Written by: Marlon Colca
Posted on 23 May 2025
python pandas analytics

Learn how to explore and understand your dataset using Pandas, identifying trends, patterns, and potential issues in your data.


🧠 Part 8: Exploratory Data Analysis (EDA) with Pandas

Welcome back! In the previous part, we handled missing data effectively.
Now it’s time to dive deeper into our dataset and perform Exploratory Data Analysis (EDA).
EDA helps us uncover hidden patterns, detect anomalies, and test hypotheses using simple Pandas techniques.


🔍 Dataset Setup

We’ll use the same dataset as before, but now with missing values handled:

import pandas as pd

df = pd.read_csv("prices_with_missing_data.csv")

# Optionally fill missing values
df['quantity_sold'] = df['quantity_sold'].fillna(0)
df['price'] = df['price'].fillna(df['price'].mean())
df['brand'] = df['brand'].fillna("Unknown")

📈 1. General Overview

print(df.head())
df.info()  # prints its report directly and returns None, so no print() is needed
print(df.describe())
  • df.info() tells us about data types and non-null counts.
  • df.describe() gives statistical summaries of numeric columns.
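
Two further checks often sit alongside this overview: missing values per column and fully duplicated rows. The snippet below is a minimal sketch on a small made-up frame (the `sample` data is hypothetical, not the article's CSV), using only `isna()` and `duplicated()`:

```python
import pandas as pd

# Hypothetical sample standing in for the article's dataset
sample = pd.DataFrame({
    "product_name": ["Soap", "Shampoo", "Soap", "Brush"],
    "price": [2.5, 5.0, None, 3.0],
    "quantity_sold": [10, None, 7, 4],
})

# Missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)

# Rows that repeat an earlier row in every column
duplicate_rows = int(sample.duplicated().sum())
print(duplicate_rows, "duplicated rows")
```

On the real dataset you would call the same methods on df before and after filling, to confirm the fill step did what you expected.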

🔢 2. Value Counts & Uniqueness

How many brands and categories do we have?

print(df['brand'].value_counts())
print(df['category'].value_counts())
print(df['product_name'].nunique(), "unique products")
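
Raw counts can be hard to compare at a glance, so it can help to look at shares instead. A small sketch, assuming a hypothetical `brands` Series in place of df['brand']:

```python
import pandas as pd

# Hypothetical brand column; "Unknown" mimics the filled-in missing values
brands = pd.Series(["Acme", "Acme", "Zenith", "Acme", "Unknown"], name="brand")

# normalize=True returns fractions of the total instead of raw counts
brand_share = brands.value_counts(normalize=True)
print(brand_share)

# Number of distinct brands, including the "Unknown" placeholder
print(brands.nunique(), "unique brands")
```

Keep in mind that a placeholder such as "Unknown" is counted as a brand of its own, which matters if many values were missing.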

📊 3. Grouped Aggregations

Let’s compute the average price and the total quantity sold per category:

category_summary = df.groupby("category").agg({
    "price": "mean",
    "quantity_sold": "sum"
}).sort_values("quantity_sold", ascending=False)

print(category_summary)
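
If you want several statistics per column with readable column names, pandas' named aggregation is one option. This is only a sketch on made-up data (the `sample` frame and the output column names are illustrative assumptions):

```python
import pandas as pd

# Hypothetical data with the same column names as the article's dataset
sample = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "price": [10.0, 20.0, 5.0, 7.0, 9.0],
    "quantity_sold": [1, 3, 10, 0, 2],
})

# Named aggregation: output_column=(input_column, function)
summary = sample.groupby("category").agg(
    avg_price=("price", "mean"),
    median_price=("price", "median"),
    total_quantity=("quantity_sold", "sum"),
    n_rows=("price", "size"),
).sort_values("total_quantity", ascending=False)

print(summary)
```

Including a row count next to each mean makes it easier to spot categories whose averages rest on very few records.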

📅 4. Time-based Insights

We can parse the date column and check sales over time:

df["date"] = pd.to_datetime(df["date"])

# Daily total quantity sold
daily_sales = df.groupby("date")["quantity_sold"].sum()
print(daily_sales.tail())

# Average quantity_sold per row within each week (weeks end on Sunday)
weekly_avg = df.resample("W", on="date")["quantity_sold"].mean()
print(weekly_avg.tail())
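
Note that a weekly mean over rows is not the same as the mean of daily totals, because a day with several rows contributes several values. The sketch below contrasts the two on a tiny hypothetical frame; the dates and quantities are invented for illustration:

```python
import pandas as pd

# Hypothetical sales rows: two on Monday 6 Jan, one on 7 Jan, one on 13 Jan
sample = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-07", "2025-01-13"]),
    "quantity_sold": [5, 3, 4, 10],
})

# Mean over individual rows per week (the approach used above)
row_level_mean = sample.resample("W", on="date")["quantity_sold"].mean()

# Total per day first, then weekly total and weekly mean of those daily totals
daily_total = sample.groupby("date")["quantity_sold"].sum()
weekly_total = daily_total.resample("W").sum()
weekly_mean_of_daily = daily_total.resample("W").mean()

print(row_level_mean)
print(weekly_total)
print(weekly_mean_of_daily)
```

Which of the two is appropriate depends on the question: per-transaction size or per-day volume.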

📉 5. Detecting Outliers

Let’s flag items priced above the 95th percentile as candidates for unusually high prices:

high_prices = df[df["price"] > df["price"].quantile(0.95)]
print(high_prices[["product_name", "price"]])
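
A percentile cut-off always flags roughly the top 5% of rows, whether or not they are extreme. A common alternative, shown here as a sketch rather than as the article's method, is the 1.5 × IQR rule. The `prices` values are hypothetical:

```python
import pandas as pd

# Hypothetical prices with one obviously extreme value
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0], name="price")

# Interquartile range: spread of the middle 50% of the data
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values above Q3 + 1.5 * IQR are treated as high outliers
upper_fence = q3 + 1.5 * iqr
outliers = prices[prices > upper_fence]

print(outliers)
```

With this rule a dataset without extreme prices yields an empty result, which the quantile filter cannot do.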

🧪 6. Correlations

Do price and quantity_sold correlate?

correlation = df[["price", "quantity_sold"]].corr()
print(correlation)

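Hint: corr() computes the Pearson correlation by default. A strong negative value might suggest price sensitivity, although correlation alone does not show that price causes the change in sales.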
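
Pearson correlation only measures linear association. When the relationship is monotonic but curved, a rank-based (Spearman) correlation can be more informative. A minimal sketch with invented numbers, using the `method` parameter of DataFrame.corr:

```python
import pandas as pd

# Hypothetical data: quantity falls as price rises, but not in a straight line
sample = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "quantity_sold": [100, 50, 33, 25, 20],
})

# Default: Pearson (linear) correlation
pearson = sample.corr().loc["price", "quantity_sold"]

# Spearman: correlation of the ranks, sensitive to any monotonic relationship
spearman = sample.corr(method="spearman").loc["price", "quantity_sold"]

print("Pearson:", pearson)
print("Spearman:", spearman)
```

Comparing the two values is a quick way to see whether a weak Pearson result hides a consistent but non-linear trend.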


🧼 7. Exporting Your Clean EDA Results

You can save your grouped summaries for reporting:

category_summary.to_csv("category_summary.csv")

✅ Summary

In this part we:

  • Explored structure and statistics of our dataset.
  • Grouped and aggregated data for insights.
  • Identified outliers and trends.
  • Prepared the ground for future visualizations.

Coming next: Part 9 – Data Visualization with Matplotlib & Pandas!


Stay curious and keep exploring 🧠📊