Exploratory Data Analysis (EDA) with Pandas

Written by: Marlon Colca
Posted on 23 May 2025
Tags: python, pandas, analytics

Learn how to explore and understand your dataset using Pandas, identifying trends, patterns, and potential issues in your data.

Mastering Pandas: From Data Cleaning to Insights

🧠 Part 8: Exploratory Data Analysis (EDA) with Pandas

Welcome back! In the previous part, we handled missing data effectively.
Now it’s time to dive deeper into our dataset and perform Exploratory Data Analysis (EDA).
EDA helps us uncover hidden patterns, detect anomalies, and test hypotheses using simple Pandas techniques.

🔍 Dataset Setup

We’ll load the same dataset as before and fill in its missing values first:

import pandas as pd

df = pd.read_csv("prices_with_missing_data.csv")

# Optionally fill missing values
df['quantity_sold'] = df['quantity_sold'].fillna(0)
df['price'] = df['price'].fillna(df['price'].mean())
df['brand'] = df['brand'].fillna("Unknown")

📈 1. General Overview

print(df.head())
df.info()  # prints its report directly and returns None, so no print() is needed
print(df.describe())

df.info() lists each column's data type and non-null count.
df.describe() gives summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns.
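
To confirm that the fill step left no gaps, it can help to count missing values per column and to include the non-numeric columns in the summary. Here is a minimal sketch on a small made-up DataFrame (the rows are hypothetical; only the column names come from this post):

```python
import pandas as pd

# Hypothetical sample rows; column names mirror the ones used in this post
sample = pd.DataFrame({
    "product_name": ["Mouse", "Keyboard", "Monitor", "Mouse"],
    "brand": ["Acme", "Acme", None, "Globex"],
    "price": [20.0, 45.0, None, 22.0],
    "quantity_sold": [5, 2, 1, 7],
})

# Number of missing values in each column
missing_counts = sample.isna().sum()
print(missing_counts)

# include="all" adds non-numeric columns (count, unique, top, freq) to the summary
print(sample.describe(include="all"))
```

On the real df, every count should be zero for the three columns we filled.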

🔢 2. Value Counts & Uniqueness

How many brands and categories do we have?

print(df['brand'].value_counts())
print(df['category'].value_counts())
print(df['product_name'].nunique(), "unique products")
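
Raw counts are easier to compare as shares. A small sketch with made-up brand values, using value_counts(normalize=True) to get proportions instead of counts:

```python
import pandas as pd

# Hypothetical brand values, including the "Unknown" placeholder used for missing brands
brands = pd.Series(["Acme", "Acme", "Globex", "Unknown"], name="brand")

# normalize=True returns each value's share of the rows instead of a raw count
brand_share = brands.value_counts(normalize=True)
print(brand_share)

# Number of distinct brands (the placeholder counts as one)
n_brands = brands.nunique()
print(n_brands, "unique brands")
```

A large share for "Unknown" would be a sign that the brand column was mostly missing to begin with.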

📊 3. Grouped Aggregations

Let’s compute the average price and the total quantity sold per category:

category_summary = df.groupby("category").agg({
    "price": "mean",
    "quantity_sold": "sum"
}).sort_values("quantity_sold", ascending=False)

print(category_summary)
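
The same summary can also be written with named aggregation, which lets you pick the output column names yourself. This is a sketch on hypothetical data; avg_price, total_quantity, and n_prices are illustrative names, not columns from the original dataset:

```python
import pandas as pd

# Hypothetical rows for two categories
sample = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "price": [10.0, 20.0, 5.0, 15.0],
    "quantity_sold": [1, 2, 10, 20],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = (
    sample.groupby("category")
    .agg(
        avg_price=("price", "mean"),
        total_quantity=("quantity_sold", "sum"),
        n_prices=("price", "count"),  # non-null prices per category
    )
    .sort_values("total_quantity", ascending=False)
)
print(summary)
```

Keeping a row count next to each mean makes it easier to spot averages that rest on only a handful of rows.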

📅 4. Time-based Insights

We can parse the date column and check sales over time:

df["date"] = pd.to_datetime(df["date"])

# Daily total quantity sold
daily_sales = df.groupby("date")["quantity_sold"].sum()
print(daily_sales.tail())

# Weekly average of quantity_sold across rows (each row counts once, not each day)
weekly_avg = df.resample("W", on="date")["quantity_sold"].mean()
print(weekly_avg.tail())
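
Note that resampling the raw rows averages over rows, not over days. If what you want is the average daily total within each week, one option is to sum per day first and then resample. A sketch with made-up dates and quantities:

```python
import pandas as pd

# Hypothetical sales rows: two on Mon 6 Jan, one on Tue 7 Jan, one on Mon 13 Jan
sample = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-07", "2025-01-13"]),
    "quantity_sold": [2, 4, 10, 3],
})

# Mean over rows within each week (weeks end on Sunday with "W")
per_row_weekly = sample.resample("W", on="date")["quantity_sold"].mean()

# Sum per day first, then average those daily totals within each week
daily_totals = sample.groupby("date")["quantity_sold"].sum()
per_day_weekly = daily_totals.resample("W").mean()

print(per_row_weekly)
print(per_day_weekly)
```

In the first week the two results differ: the three rows average to about 5.33, while the two daily totals (6 and 10) average to 8.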

📉 5. Detecting Outliers

Let’s flag unusually high-priced items, defined here as prices above the 95th percentile:

high_prices = df[df["price"] > df["price"].quantile(0.95)]
print(high_prices[["product_name", "price"]])
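
A percentile cutoff always flags roughly the top 5% of rows, whether or not they are really extreme. A common alternative, shown here as a swapped-in technique rather than the method used above, is the 1.5 × IQR rule. A minimal sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical prices with one obviously extreme value
prices = pd.Series([10, 12, 11, 13, 12, 11, 100], name="price", dtype="float64")

# Interquartile range: spread of the middle 50% of values
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values above Q3 + 1.5 * IQR are treated as high outliers
upper_fence = q3 + 1.5 * iqr
outliers = prices[prices > upper_fence]
print(outliers)
```

With these numbers only the 100 is flagged. On a dataset with no extreme prices, this rule can return nothing at all, which the percentile approach never does.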

🧪 6. Correlations

Do price and quantity_sold correlate?

correlation = df[["price", "quantity_sold"]].corr()
print(correlation)

Hint: A strong negative correlation might suggest price sensitivity.
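
By default corr() computes Pearson correlation, which measures linear relationships. When quantity drops off sharply and then flattens as price rises, a rank-based (Spearman) correlation can describe the pattern better. A sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical data: quantity falls steeply at first, then levels off
sample = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "quantity_sold": [100, 50, 30, 28, 27],
})

# Pearson: strength of the linear relationship
pearson = sample.corr(method="pearson").loc["price", "quantity_sold"]

# Spearman: strength of the monotonic (rank-based) relationship
spearman = sample.corr(method="spearman").loc["price", "quantity_sold"]

print("Pearson:", round(pearson, 3))
print("Spearman:", round(spearman, 3))
```

Here quantity always decreases as price increases, so Spearman is -1 while Pearson is weaker because the relationship is not a straight line. Keep in mind that filling missing prices with the mean, as we did in the setup, can pull the correlation toward zero.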

🧼 7. Exporting Your Clean EDA Results

You can save your grouped summaries for reporting:

category_summary.to_csv("category_summary.csv")
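
to_csv writes the group labels (the index) as the first column, so when you read the file back you can restore them with index_col. A sketch using an in-memory buffer instead of a file, with a hypothetical summary table:

```python
import io
import pandas as pd

# Hypothetical grouped summary with category as the index
summary = pd.DataFrame(
    {"price": [10.0, 15.0], "quantity_sold": [30, 3]},
    index=pd.Index(["B", "A"], name="category"),
)

# Write to an in-memory buffer (stands in for a file on disk)
buffer = io.StringIO()
summary.to_csv(buffer)

# Read it back, turning the category column into the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="category")
print(restored)
```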

✅ Summary

In this part we:

Explored structure and statistics of our dataset.
Grouped and aggregated data for insights.
Identified outliers and trends.
Prepared the ground for future visualizations.

Coming next: Part 9 – Data Visualization with Matplotlib & Pandas!

Stay curious and keep exploring 🧠📊
