-->
Written by: Marlon Colca
Posted on 23 May 2025
python pandas analytics
Learn how to explore and understand your dataset using Pandas, identifying trends, patterns, and potential issues in your data.
Welcome back! In the previous part, we handled missing data effectively.
Now it's time to dive deeper into our dataset and perform Exploratory Data Analysis (EDA).
EDA helps us uncover hidden patterns, detect anomalies, and test hypotheses using simple Pandas techniques.
We'll use the same dataset as before, but now with missing values handled:
import pandas as pd
df = pd.read_csv("prices_with_missing_data.csv")
# Optionally fill missing values
df['quantity_sold'] = df['quantity_sold'].fillna(0)
df['price'] = df['price'].fillna(df['price'].mean())
df['brand'] = df['brand'].fillna("Unknown")
print(df.head())
df.info()  # info() prints its report directly and returns None, so no print() is needed
print(df.describe())
df.info() tells us about data types and non-null counts.
df.describe() gives statistical summaries of numeric columns.
How many brands and categories do we have?
print(df['brand'].value_counts())
print(df['category'].value_counts())
print(df['product_name'].nunique(), "unique products")
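Raw counts are easier to compare when they are also shown as shares. The sketch below uses a small made-up DataFrame (the `toy` name and its values are hypothetical, not taken from the real dataset) to show `value_counts(normalize=True)` and `nunique()` over several columns at once:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "brand": ["Acme", "Acme", "Unknown", "Zenith"],
    "category": ["snacks", "drinks", "snacks", "snacks"],
})

# normalize=True returns each value's share of rows instead of a raw count
brand_share = toy["brand"].value_counts(normalize=True)
print(brand_share)

# nunique() on a DataFrame counts distinct values per column
unique_counts = toy[["brand", "category"]].nunique()
print(unique_counts)
```

A large share for "Unknown" would be a sign that the earlier fill step is hiding a lot of missing brand information.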
Let's analyze average sales and prices per category:
category_summary = df.groupby("category").agg({
"price": "mean",
"quantity_sold": "sum"
}).sort_values("quantity_sold", ascending=False)
print(category_summary)
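If you prefer descriptive column names in the result, named aggregation is one option. This is a minimal sketch on made-up data; the `toy` frame and the output names `avg_price`, `total_sold`, and `n_rows` are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "category": ["snacks", "snacks", "drinks", "drinks"],
    "price": [2.0, 4.0, 1.0, 3.0],
    "quantity_sold": [10, 20, 5, 5],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = (
    toy.groupby("category")
    .agg(
        avg_price=("price", "mean"),
        total_sold=("quantity_sold", "sum"),
        n_rows=("price", "size"),
    )
    .sort_values("total_sold", ascending=False)
)
print(summary)
```

Including a row count next to the mean makes it easier to spot categories whose average price rests on only a handful of rows.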
We can parse the date column and check sales over time:
df["date"] = pd.to_datetime(df["date"])
# Daily total quantity sold
daily_sales = df.groupby("date")["quantity_sold"].sum()
print(daily_sales.tail())
# Weekly average of quantity_sold per row (weekly bins end on Sunday by default)
weekly_avg = df.resample("W", on="date")["quantity_sold"].mean()
print(weekly_avg.tail())
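Note that resampling the raw rows averages individual records, which is not the same as averaging daily totals. The sketch below, on made-up dates and quantities, puts the two side by side along with a weekly total:

```python
import pandas as pd

# Hypothetical toy data: two rows on Monday 6 Jan, one on 7 Jan, one on 13 Jan
toy = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-07", "2025-01-13"]),
    "quantity_sold": [1, 2, 3, 4],
})

# Mean over individual rows in each week (same approach as the article's weekly_avg)
row_mean = toy.resample("W", on="date")["quantity_sold"].mean()

# Sum rows into daily totals first, then aggregate those per week
daily = toy.groupby("date")["quantity_sold"].sum()
weekly_total = daily.resample("W").sum()
weekly_mean_of_daily = daily.resample("W").mean()

print(row_mean)
print(weekly_total)
print(weekly_mean_of_daily)
```

Days with no rows at all are simply absent from `daily`, so they do not count as zero in the daily average. Which figure is the right one depends on the question you are asking.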
Let's find unusually high-priced items:
high_prices = df[df["price"] > df["price"].quantile(0.95)]
print(high_prices[["product_name", "price"]])
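A 95th-percentile cutoff flags roughly the top 5% of rows whether or not they are actually unusual. A common alternative is the interquartile range (IQR) rule, sketched here on made-up prices; the 1.5 multiplier is the conventional choice, not something prescribed by this dataset:

```python
import pandas as pd

# Hypothetical prices with one obvious outlier
prices = pd.Series([10, 11, 12, 13, 14, 100], name="price")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values above Q3 + 1.5 * IQR are treated as high outliers
upper_bound = q3 + 1.5 * iqr
outliers = prices[prices > upper_bound]
print(outliers)
```

Unlike a fixed percentile, this rule can return no rows at all when the prices are tightly clustered.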
Do price and quantity_sold correlate?
correlation = df[["price", "quantity_sold"]].corr()
print(correlation)
Hint: A strong negative correlation might suggest price sensitivity.
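The default `corr()` computes Pearson correlation, which only measures linear relationships. When demand falls with price along a curve, a rank-based correlation can be more telling. The sketch below uses made-up numbers and correlates the ranks, which is equivalent to Spearman correlation:

```python
import pandas as pd

# Hypothetical data: quantity drops as price rises, but not along a straight line
toy = pd.DataFrame({
    "price": [1, 2, 3, 4, 5],
    "quantity_sold": [100, 50, 33, 25, 20],
})

# Pearson (the default) measures linear association
pearson = toy["price"].corr(toy["quantity_sold"])

# Correlating the ranks captures any monotonic relationship
ranked = toy.rank()
rank_corr = ranked["price"].corr(ranked["quantity_sold"])

print(round(pearson, 3), round(rank_corr, 3))
```

Either way, correlation on its own does not show that price changes cause the change in sales.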
You can save your grouped summaries for reporting:
category_summary.to_csv("category_summary.csv")
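Because the grouped summary uses the category as its index, the CSV gets that index as its first column. A minimal round-trip sketch follows; it writes to an in-memory buffer instead of a file and uses made-up values:

```python
import io

import pandas as pd

# Hypothetical summary shaped like a groupby result (category as the index)
summary = pd.DataFrame(
    {"price": [3.0, 2.0], "quantity_sold": [30, 10]},
    index=pd.Index(["snacks", "drinks"], name="category"),
)

# Write to an in-memory buffer; a file path works the same way
buffer = io.StringIO()
summary.to_csv(buffer)

# Read it back, restoring the category column as the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="category")
print(restored)
```

Passing `index=False` to `to_csv` would drop the category labels here, so keep the index when saving grouped results.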
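In this part we inspected the dataset's structure, counted brands and categories, summarized prices and sales by category, looked at sales over time, flagged unusually high prices, checked the correlation between price and quantity sold, and saved a summary to CSV.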
Coming next: Part 9: Data Visualization with Matplotlib & Pandas!
Stay curious and keep exploring!