Written by: Marlon Colca
Posted on 23 May 2025
python pandas analytics
Learn how to explore and understand your dataset using Pandas, identifying trends, patterns, and potential issues in your data.
Welcome back! In the previous part, we handled missing data effectively.
Now it's time to dive deeper into our dataset and perform Exploratory Data Analysis (EDA).
EDA helps us uncover hidden patterns, detect anomalies, and test hypotheses using simple Pandas techniques.
We'll use the same dataset as before, but now with missing values handled:
import pandas as pd
df = pd.read_csv("prices_with_missing_data.csv")
# Optionally fill missing values
df['quantity_sold'] = df['quantity_sold'].fillna(0)
df['price'] = df['price'].fillna(df['price'].mean())
df['brand'] = df['brand'].fillna("Unknown")
print(df.head())
df.info()  # info() prints its report itself and returns None, so no print() is needed
print(df.describe())
df.info() tells us about data types and non-null counts.
df.describe() gives statistical summaries of numeric columns.
How many brands and categories do we have?
print(df['brand'].value_counts())
print(df['category'].value_counts())
print(df['product_name'].nunique(), "unique products")
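Raw counts are easier to compare once they are expressed as shares. As a minimal sketch, assuming a small made-up sample in place of the real dataset (the brand names below are hypothetical), value_counts(normalize=True) returns each value's fraction of the total:

```python
import pandas as pd

# Hypothetical sample standing in for the article's dataset
sample = pd.DataFrame({
    "brand": ["Acme", "Acme", "Globex", "Unknown", "Acme", "Globex"],
})

# Absolute counts, as in the article
counts = sample["brand"].value_counts()

# Relative shares: normalize=True divides each count by the number of rows
shares = sample["brand"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

A large share for "Unknown" would be a sign that the earlier fillna step is hiding a lot of missing brand information.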
Let's analyze average sales and prices per category:
category_summary = df.groupby("category").agg({
"price": "mean",
"quantity_sold": "sum"
}).sort_values("quantity_sold", ascending=False)
print(category_summary)
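If you want the result columns to say what they contain, named aggregation is one option. This is a sketch on made-up data (the categories and numbers are hypothetical, and the output column names avg_price, total_sold, and n_prices are my own choice):

```python
import pandas as pd

# Hypothetical sample standing in for the article's dataset
sample = pd.DataFrame({
    "category": ["Snacks", "Snacks", "Drinks", "Drinks", "Drinks"],
    "price": [2.0, 4.0, 1.0, 2.0, 3.0],
    "quantity_sold": [10, 20, 5, 5, 10],
})

# Named aggregation: new_column=(source_column, aggregation)
summary = (
    sample.groupby("category")
    .agg(
        avg_price=("price", "mean"),
        total_sold=("quantity_sold", "sum"),
        n_prices=("price", "count"),  # number of non-null prices per category
    )
    .sort_values("total_sold", ascending=False)
)

print(summary)
```

Keeping a count next to the mean helps you judge how much weight an average deserves: a mean over three rows is far less stable than one over three hundred.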
We can parse the date column and check sales over time:
df["date"] = pd.to_datetime(df["date"])
# Daily total quantity sold
daily_sales = df.groupby("date")["quantity_sold"].sum()
print(daily_sales.tail())
# Weekly average of quantity_sold per row (not per day)
weekly_avg = df.resample("W", on="date")["quantity_sold"].mean()
print(weekly_avg.tail())
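It is worth being clear about what a "weekly average" means here. The sketch below uses a tiny made-up sample (the dates and quantities are hypothetical) to compare the weekly total, the per-row mean, and the mean of daily totals. With "W", each bin is labelled by the Sunday that ends the week:

```python
import pandas as pd

# Hypothetical sample: two rows share the same day (2025-05-06)
sample = pd.DataFrame({
    "date": pd.to_datetime(
        ["2025-05-05", "2025-05-06", "2025-05-06", "2025-05-12"]
    ),
    "quantity_sold": [10, 20, 30, 40],
})

# Weekly total and per-row mean in one pass
weekly = sample.resample("W", on="date")["quantity_sold"].agg(["sum", "mean"])

# Mean of daily totals: aggregate to days first, then average per week
daily_totals = sample.groupby("date")["quantity_sold"].sum()
weekly_mean_of_days = daily_totals.resample("W").mean()

print(weekly)
print(weekly_mean_of_days)
```

The per-row mean and the mean of daily totals differ whenever a day has more than one row, so pick the one that matches the question you are asking.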
Let's find unusually high-priced items:
high_prices = df[df["price"] > df["price"].quantile(0.95)]
print(high_prices[["product_name", "price"]])
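A 95th-percentile cutoff flags roughly the top 5% of rows whether or not they are actually extreme. A different technique, not the one used above, is the interquartile range (IQR) rule, which only flags values far from the bulk of the data. Here is a sketch on made-up prices (the product names and values are hypothetical, and the 1.5 multiplier is a common convention rather than a fixed rule):

```python
import pandas as pd

# Hypothetical sample with one obviously extreme price
sample = pd.DataFrame({
    "product_name": ["A", "B", "C", "D", "E", "F"],
    "price": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0],
})

q1 = sample["price"].quantile(0.25)
q3 = sample["price"].quantile(0.75)
iqr = q3 - q1

# Anything above Q3 + 1.5 * IQR is treated as unusually high
upper_fence = q3 + 1.5 * iqr
outliers = sample[sample["price"] > upper_fence]

print(upper_fence)
print(outliers[["product_name", "price"]])
```

If no price is far from the rest, this filter returns an empty DataFrame, which is often the honest answer.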
Do price and quantity_sold correlate?
correlation = df[["price", "quantity_sold"]].corr()
print(correlation)
Hint: A strong negative correlation might suggest price sensitivity.
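To see what a strong negative correlation looks like, here is a minimal sketch on made-up data where quantity drops steadily as price rises (the numbers are hypothetical). Series.corr returns the single Pearson coefficient instead of a full matrix:

```python
import pandas as pd

# Hypothetical sample: quantity falls by 10 for every unit of price
sample = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "quantity_sold": [50, 40, 30, 20, 10],
})

# Pearson correlation between the two columns as a single number
r = sample["price"].corr(sample["quantity_sold"])

print(round(r, 2))
```

Real data will rarely be this clean, and a correlation on its own does not show that price changes cause the change in sales.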
You can save your grouped summaries for reporting:
category_summary.to_csv("category_summary.csv")
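Because the grouped summary uses category as its index, to_csv writes it as the first column. When loading the file again, index_col restores that index. The sketch below uses a made-up summary and a temporary directory so it leaves no files behind (the values are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical summary shaped like the article's category_summary
summary = pd.DataFrame(
    {"price": [2.5, 4.0], "quantity_sold": [120, 80]},
    index=pd.Index(["Snacks", "Drinks"], name="category"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "category_summary.csv")
    summary.to_csv(path)  # the index is written as the "category" column

    # index_col turns that column back into the index on load
    restored = pd.read_csv(path, index_col="category")

print(restored)
```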
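In this part we inspected the dataset's structure and summary statistics, counted brands and categories, summarized price and sales by category, looked at sales over time, flagged unusually high prices, checked the correlation between price and quantity sold, and saved a summary to CSV.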
Coming next: Part 9, Data Visualization with Matplotlib & Pandas!
Stay curious and keep exploring!