Handling Missing Data in Pandas

Written by: Marlon Colca
Posted on 20 May 2025
python pandas analytics

Learn how to detect, analyze, and clean missing data using Pandas.

Mastering Pandas: From Data Cleaning to Insights


🧹 Part 7: Handling Missing Data in Pandas

Real-world data is rarely perfect. Missing values are common, and how you handle them can significantly impact your analysis. In this post, we’ll learn how to detect, analyze, and fill or remove missing data using Pandas.

🧐 Detecting Missing Values

Let’s start by loading a modified version of our dataset that contains missing values:

import pandas as pd

df = pd.read_csv("prices_with_sales_with_missing.csv")

To check for missing values in each column:

df.isnull().sum()

To see what fraction of each column is missing (multiply by 100 for a percentage):

df.isnull().mean()
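
If you don't have the CSV at hand, here is a minimal sketch of both checks on a small made-up DataFrame. The data and the `product` column are assumptions for illustration; only `price` and `quantity_sold` are named in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV used in this post
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [3, 5, np.nan, 2],
})

missing_counts = df.isnull().sum()   # number of missing cells per column
missing_share = df.isnull().mean()   # fraction of missing cells per column

# Fraction of missing cells across the whole table, not per column
overall_share = df.isnull().to_numpy().mean()

summary = pd.DataFrame({"missing": missing_counts, "share": missing_share})
print(summary)
print("Overall share of missing cells:", overall_share)
```

Putting counts and shares side by side makes it easier to spot columns that look fine in absolute numbers but are mostly empty.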

🔎 Identifying Missing Rows

You can filter the rows that contain any missing values:

df[df.isnull().any(axis=1)]

Or rows with missing values in a specific column, for example price:

df[df["price"].isnull()]
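
The two filters above can be tried on a small made-up DataFrame. This is only a sketch: the data is invented, and the per-row count at the end is an extra check that is not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only "price" and "quantity_sold" are named in this post
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [3, 5, np.nan, 2],
})

rows_any_missing = df[df.isnull().any(axis=1)]   # rows with at least one missing cell
rows_missing_price = df[df["price"].isnull()]    # rows where price is missing

# Count of missing cells in each row, useful for spotting mostly-empty rows
missing_per_row = df.isnull().sum(axis=1)

print(rows_any_missing)
print(rows_missing_price)
print(missing_per_row)
```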

🛠️ Common Fixes for Missing Data

1. Filling missing values

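a. Fill with a default value:

df["price"] = df["price"].fillna(0)

Assigning the result back is more reliable than calling fillna(..., inplace=True) on a selected column, which may not update df in newer pandas versions.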

b. Fill with the mean or median:

df["price"] = df["price"].fillna(df["price"].mean())  # or .median(), which is less sensitive to outliers

c. Forward-fill or backward-fill:

df["price"] = df["price"].ffill()  # uses previous value (fillna(method=...) is deprecated)
df["price"] = df["price"].bfill()  # uses next value
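
The fill strategies are alternatives, so it helps to see them side by side before picking one. The sketch below applies each of them to the same made-up price series (the values are invented for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical price series with gaps in the middle and at the end
prices = pd.Series([10.0, np.nan, np.nan, 13.0, np.nan], name="price")

comparison = pd.DataFrame({
    "original": prices,
    "zero": prices.fillna(0),                   # default value
    "median": prices.fillna(prices.median()),   # median of the known values
    "ffill": prices.ffill(),                    # carry the previous value forward
    "bfill": prices.bfill(),                    # pull the next value backward
})
print(comparison)
```

Note that backward-fill leaves the last row empty because there is no later value to copy, so it is worth re-checking for missing values after any fill.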

2. Dropping missing values

a. Drop rows with any missing value:

df.dropna(inplace=True)

b. Drop rows with missing values only in specific columns:

df.dropna(subset=["quantity_sold"], inplace=True)
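
To see how much each dropping strategy removes, you can run them without inplace=True and compare the results. This sketch uses invented data, and it also shows the `thresh` parameter of `dropna`, which is not covered above: it keeps rows that have at least that many non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only "price" and "quantity_sold" are named in this post
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [3, 5, np.nan, np.nan],
})

complete_rows = df.dropna()                      # drop rows with any missing value
has_quantity = df.dropna(subset=["quantity_sold"])  # only require quantity_sold
mostly_filled = df.dropna(thresh=2)              # keep rows with at least 2 known values

print(len(df), len(complete_rows), len(has_quantity), len(mostly_filled))
```

Because none of these calls modifies df, you can inspect each result and only then decide which one to keep.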

🧪 Before and After Comparison

You can compare the row count before and after dropping to see how many rows were removed:

print("Before:", len(df))
df_clean = df.dropna()
print("After:", len(df_clean))
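
Row counts only tell you about dropped rows. If you also fill values, a small report of missing cells before and after can help. The helper below is a hypothetical sketch (the `cleaning_report` name and the sample data are assumptions, not part of the original dataset).

```python
import numpy as np
import pandas as pd

def cleaning_report(before, after):
    """Summarize how rows and missing cells changed between two DataFrames."""
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "rows_removed": len(before) - len(after),
        "missing_before": int(before.isnull().sum().sum()),
        "missing_after": int(after.isnull().sum().sum()),
    }

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [3, 5, np.nan, 2],
})

# Drop rows without a quantity, then fill the remaining prices with the median
df_clean = df.dropna(subset=["quantity_sold"]).copy()
df_clean["price"] = df_clean["price"].fillna(df_clean["price"].median())

report = cleaning_report(df, df_clean)
print(report)
```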

🧼 Best Practices

- Never remove data blindly. Investigate why values are missing.
- If the missing values are:
  - Rare, you can drop them.
  - Systematic (e.g. prices missing on certain dates), you might need deeper analysis.
- Document how you handled missing data for reproducibility.
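
One way to check whether missing prices are systematic is to look at the share of missing values per group. The sketch below assumes a `date` column, which is hypothetical here, and uses invented data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "date" column is an assumption for illustration
df = pd.DataFrame({
    "date": ["2025-05-01", "2025-05-01", "2025-05-02",
             "2025-05-02", "2025-05-03", "2025-05-03"],
    "price": [10.0, 11.0, np.nan, np.nan, 12.0, np.nan],
})

# Share of missing prices per date: values near 1.0 suggest a systematic gap
missing_by_date = df["price"].isnull().groupby(df["date"]).mean()
print(missing_by_date)

# Keep a flag so that later fills stay traceable and easy to document
df["price_was_missing"] = df["price"].isnull()
```

A date where every price is missing points to a collection problem rather than random gaps, and filling it with a mean would hide that.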

✅ Wrap-up

In this part, we covered:

- How to detect missing data
- How to fill or drop missing values
- When to use each method

Handling missing data is one of the first steps toward building reliable data pipelines.

Next up: Part 8: Exploratory Data Analysis (EDA) with Pandas

We’ll learn how to explore and understand a dataset using Pandas, identifying trends, patterns, and potential issues in the data. Let’s keep going!
