Handling Missing Data in Pandas

Written by: Marlon Colca
Posted on 20 May 2025
python pandas analytics

Learn how to detect, analyze, and clean missing data using Pandas.


🧹 Part 7: Handling Missing Data in Pandas

Real-world data is rarely perfect. Missing values are common, and how you handle them can significantly impact your analysis. In this post, we’ll learn how to detect, analyze, and fill or remove missing data using Pandas.


🧐 Detecting Missing Values

Let’s start by loading a modified version of our dataset that contains missing values:

import pandas as pd

df = pd.read_csv("prices_with_sales_with_missing.csv")

To count the missing values in each column:

df.isnull().sum()
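The CSV file above isn't included here, so this self-contained sketch uses a small hypothetical DataFrame with the same price and quantity_sold columns, plus an assumed date column, to show what isnull().sum() returns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for prices_with_sales_with_missing.csv
df = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"],
    "price": [10.0, np.nan, 12.0, np.nan],
    "quantity_sold": [5, 3, np.nan, 4],
})

# isnull() marks each NaN as True; sum() counts the Trues in each column
missing_counts = df.isnull().sum()
print(missing_counts)
```

isna() is an alias of isnull(), so df.isna().sum() gives the same counts.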

To see what fraction of each column is missing (0.25 means a quarter of that column's values are empty):

df.isnull().mean()
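A minimal sketch on hypothetical data: the mean of a boolean mask is the share of True values, so isnull().mean() gives a per-column fraction. Flattening the mask to a NumPy array first gives the share of missing cells across the whole DataFrame instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0, np.nan],
    "quantity_sold": [5, 3, np.nan, 4],
})

# Share of missing values per column (True counts as 1, False as 0)
per_column = df.isnull().mean()

# Share of missing cells across every column and row together
overall = df.isnull().to_numpy().mean()

print(per_column)
print(f"Overall: {overall:.1%}")
```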

🔎 Identifying Missing Rows

You can filter the rows that contain any missing values:

df[df.isnull().any(axis=1)]
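Here is the same filter on a small hypothetical DataFrame. any(axis=1) reduces each row of the boolean mask to a single True or False, and that row mask then selects the matching rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0, 11.0],
    "quantity_sold": [5, 3, np.nan, 4],
})

# True for every row that has at least one NaN
has_missing = df.isnull().any(axis=1)
rows_with_missing = df[has_missing]
print(rows_with_missing)  # rows 1 and 2
```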

Or rows with missing values in a specific column, for example price:

df[df["price"].isnull()]
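A short sketch on hypothetical data that selects the rows missing a price, together with the complementary selection built with notnull().

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0, 11.0],
    "quantity_sold": [5, 3, np.nan, 4],
})

# Rows where price is missing
missing_price = df[df["price"].isnull()]

# notnull() is the complement: rows where price is present
known_price = df[df["price"].notnull()]

print(missing_price)
print(known_price)
```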

🛠️ Common Fixes for Missing Data

1. Filling missing values

a. Fill with a default value:

df["price"] = df["price"].fillna(0)  # assign back; inplace=True on a selected column is unreliable in recent pandas
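A self-contained sketch on hypothetical data. It assigns the filled column back rather than calling fillna(..., inplace=True) on df["price"]. That chained in-place call triggers a warning in newer pandas versions and, with Copy-on-Write enabled, no longer updates df. Passing a dict to DataFrame.fillna fills several columns at once, each with its own default.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0],
    "quantity_sold": [5, np.nan, 4],
})

# Fill one column and assign the result back
df["price"] = df["price"].fillna(0)

# Fill other columns via a {column: default} mapping
df = df.fillna({"quantity_sold": 0})

print(df)
```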

b. Fill with the mean or median:

df["price"] = df["price"].fillna(df["price"].mean())
# or: df["price"] = df["price"].fillna(df["price"].median())
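This sketch uses hypothetical prices with one outlier to show why the median is often the safer choice. mean() and median() skip NaN by default, but a single extreme value pulls the mean far away from the typical price, while the median barely moves.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: one missing value and one outlier (100.0)
prices = pd.Series([10.0, 11.0, np.nan, 12.0, 100.0])

mean_value = prices.mean()      # 33.25, dragged up by the outlier
median_value = prices.median()  # 11.5, close to the typical price

mean_filled = prices.fillna(mean_value)
median_filled = prices.fillna(median_value)

print(mean_filled.tolist())
print(median_filled.tolist())
```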

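c. Forward-fill or backward-fill (pick one; fillna(method=...) is deprecated since pandas 2.1 in favor of ffill() and bfill()):

df["price"] = df["price"].ffill()  # option 1: carry the previous valid value forward
df["price"] = df["price"].bfill()  # option 2: pull the next valid value backward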
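A minimal sketch on a hypothetical series shows how the two directions differ at the edges. A forward fill cannot fill a leading NaN, and a backward fill cannot fill a trailing one. Chaining both covers the edges as well, though whether that is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at the start, middle, and end
s = pd.Series([np.nan, 10.0, np.nan, np.nan, 13.0, np.nan])

forward = s.ffill()        # leading NaN stays: nothing earlier to copy
backward = s.bfill()       # trailing NaN stays: nothing later to copy
both = s.ffill().bfill()   # forward first, then backward for the leading gap

print(forward.tolist())
print(backward.tolist())
print(both.tolist())
```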

2. Dropping missing values

a. Drop rows with any missing value:

df.dropna(inplace=True)
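On hypothetical data, this sketch compares the default behavior of dropna() with how="all". The default drops a row if any value is missing. how="all" only drops rows where every value is missing, which is much less aggressive.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, np.nan, 13.0],
    "quantity_sold": [5, 3, np.nan, np.nan],
})

any_dropped = df.dropna()           # keeps only fully populated rows
all_dropped = df.dropna(how="all")  # drops only the completely empty row

print(any_dropped)
print(all_dropped)
```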

b. Drop rows with missing values only in specific columns:

df.dropna(subset=["quantity_sold"], inplace=True)
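A short sketch on hypothetical data: with subset, only the listed columns decide whether a row is dropped, so a missing price in a row that has a quantity_sold value is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0],
    "quantity_sold": [5, 3, np.nan],
})

# Only NaN in quantity_sold causes a row to be dropped
cleaned = df.dropna(subset=["quantity_sold"])
print(cleaned)  # row 1 survives even though its price is NaN
```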

🧪 Before and After Comparison

You can compare the number of rows before and after dropping missing values:

print("Before:", len(df))
df_clean = df.dropna()
print("After:", len(df_clean))
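This self-contained sketch on hypothetical data extends the comparison. Dropping changes the row count, but filling does not, so filled values are measured by counting missing cells before and after fillna instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0, np.nan],
    "quantity_sold": [5, 3, np.nan, 4],
})

# Rows removed by dropna()
before_rows = len(df)
df_dropped = df.dropna()
rows_removed = before_rows - len(df_dropped)

# Cells filled by fillna(): compare total NaN counts
missing_before = int(df.isnull().sum().sum())
df_filled = df.fillna(0)
missing_after = int(df_filled.isnull().sum().sum())
cells_filled = missing_before - missing_after

print(f"Rows removed by dropna: {rows_removed} of {before_rows}")
print(f"Cells filled by fillna: {cells_filled}")
```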

🧼 Best Practices

  • Never remove data blindly. Investigate why values are missing.
  • If the missing values are:
    • Rare, you can drop them.
    • Systematic (e.g. prices missing on certain dates), you might need deeper analysis.
  • Document how you handled missing data for reproducibility.
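One way to check whether missing values are systematic is to group the missingness mask by another column. The sketch below uses hypothetical dates and prices where prices are only missing on weekends. The share of missing prices per weekday makes that pattern obvious, which a plain isnull().sum() would hide.

```python
import numpy as np
import pandas as pd

# Hypothetical data: prices are missing on Saturdays and Sundays only
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2025-05-03", "2025-05-04", "2025-05-05",
        "2025-05-10", "2025-05-11", "2025-05-12",
    ]),
    "price": [np.nan, np.nan, 10.0, np.nan, np.nan, 11.0],
})

# Share of missing prices for each weekday
missing_by_weekday = (
    df.assign(
        weekday=df["date"].dt.day_name(),
        price_missing=df["price"].isnull(),
    )
    .groupby("weekday")["price_missing"]
    .mean()
)
print(missing_by_weekday)
```

A pattern like this suggests the gaps come from how the data was collected, so dropping or mean-filling those rows could bias any weekday comparison.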

✅ Wrap-up

In this part, we covered:

  • How to detect missing data
  • How to fill or drop missing values
  • When to use each method

Handling missing data is one of the first steps toward building reliable data pipelines.


Next up: Part 8 — Filtering and Selecting Data in Pandas

We’ll learn powerful techniques to slice, dice, and filter rows based on multiple conditions. Let’s keep going!