Written by: Marlon Colca
Posted on 20 May 2025
python pandas analytics
Learn how to detect, analyze, and clean missing data using Pandas.
Real-world data is rarely perfect. Missing values are common, and how you handle them can significantly impact your analysis. In this post, we’ll learn how to detect, analyze, and fill or remove missing data using Pandas.
Let’s start by loading a modified version of our dataset that contains missing values:
import pandas as pd
df = pd.read_csv("prices_with_sales_with_missing.csv")
To check for missing values in each column:
df.isnull().sum()
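To see what this returns without the CSV file, here is a minimal sketch on a small hand-made DataFrame. The column names price and quantity_sold mirror the ones used in this post, but the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for prices_with_sales_with_missing.csv (values are made up)
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [5, 3, np.nan, 7],
})

# isnull() marks each cell True/False; sum() counts the True values per column
missing_counts = df.isnull().sum()
print(missing_counts)
# Here price has 2 missing values, quantity_sold has 1, product has 0
```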
To see the fraction of missing values in each column (0.0 means nothing is missing, 1.0 means every value is missing):
df.isnull().mean()
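Because mean() on a boolean DataFrame treats True as 1 and False as 0, the result is the share of missing cells per column. A minimal sketch that turns it into a percentage and flags sparse columns; the data is invented and the 30% cutoff is an arbitrary assumption, not a rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [5, 3, np.nan, 7],
})

# Share of missing cells per column, expressed as a percentage
missing_pct = df.isnull().mean() * 100
print(missing_pct.round(1))

# Hypothetical rule of thumb: flag columns with more than 30% missing
threshold = 30
too_sparse = missing_pct[missing_pct > threshold].index.tolist()
print(too_sparse)  # ['price']
```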
You can filter the rows that contain any missing values:
df[df.isnull().any(axis=1)]
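The axis=1 argument tells any() to look across each row rather than down each column. A small sketch with invented data shows which rows the mask keeps, and how negating it with ~ selects only the complete rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, 9.0],
    "quantity_sold": [5, 3, np.nan, 7],
})

# True for each row that has at least one missing value
has_missing = df.isnull().any(axis=1)

incomplete_rows = df[has_missing]  # rows 1 and 2
complete_rows = df[~has_missing]   # rows 0 and 3

print(incomplete_rows)
print(complete_rows)
```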
Or only the rows with a missing value in a specific column, for example price:
df[df["price"].isnull()]
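Pandas also provides isna(), an alias of isnull(), and notna(), its opposite. A sketch using them on a single column (values made up) to split the rows by whether price is present:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 12.5, np.nan],
})

# isna() behaves like isnull(); notna() is its negation
missing_price = df[df["price"].isna()]
with_price = df[df["price"].notna()]

print(missing_price["product"].tolist())  # ['B', 'D']
print(with_price["product"].tolist())     # ['A', 'C']
```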
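To fill the gaps instead, choose one of these strategies (each line is an alternative, not a sequence):
df["price"] = df["price"].fillna(0)  # replace with a constant
df["price"] = df["price"].fillna(df["price"].mean())  # replace with the column mean
df["price"] = df["price"].ffill()  # uses previous value
df["price"] = df["price"].bfill()  # uses next value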
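To compare these fill strategies, here is a sketch that applies each one to the same small Series (values invented) so the results can be inspected side by side:

```python
import numpy as np
import pandas as pd

price = pd.Series([10.0, np.nan, 14.0, np.nan, 20.0])

comparison = pd.DataFrame({
    "original": price,
    "zero": price.fillna(0),             # constant
    "mean": price.fillna(price.mean()),  # mean of the non-missing values
    "ffill": price.ffill(),              # previous valid value
    "bfill": price.bfill(),              # next valid value
})
print(comparison)
```

Note that ffill() cannot fill a gap at the very start of a column, and bfill() cannot fill one at the very end, because there is no previous or next value to copy.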
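To remove rows instead, drop either every row with a missing value or only the rows missing a specific column:
df_complete = df.dropna()  # rows with no missing values at all
df_with_sales = df.dropna(subset=["quantity_sold"])  # rows where quantity_sold is present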
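A sketch contrasting a plain dropna() with the subset version on invented data, plus the thresh parameter, which keeps rows that have at least that many non-missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [5, 3, np.nan, np.nan],
    "store": ["X", "Y", "Z", "X"],
})

any_missing = df.dropna()                          # drop rows with any NaN
no_quantity = df.dropna(subset=["quantity_sold"])  # only check quantity_sold
mostly_complete = df.dropna(thresh=2)              # keep rows with >= 2 non-NaN values

print(len(any_missing), len(no_quantity), len(mostly_complete))  # 1 2 3
```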
To see how many rows dropping would remove, compare the row counts before and after:
print("Before:", len(df))
df_clean = df.dropna()
print("After:", len(df_clean))
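Row counts only capture the effect of dropping. To also see how many missing cells a fill resolved, a small helper can compare two versions of the data. missing_report below is a hypothetical helper written for illustration, not part of pandas, and the data is invented.

```python
import numpy as np
import pandas as pd

def missing_report(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper: compare row and missing-cell counts."""
    return {
        "rows_removed": len(before) - len(after),
        "missing_before": int(before.isnull().sum().sum()),
        "missing_after": int(after.isnull().sum().sum()),
    }

df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, np.nan],
    "quantity_sold": [5, 3, np.nan, 7],
})

dropped = df.dropna()
# Fill only price with its mean; quantity_sold keeps its gap
filled = df.assign(price=df["price"].fillna(df["price"].mean()))

print(missing_report(df, dropped))
print(missing_report(df, filled))
```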
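In this part, we covered how to detect missing values with isnull(), inspect the affected rows, fill gaps with fillna(), ffill() and bfill(), and remove incomplete rows with dropna().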
Handling missing data is one of the first steps toward building reliable data pipelines.
Next up: Part 8 — Filtering and Selecting Data in Pandas
We’ll learn powerful techniques to slice, dice, and filter rows based on multiple conditions. Let’s keep going!