Cleaning Data Without Losing Your Mind (Pandas Edition)

Written by: Marlon Colca
Posted on 12 May 2025 - 5 months ago
python pandas analytics

Learn how to clean messy data using Pandas. We'll fix missing values, rename columns, convert data types, and prepare our dataset for analysis.

Mastering Pandas: From Data Cleaning to Insights

01

What is Pandas and Why Every Data Analyst Loves It
02

Exploring Your Data Like a Pro with Pandas
03

Cleaning Data Without Losing Your Mind (Pandas Edition)
04

Filtering, Selecting and Slicing Your Data Like a Ninja
05

Grouping and Summarizing Data with Pandas (Without Pain)
06

Grouping and Aggregating Data in Pandas
07

Handling Missing Data in Pandas
08

Exploratory Data Analysis (EDA) with Pandas
09

Visualization with Pandas
10

Advanced Pandas tips

Attachments

Cleaning Data Without Losing Your Mind (Pandas Edition) 🧼

Messy data is the rule, not the exception.
Before any serious analysis or machine learning, we need to make sure the data is clean, consistent, and usable.

In this post, we’ll go over how to:

✅ Fix missing values
✅ Rename columns
✅ Convert data types
✅ Clean strings and categories

Let’s get to work.

🧹 Step 1: Detect missing data

We’ll use the same dataset from before (prices_sample.csv).

import pandas as pd

df = pd.read_csv("prices_sample.csv")

# Check for missing values
print(df.isnull().sum())

Even if this dataset is clean, it’s good practice to check. In real-world data, missing prices, categories, or names are super common.

💊 Step 2: Fill or remove missing data

You have a few options here:

Drop rows with missing values:

df_clean = df.dropna()

Fill missing values with a default (e.g. 0, “Unknown”, or a mean):

df["price"] = df["price"].fillna(0)
df["brand"] = df["brand"].fillna("Unknown")

Fill with column average (for numeric columns):

df["price"] = df["price"].fillna(df["price"].mean())

✍️ Step 3: Rename columns

Make column names consistent, lowercase, and readable:

# Rename with a dictionary
df.rename(columns={"product_name": "name", "in_stock": "available"}, inplace=True)

# Or make all lowercase
df.columns = [col.lower() for col in df.columns]

Why? Because you don’t want to type Product Name 100 times, or deal with spaces.

🔢 Step 4: Convert data types

Let’s make sure each column has the right type:

# Convert price to float
df["price"] = df["price"].astype(float)

# Convert in_stock to boolean
df["available"] = df["available"].astype(bool)

For dates (in future entries):

df["date"] = pd.to_datetime(df["date"])

🔤 Step 5: Clean string columns

Sometimes, text columns come with extra whitespace, wrong casing, or typos.

# Strip whitespace and lowercase everything
df["category"] = df["category"].str.strip().str.lower()

# Standardize brand names
df["brand"] = df["brand"].replace({"Laive ": "Laive", "laive": "Laive"})

You can also remove symbols or accents using regular expressions (str.replace()), or normalize text with unidecode if needed.

✅ Final check

Now that we’ve cleaned up, let’s check how things look:

print(df.info())
print(df.head())

🧠 Key Takeaways

Clean data is the foundation of any good analysis
Learn to love .isnull(), .fillna(), and .astype()
Standardizing strings and column names saves a lot of bugs later

📌 What’s next?

Next up in the series: ➡️ Filtering, selecting, and slicing your data like a ninja 🥷

We’ll explore .loc, .iloc, boolean indexing, and more to really dig into the rows and columns that matter.

See you in Part 4!

🔜 Coming up next

Filtering, Selecting and Slicing Your Data Like a Ninja

Master the art of selecting rows and columns using Pandas. Learn `loc`, `iloc`, conditions, and slicing techniques.

13 May 2025 - 5 months ago