Advanced Pandas tips

Written by: Marlon Colca
Posted on 29 May 2025
python pandas analytics

🚀 Part 10: Advanced Pandas Tips and Tricks

Congratulations! You’ve reached the final part of this Pandas series.
So far, we’ve covered everything from loading data to cleaning, grouping, and visualizing it.

Now it’s time to explore advanced Pandas techniques that will make your code more efficient, expressive, and Pythonic.

🧩 1. Using `apply()` for Custom Functions

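apply() lets you run a custom function on every value of a column or, with axis=1, on every row. It is most useful when the logic does not fit a built-in vectorized operation.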

import pandas as pd

df = pd.read_csv("prices_with_missing_data.csv")

# Example: calculate discounted price
df["discounted_price"] = df["price"].apply(lambda x: x * 0.9)

To apply a function across rows, pass axis=1; the function then receives each row as a Series:

def revenue(row):
    return row["price"] * row["quantity_sold"]

df["revenue"] = df.apply(revenue, axis=1)
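As a minimal sketch of both forms of apply(), the example below uses a small hand-made DataFrame in place of the CSV file. The price_tier function and its thresholds are hypothetical and only illustrate logic that is awkward to express as plain arithmetic.

```python
import pandas as pd

# Hypothetical sample data standing in for prices_with_missing_data.csv
df = pd.DataFrame(
    {
        "price": [2.0, 4.5, 10.0],
        "quantity_sold": [10, 0, 3],
    }
)


def price_tier(price):
    """Label a price as 'low', 'mid' or 'high' (thresholds are made up)."""
    if price < 3:
        return "low"
    if price < 8:
        return "mid"
    return "high"


# Column-wise: the function receives one value at a time
df["tier"] = df["price"].apply(price_tier)

# Row-wise: with axis=1 the function receives a whole row as a Series
df["revenue"] = df.apply(lambda row: row["price"] * row["quantity_sold"], axis=1)

print(df)
```

Row-wise apply() calls the function once per row in Python, so it is best kept for logic that cannot be written as column arithmetic.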

🔄 2. Using `map()` for Element-wise Operations

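map() transforms a single Series element by element. It accepts a function, a dictionary, or another Series, and is often simpler than apply() for one-to-one transformations.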

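# na_action="ignore" skips missing values, which would otherwise make str.upper raise a TypeError
df["brand_upper"] = df["brand"].map(str.upper, na_action="ignore")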

You can also pass a dictionary to replace values. Values that are not keys in the dictionary become NaN, so fillna() puts the original value back:

brand_map = {"Brand A": "Premium A", "Brand B": "Budget B"}
df["brand"] = df["brand"].map(brand_map).fillna(df["brand"])
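The behaviour of map() with missing values and unmapped keys is easy to check on a small Series. This is a sketch with made-up brand names, not data from the CSV file.

```python
import pandas as pd

# Hypothetical brands, including one that is not in the mapping and one missing value
brands = pd.Series(["Brand A", "Brand B", "Brand C", None])

# Missing values are skipped instead of being passed to str.upper
upper = brands.map(str.upper, na_action="ignore")

# Values that are not keys in the dictionary come back as NaN ...
brand_map = {"Brand A": "Premium A", "Brand B": "Budget B"}
mapped = brands.map(brand_map)

# ... so fall back to the original value wherever the mapping found nothing
renamed = mapped.fillna(brands)

print(upper)
print(mapped)
print(renamed)
```

Here "Brand C" survives the rename unchanged, while the genuinely missing entry stays missing.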

🔎 3. Using `query()` for Cleaner Filtering

Instead of complex boolean indexing, you can use SQL-like queries.

# Products in stock and cheaper than 5
cheap_stock = df.query("in_stock == True and price < 5")

This makes your filters more readable.
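To see that query() selects the same rows as boolean indexing, here is a small sketch on made-up data. The max_price variable is an assumption added for illustration; the @ prefix lets the query string refer to a local Python variable.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {
        "product": ["apple", "bread", "cheese", "milk"],
        "price": [1.2, 3.5, 7.9, 0.9],
        "in_stock": [True, False, True, True],
    }
)

max_price = 5  # hypothetical threshold

# Boolean indexing: needs parentheses and repeats the DataFrame name
by_mask = df[(df["in_stock"] == True) & (df["price"] < max_price)]

# Same filter with query(); @max_price refers to the local variable above
by_query = df.query("in_stock == True and price < @max_price")

print(by_query)
```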

⚡ 4. Vectorization vs Loops

Avoid Python-level loops over Pandas data where you can. Vectorized operations run in optimized compiled code and are usually much faster.

# Slow: Python-level loop over every row
df["revenue_loop"] = [p * q for p, q in zip(df["price"], df["quantity_sold"])]

# Good (fast, vectorized)
df["revenue_vec"] = df["price"] * df["quantity_sold"]
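A rough way to check the speed difference yourself is to time both versions on generated data. The sketch below uses random numbers rather than the CSV file, and the exact timings will depend on your machine. It also shows np.where, which keeps conditional logic vectorized; the 10% discount rule is made up for the example.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random prices and quantities
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame(
    {
        "price": rng.uniform(1, 20, n),
        "quantity_sold": rng.integers(0, 50, n),
    }
)


def loop_version():
    return [p * q for p, q in zip(df["price"], df["quantity_sold"])]


def vectorized_version():
    return df["price"] * df["quantity_sold"]


revenue_loop = loop_version()
revenue_vec = vectorized_version()

# Time three runs of each version
print("loop:      ", timeit.timeit(loop_version, number=3))
print("vectorized:", timeit.timeit(vectorized_version, number=3))

# Conditional logic can stay vectorized too: 10% off where price > 10
discounted = np.where(df["price"] > 10, df["price"] * 0.9, df["price"])
```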

🧠 5. Chaining Methods for Cleaner Code

Instead of creating temporary variables, you can chain methods:

summary = (
    df.dropna()
      .query("price > 2")
      .groupby("brand")["revenue"]
      .sum()
      .sort_values(ascending=False)
)

This style (called method chaining) is concise and easier to read.
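The chain can also create columns along the way with assign(), so the whole pipeline reads top to bottom. This is a sketch on made-up data; it computes revenue inside the chain instead of relying on a column created earlier, and it limits dropna() to the columns the calculation needs.

```python
import pandas as pd

# Hypothetical sample data with one missing price
df = pd.DataFrame(
    {
        "brand": ["A", "A", "B", "B", "C"],
        "price": [3.0, 1.5, 4.0, None, 6.0],
        "quantity_sold": [10, 20, 5, 8, 2],
    }
)

summary = (
    df.dropna(subset=["price", "quantity_sold"])  # drop rows missing either input
      .assign(revenue=lambda d: d["price"] * d["quantity_sold"])  # new column mid-chain
      .query("price > 2")
      .groupby("brand")["revenue"]
      .sum()
      .sort_values(ascending=False)
)

print(summary)
```

The original df is left untouched, because every step returns a new object.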

📌 Extra Tips

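Use .astype("category") for columns with few distinct values (such as brand) → saves memory.
Use .copy() when you build a new DataFrame from a filtered subset and plan to modify it, so your changes do not touch the original and Pandas does not warn about modifying a view.
Learn .merge() and .join() to combine datasets efficiently.
Explore Polars (a Pandas alternative) for very large datasets.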
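The first three tips can be tried out in a few lines. This is a minimal sketch on generated data; the brand_info table and its country values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 100,000 rows with only three distinct brands
df = pd.DataFrame(
    {
        "brand": ["Brand A", "Brand B", "Brand A", "Brand C"] * 25_000,
        "price": [1.0, 2.0, 3.0, 4.0] * 25_000,
    }
)

# 1. category dtype: compare memory use of the same column
as_default = df["brand"].memory_usage(deep=True)
as_category = df["brand"].astype("category").memory_usage(deep=True)
print(f"default: {as_default} bytes, category: {as_category} bytes")

# 2. copy(): work on an independent subset
cheap = df[df["price"] < 3].copy()
cheap["price"] = cheap["price"] * 2  # changes cheap only, not df

# 3. merge(): a left join keeps every row of the left table
brand_info = pd.DataFrame(
    {"brand": ["Brand A", "Brand B"], "country": ["US", "DE"]}
)
merged = df.head(4).merge(brand_info, on="brand", how="left")
print(merged)
```

Brands without a match in brand_info get a missing country after the left join.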

✅ Wrap-up

In this final part, you learned how to:

Use apply() and map() for flexible transformations
Filter data more clearly with query()
Speed up your code with vectorization
Write cleaner code with method chaining

🎉 Congratulations — you’ve completed the 10-part Pandas series!
You now have a solid foundation to work confidently with data in Python.

Keep practicing, explore more datasets, and share your insights with others.
Data is powerful when you know how to use it. 🚀