
Exploring Your Data Like a Pro with Pandas

Written by: Marlon Colca
Posted on 11 May 2025
Tags: python, pandas, analytics

Learn how to explore and understand your dataset using Pandas. From `.head()` to `.describe()` and `.value_counts()`, this post walks you through the essential tools.


Exploring Your Data Like a Pro with Pandas 🔍

Before you clean, transform, or model your data, you need to understand what you’re working with.
That’s where Exploratory Data Analysis (EDA) comes in — and Pandas makes it easy.

In this post, we’ll look at the basic tools you can use to inspect your dataset and start asking the right questions.


📥 Let’s load a sample dataset

For this tutorial, let’s imagine you’ve loaded a CSV with product prices:

import pandas as pd

df = pd.read_csv("prices_sample.csv")

Let’s now explore it step by step 👇


🧱 Basic structure: .head(), .tail(), .shape, .info()

These are your first tools when working with any dataset.

# First 5 rows
print(df.head())

# Last 5 rows
print(df.tail())

# Number of rows and columns
print(df.shape)

# Column dtypes and non-null counts
# (info() prints directly and returns None, so don't wrap it in print())
df.info()

Use this to quickly understand:

  • What kind of data you’re dealing with
  • Which columns are numeric or strings
  • If there are missing values
  • How many rows you have
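Since `prices_sample.csv` is just an imagined file, here's a minimal sketch with a tiny inline DataFrame (the column names are invented for illustration) so you can try these inspection calls yourself:

```python
import pandas as pd

# Hypothetical stand-in for prices_sample.csv
df = pd.DataFrame({
    "product": ["Keyboard", "Mouse", "Monitor", "Webcam"],
    "price": [49.99, 19.99, 199.00, None],
    "category": ["peripherals", "peripherals", "displays", "peripherals"],
})

print(df.head())   # first rows (all 4 here; head() defaults to 5)
print(df.tail())   # last rows
print(df.shape)    # (4, 3) -> 4 rows, 3 columns
df.info()          # dtypes plus non-null counts; note price has one null
```

Even on this toy data, `.info()` already answers the four questions above: three columns, one numeric with a missing value, two string columns, four rows.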

📊 Descriptive stats: .describe()

This one is a must. It gives you quick stats on all numerical columns:

print(df.describe())

You’ll get:

  • Count of non-null values
  • Mean, std dev
  • Min, max
  • Percentiles (25%, 50%, 75%)

💡 Great for spotting outliers or weird values (e.g. negative prices?).
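Here's a quick sketch of exactly that situation, using made-up prices that include a deliberately bad value:

```python
import pandas as pd

# Hypothetical prices, including a suspicious negative entry
df = pd.DataFrame({"price": [10.0, 12.5, 11.0, -3.0, 250.0]})

stats = df["price"].describe()
print(stats)  # count, mean, std, min, percentiles, max

# A negative minimum is a red flag for price data
if stats["min"] < 0:
    print("Warning: negative prices found")
```

The `min` row alone tells you something is wrong, and the gap between the 75th percentile and `max` hints at the 250.0 outlier.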


📈 Understanding categories: .value_counts()

For categorical columns, this method shows how often each value appears.

# Count of products per category
print(df["category"].value_counts())

You can also use it on booleans or binary flags, like `availability` or `on_sale` columns.
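A small example with invented categories and an `on_sale` flag; passing `normalize=True` turns the counts into proportions, which is often easier to read:

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({
    "category": ["toys", "books", "toys", "toys", "books", "food"],
    "on_sale": [True, False, True, False, False, True],
})

print(df["category"].value_counts())                # counts, most frequent first
print(df["category"].value_counts(normalize=True))  # proportions instead of counts
print(df["on_sale"].value_counts())                 # works on booleans too
```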


🕳️ Null values: .isnull().sum()

Knowing where your missing data lives is essential.

# Total missing values per column
print(df.isnull().sum())

This helps you decide whether to:

  • Fill missing values (fillna())
  • Drop rows/columns (dropna())
  • Investigate why they’re missing
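Here's a minimal sketch of that decision in code, on a tiny hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with gaps in both columns
df = pd.DataFrame({
    "price": [10.0, None, 12.0, None],
    "brand": ["Acme", "Acme", None, "Globex"],
})

print(df.isnull().sum())  # missing count per column: price 2, brand 1

# Option 1: fill the numeric column with its mean
filled = df.fillna({"price": df["price"].mean()})

# Option 2: drop any row with a missing value
dropped = df.dropna()

print(filled)
print(dropped.shape)  # only one fully complete row survives
```

Which option is right depends on the third bullet: *why* the values are missing.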

🧪 Quick checks for unique values

Want to see how many different values a column has?

print(df["brand"].nunique())
print(df["brand"].unique())

Useful for spotting typos, inconsistencies, or an unexpectedly large number of categories.
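A quick illustration with invented brand names, where a single typo inflates the unique count:

```python
import pandas as pd

# "Sonny" is a hypothetical typo for "Sony"
df = pd.DataFrame({"brand": ["Sony", "Sony", "Sonny", "LG", "LG"]})

print(df["brand"].nunique())  # 3 distinct values, but we expected 2
print(df["brand"].unique())   # ['Sony' 'Sonny' 'LG'] -- the typo jumps out
```

If `nunique()` is higher than you expect, printing `unique()` usually reveals the culprit right away.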


🚀 Quick summary checklist

Here’s a quick EDA checklist you can use every time you load new data:

  • df.head() and df.tail()
  • df.shape and df.info()
  • df.describe()
  • df.isnull().sum()
  • value_counts() on key categorical columns
  • unique() and nunique() for quick validation
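If you run this checklist often, you could wrap it in a small helper. This `quick_eda` function is just a convenience sketch (not part of Pandas), shown here on hypothetical data:

```python
import pandas as pd

def quick_eda(df: pd.DataFrame, categorical_cols=None) -> None:
    """Run the EDA checklist above in one call."""
    print(df.head())
    print(df.tail())
    print(df.shape)
    df.info()
    print(df.describe())
    print(df.isnull().sum())
    for col in categorical_cols or []:
        print(f"--- {col} ---")
        print(df[col].value_counts())
        print(f"{df[col].nunique()} unique values")

# Hypothetical data to exercise the helper
df = pd.DataFrame({
    "price": [5.0, 7.5, None],
    "category": ["a", "b", "a"],
})
quick_eda(df, categorical_cols=["category"])
```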


📌 What’s next?

Now that we understand the shape of our data, it’s time to clean it up.
In the next entry, we’ll dive into fixing missing values, renaming columns, fixing data types, and more.

See you in Part 3! 🧼


🔜 Coming up next


Cleaning Data Without Losing Your Mind (Pandas Edition)
12 May 2025


Learn how to clean messy data using Pandas. We'll fix missing values, rename columns, convert data types, and prepare our dataset for analysis.