Pandas DataFrame Reference - The Coding College

Welcome to The Coding College! This reference guide covers the most commonly used functions and methods for working with Pandas DataFrames. Bookmark this page for quick access while coding!

What is a DataFrame?

A DataFrame is a 2-dimensional labeled data structure in Pandas, similar to a spreadsheet or SQL table. It is one of the most widely used tools for data analysis and manipulation in Python.

Creating a DataFrame

From a Dictionary

import pandas as pd

data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)

From a List of Lists

data = [["Alice", 25], ["Bob", 30], ["Charlie", 35]]
df = pd.DataFrame(data, columns=["Name", "Age"])

From a CSV File

df = pd.read_csv("file.csv")
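
To try `pd.read_csv` without a file on disk, you can pass it any file-like object. The sketch below uses an in-memory buffer; the CSV text and the name `df_csv` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "file.csv"
csv_text = "Name,Age\nAlice,25\nBob,30\nCharlie,35\n"

# read_csv accepts a file path or a file-like object such as StringIO
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

The first line of the CSV is used as the column labels by default.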

Basic Attributes

Attribute	Description	Example
`df.shape`	Returns the dimensions of the DataFrame	`(rows, columns)`
`df.columns`	Returns the column labels	`Index(['col1', 'col2'])`
`df.index`	Returns the row labels (index)	`RangeIndex(start=0, stop=5, step=1)`
`df.dtypes`	Returns the data type of each column	`object, int64, float64`
`df.size`	Total number of elements	`rows * columns`
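
A quick way to see these attributes together is to print them for a small DataFrame. This is a minimal sketch; `df_attrs` is an illustrative name reusing the sample data from earlier on this page.

```python
import pandas as pd

df_attrs = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

print(df_attrs.shape)          # (3, 2) -> 3 rows, 2 columns
print(list(df_attrs.columns))  # ['Name', 'Age']
print(df_attrs.index)          # RangeIndex(start=0, stop=3, step=1)
print(df_attrs.size)           # 6 -> rows * columns
print(df_attrs.dtypes)         # one dtype per column (exact names depend on your pandas version)
```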

Viewing and Inspecting Data

Function/Method	Description	Example
`df.head(n)`	Returns the first `n` rows (default 5)	`df.head(3)`
`df.tail(n)`	Returns the last `n` rows (default 5)	`df.tail(2)`
`df.info()`	Prints a concise summary: index, column dtypes, non-null counts, and memory usage	`df.info()`
`df.describe()`	Summary statistics (count, mean, std, min, quartiles, max) for numeric columns	`df.describe()`
`df.sample(n)`	Randomly selects `n` rows (default 1)	`df.sample(3)`
`df.isnull()`	Returns a boolean DataFrame with `True` where a value is missing	`df.isnull()`
`df.notnull()`	Returns a boolean DataFrame with `True` where a value is present	`df.notnull()`
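
Because `df.isnull()` returns a boolean DataFrame, a common follow-up is to chain `.sum()` to count the missing values in each column. The sketch below assumes a small DataFrame with deliberately missing entries; the names are illustrative.

```python
import pandas as pd

df_inspect = pd.DataFrame(
    {"Name": ["Alice", "Bob", None], "Age": [25.0, None, 35.0]}
)

# True counts as 1 when summed, so this gives missing values per column
missing_per_column = df_inspect.isnull().sum()
print(missing_per_column)

# head() returns a new DataFrame, so it can be inspected or chained further
first_two = df_inspect.head(2)
print(first_two)
```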

Selecting and Filtering Data

Select Columns

df["column_name"]           # Single column as Series
df[["col1", "col2"]]        # Multiple columns as DataFrame

Select Rows

df.loc[0]                  # Row whose index label is 0
df.iloc[0]                 # First row, by integer position
df.loc[1:3]                # Rows with labels 1 through 3 (end label included)
df.iloc[1:3]               # Rows at positions 1 and 2 (end position excluded)
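
The difference between label-based and position-based slicing is easiest to see side by side. Here is a small sketch using made-up data (`df_sel` and `df_named` are illustrative names):

```python
import pandas as pd

df_sel = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "Age": [25, 30, 35, 40, 45],
    }
)

label_slice = df_sel.loc[1:3]      # labels 1, 2, 3 -> Bob, Charlie, David
position_slice = df_sel.iloc[1:3]  # positions 1, 2 -> Bob, Charlie

print(len(label_slice))     # 3
print(len(position_slice))  # 2

# With a non-integer index, labels and positions are clearly different things
df_named = df_sel.set_index("Name")
print(df_named.loc["Bob", "Age"])  # 30, looked up by label
print(df_named.iloc[1]["Age"])     # 30, looked up by position
```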

Filtering Rows

df[df["Age"] > 30]         # Rows where Age > 30
df[(df["Age"] > 25) & (df["Name"] == "Alice")]  # Multiple conditions
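
Besides `&` (and), boolean masks can be combined with `|` (or) and negated with `~`; each condition needs its own parentheses. The following sketch shows these operators plus `isin`, using illustrative sample data:

```python
import pandas as pd

df_filter = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

over_30 = df_filter[df_filter["Age"] > 30]

# OR: rows matching at least one condition
either = df_filter[(df_filter["Age"] < 30) | (df_filter["Name"] == "Charlie")]

# NOT: invert a mask with ~
not_bob = df_filter[~(df_filter["Name"] == "Bob")]

# isin: match against a list of allowed values
selected = df_filter[df_filter["Name"].isin(["Alice", "Bob"])]

print(either)
```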

Adding, Updating, and Removing Data

Add Columns

df["New_Column"] = [1, 2, 3]  # Add a new column (length must match the number of rows)
df["Updated_Age"] = df["Age"] + 5  # Add a column derived from an existing one
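
Assigning to a column name that does not exist yet adds a column, while assigning to an existing name overwrites it. A minimal sketch of both cases, plus what happens when a list has the wrong length (all names here are illustrative):

```python
import pandas as pd

df_cols = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# New name -> a column is added
df_cols["Age_In_5_Years"] = df_cols["Age"] + 5

# Existing name -> the column is updated in place
df_cols["Age"] = df_cols["Age"] + 1

# A list whose length differs from the number of rows raises ValueError
length_error = None
try:
    df_cols["Bad"] = [1, 2]
except ValueError as exc:
    length_error = str(exc)

print(df_cols)
print(length_error)
```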

Remove Columns

df.drop("column_name", axis=1, inplace=True)  # Drop a single column
df.drop(["col1", "col2"], axis=1, inplace=True)  # Drop multiple columns

Add Rows

new_row = pd.DataFrame({"Name": ["David"], "Age": [40]})
df = pd.concat([df, new_row], ignore_index=True)

Remove Rows

df.drop(0, axis=0, inplace=True)  # Drop the row whose index label is 0
df.drop(df[df["Age"] < 30].index, axis=0, inplace=True)  # Drop rows where Age < 30
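
Without `inplace=True`, `drop` returns a new DataFrame and leaves the original untouched. Dropping by condition can also be written as keeping the rows you want with a boolean mask. The sketch below uses illustrative names and data:

```python
import pandas as pd

df_rows = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Append one row; ignore_index=True renumbers the index 0..n-1
new_row = pd.DataFrame({"Name": ["David"], "Age": [40]})
df_rows = pd.concat([df_rows, new_row], ignore_index=True)

# drop() returns a copy without the row labeled 0
dropped_first = df_rows.drop(0)

# Keeping Age >= 30 is equivalent to dropping Age < 30
kept = df_rows[df_rows["Age"] >= 30]

# The old labels survive filtering; reset_index gives a fresh 0..n-1 index
kept_reset = kept.reset_index(drop=True)
print(kept_reset)
```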

Sorting Data

Function/Method	Description	Example
`df.sort_values()`	Sort by column(s)	`df.sort_values("Age")`
`df.sort_index()`	Sort by index	`df.sort_index()`
`ascending` Parameter	Controls sort direction; pass `False` for descending order (default `True`)	`df.sort_values("Age", ascending=False)`
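
`sort_values` also accepts a list of columns, with a matching list for `ascending` to set the direction per column. A short sketch with made-up data:

```python
import pandas as pd

df_sort = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [30, 25, 30, 25]}
)

# Oldest first; ties on Age are broken alphabetically by Name
sorted_df = df_sort.sort_values(["Age", "Name"], ascending=[False, True])
print(sorted_df)

# sort_index() puts the rows back in index order
restored = sorted_df.sort_index()
```

Both methods return a new DataFrame; `df_sort` itself is not modified.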

Aggregations and Grouping

Aggregation Functions

Function	Description	Example
`df.sum()`	Sum of values	`df["Age"].sum()`
`df.mean()`	Mean of values	`df["Age"].mean()`
`df.median()`	Median of values	`df["Age"].median()`
`df.min()`	Minimum value	`df["Age"].min()`
`df.max()`	Maximum value	`df["Age"].max()`
`df.count()`	Count of non-NA/null values	`df["Age"].count()`

Group Data

df.groupby("column_name").sum()  # Group by column and calculate sum
df.groupby(["col1", "col2"]).mean(numeric_only=True)  # Multi-column grouping; numeric_only=True skips non-numeric columns
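
Selecting the column to aggregate before calling the aggregation keeps non-numeric columns out of the calculation. Here is a minimal sketch; the `Team` and `Score` columns are hypothetical and not part of the earlier examples.

```python
import pandas as pd

df_group = pd.DataFrame(
    {
        "Team": ["A", "A", "B", "B"],
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Score": [10, 20, 30, 50],
    }
)

# One aggregation on one column: A -> 15.0, B -> 40.0
mean_by_team = df_group.groupby("Team")["Score"].mean()

# Several aggregations at once with agg()
summary = df_group.groupby("Team")["Score"].agg(["sum", "max"])

print(mean_by_team)
print(summary)
```

The group keys become the index of the result; call `.reset_index()` if you prefer them as a regular column.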

Merging, Joining, and Concatenation

Function/Method	Description	Example
`pd.concat()`	Stack DataFrames along rows (default) or columns	`pd.concat([df1, df2])`
`pd.merge()`	SQL-style join on one or more key columns (inner join by default)	`pd.merge(df1, df2, on="column_name")`
`df.join()`	Join DataFrames by index (left join by default)	`df1.join(df2, how="inner")`
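
The three functions differ mainly in what they align on. The sketch below compares them on two small DataFrames; `people`, `salaries`, and the `Salary` values are invented for illustration.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
salaries = pd.DataFrame(
    {"Name": ["Alice", "Bob", "David"], "Salary": [50000, 60000, 70000]}
)

# Inner merge: only names present in both DataFrames
inner = pd.merge(people, salaries, on="Name")

# Left merge: every row of `people`; missing salaries become NaN
left = pd.merge(people, salaries, on="Name", how="left")

# concat stacks rows; ignore_index=True renumbers them
stacked = pd.concat([people, people], ignore_index=True)

# join aligns on the index, so set the key as the index first
joined = people.set_index("Name").join(salaries.set_index("Name"), how="inner")

print(inner)
print(left)
```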

Handling Missing Data

Function/Method	Description	Example
`df.dropna()`	Remove rows that contain any missing value (default)	`df.dropna()`
`df.fillna()`	Replace missing values with a given value	`df.fillna(0)`
`df.interpolate()`	Fill missing data using interpolation	`df.interpolate(method="linear")`
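
To compare the three strategies, here is a small sketch on numeric data with one gap per column. The column names and values are illustrative.

```python
import pandas as pd

df_na = pd.DataFrame(
    {"Age": [25.0, None, 35.0, 40.0], "Score": [1.0, 2.0, None, 4.0]}
)

# Keep only rows with no missing values (rows 0 and 3)
dropped = df_na.dropna()

# Replace every missing value with 0
filled = df_na.fillna(0)

# Linear interpolation fills each gap from its neighbours
interpolated = df_na.interpolate(method="linear")

print(interpolated)
```

All three return a new DataFrame, so `df_na` still contains its missing values afterwards.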

Saving and Exporting Data

File Type	Method	Example
CSV	`df.to_csv()`	`df.to_csv("output.csv", index=False)`
Excel (requires an engine such as `openpyxl`)	`df.to_excel()`	`df.to_excel("output.xlsx", index=False)`
JSON	`df.to_json()`	`df.to_json("output.json")`
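
The export methods accept a file path or a file-like object, and `to_json` returns a string when no path is given. The sketch below round-trips a DataFrame through an in-memory CSV buffer so nothing is written to disk; the variable names are illustrative.

```python
import io
import json
import pandas as pd

df_out = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write CSV into a buffer instead of a file, then read it back
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# With no path, to_json returns the JSON text; orient="records" gives one object per row
json_text = df_out.to_json(orient="records")
records = json.loads(json_text)

print(round_trip)
print(records)
```

Passing `index=False` keeps the row labels out of the CSV, which is why the round trip does not gain an extra column.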

DataFrame Visualization

The plotting methods below use Matplotlib, the default plotting backend for Pandas, so it must be installed.

Plot Type	Method	Example
Line Plot	`df.plot.line()`	`df["Age"].plot.line()`
Bar Plot	`df.plot.bar()`	`df["Age"].plot.bar()`
Scatter Plot	`df.plot.scatter()`	`df.plot.scatter(x="Age", y="Salary")`

Learn More

For more tutorials, hands-on exercises, and detailed explanations, visit The Coding College. Keep this reference handy to accelerate your Pandas journey! 🚀