Welcome to The Coding College! This reference guide covers the most commonly used functions and methods for working with Pandas DataFrames. Bookmark this page for quick access while coding!
What is a DataFrame?
A DataFrame is a 2-dimensional labeled data structure in Pandas, with labeled rows (the index) and labeled columns, similar to a spreadsheet or SQL table. Each column can hold a different data type. It is one of the most widely used tools for data analysis and manipulation in Python.
Creating a DataFrame
From a Dictionary
import pandas as pd
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]} # Keys become column names, lists become column values
df = pd.DataFrame(data) # Rows get a default index of 0, 1, 2
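A quick way to confirm what the dictionary constructor produced is to inspect the shape, column labels, index, and inferred column types. The snippet below rebuilds the same sample data so it runs on its own.

```python
import pandas as pd

# Each dictionary key becomes a column; each list supplies that column's values
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)

print(df.shape)          # (3, 2) -> 3 rows, 2 columns
print(list(df.columns))  # ['Name', 'Age'] -> dictionary order is kept
print(list(df.index))    # [0, 1, 2] -> default integer index
print(df.dtypes)         # inferred type of each column
```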
Selecting Data
Select Columns
df["column_name"] # Single column as Series
df[["col1", "col2"]] # Multiple columns as DataFrame
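The number of brackets decides the return type: a single label returns a Series, while a list of labels (even a one-item list) returns a DataFrame. A small sketch using the sample data from above:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

ages = df["Age"]              # single label -> Series
subset = df[["Name", "Age"]]  # list of labels -> DataFrame
age_frame = df[["Age"]]       # one-item list -> still a DataFrame

print(type(ages).__name__)       # Series
print(type(subset).__name__)     # DataFrame
print(type(age_frame).__name__)  # DataFrame
```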
Select Rows
df.loc[0] # Row with index label 0
df.iloc[0] # Row at position 0
df.loc[1:3] # Rows with labels 1 through 3 (end label included)
df.iloc[1:3] # Rows at positions 1 and 2 (end position excluded)
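The inclusive/exclusive difference is easiest to see side by side. This sketch assumes a slightly longer version of the sample data (the rows for "Dana" and "Eve" are hypothetical) so that the slices have something to cut, and then switches to a text index to show that `loc` works with labels while `iloc` always counts positions.

```python
import pandas as pd

# Hypothetical extra rows added so the slices below return something useful
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "Dana", "Eve"], "Age": [25, 30, 35, 40, 45]}
)

by_label = df.loc[1:3]      # labels 1, 2, 3 -> end label included
by_position = df.iloc[1:3]  # positions 1, 2 -> end position excluded

print(by_label["Name"].tolist())     # ['Bob', 'Charlie', 'Dana']
print(by_position["Name"].tolist())  # ['Bob', 'Charlie']

# With a non-numeric index, loc takes labels and iloc still takes positions
named = df.set_index("Name")
print(named.loc["Bob", "Age"])  # 30 (lookup by label)
print(named.iloc[1]["Age"])     # 30 (second row by position)
```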
Filtering Rows
df[df["Age"] > 30] # Rows where Age > 30
df[(df["Age"] > 25) & (df["Name"] == "Alice")] # Multiple conditions: wrap each in parentheses and combine with & (AND) or | (OR)
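Each comparison produces a boolean Series, and the DataFrame keeps only the rows where it is `True`. The sketch below shows AND, OR, and NOT on the sample data. The parentheses are required because `&`, `|`, and `~` bind more tightly than `>` and `==`, and Python's own `and`/`or`/`not` do not work element-wise on a Series.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

older = df[df["Age"] > 30]                                # single condition
both = df[(df["Age"] >= 25) & (df["Name"] == "Alice")]    # AND
either = df[(df["Age"] > 30) | (df["Name"] == "Alice")]   # OR
not_bob = df[~(df["Name"] == "Bob")]                      # NOT

print(older["Name"].tolist())    # ['Charlie']
print(both["Name"].tolist())     # ['Alice']
print(either["Name"].tolist())   # ['Alice', 'Charlie']
print(not_bob["Name"].tolist())  # ['Alice', 'Charlie']
```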
Adding, Updating, and Removing Data
Add or Update Columns
df["New_Column"] = [1, 2, 3] # Add a new column (list length must match the number of rows)
df["Age"] = df["Age"] + 5 # Update an existing column
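Assigning to a column label creates the column if it is new and overwrites it if it already exists. The sketch below also shows that a scalar is broadcast to every row and that a list of the wrong length is rejected. The `Country` and `Age_In_Months` columns are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

df["New_Column"] = [1, 2, 3]          # list length must equal the number of rows
df["Country"] = "US"                  # hypothetical column: a scalar fills every row
df["Age"] = df["Age"] + 5             # overwrite an existing column
df["Age_In_Months"] = df["Age"] * 12  # hypothetical column derived from another

try:
    df["Bad"] = [1, 2]                # only 2 values for 3 rows
except ValueError as err:
    print("Length mismatch:", err)

print(df)
```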
Remove Columns and Rows
df.drop("column_name", axis=1, inplace=True) # Drop a single column
df.drop(["col1", "col2"], axis=1, inplace=True) # Drop multiple columns
df.drop(0, axis=0, inplace=True) # Drop a row by index label
df.drop(df[df["Age"] < 30].index, axis=0, inplace=True) # Drop rows by condition
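Without `inplace=True`, `drop()` returns a new DataFrame and leaves the original untouched, which is often easier to reason about. The `columns=` and `index=` keywords are an alternative to passing `axis`. The `City` column here is hypothetical.

```python
import pandas as pd

# "City" is a hypothetical column added for illustration
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "City": ["NY", "LA", "SF"]}
)

no_city = df.drop(columns="City")                   # same as df.drop("City", axis=1)
no_first_row = df.drop(index=0)                     # same as df.drop(0, axis=0)
age_30_plus = df.drop(df[df["Age"] < 30].index)     # drop rows matching a condition

print(list(no_city.columns))         # ['Name', 'Age']
print(no_first_row.index.tolist())   # [1, 2]
print(age_30_plus["Name"].tolist())  # ['Bob', 'Charlie']
print(list(df.columns))              # original is unchanged: ['Name', 'Age', 'City']
```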
Sorting Data
| Function/Method | Description | Example |
|---|---|---|
| df.sort_values() | Sort by one or more columns | df.sort_values("Age") |
| df.sort_index() | Sort by index | df.sort_index() |
| ascending parameter | Set ascending=False to sort in descending order | df.sort_values("Age", ascending=False) |
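`sort_values` also accepts a list of columns, with a matching list for `ascending`, so ties on the first key can be broken by the second. The sketch below uses hypothetical data with repeated ages and then restores the original order with `sort_index()`.

```python
import pandas as pd

# Hypothetical data with duplicate ages so the tie-breaking is visible
df = pd.DataFrame({"Name": ["Bob", "Alice", "Charlie", "Dana"], "Age": [30, 25, 30, 25]})

by_age = df.sort_values("Age")                         # ascending (default)
oldest_first = df.sort_values("Age", ascending=False)  # descending

# Age descending, then Name ascending within equal ages
multi = df.sort_values(["Age", "Name"], ascending=[False, True])

restored = multi.sort_index()  # back to the original row order

print(multi["Name"].tolist())     # ['Bob', 'Charlie', 'Alice', 'Dana']
print(restored["Name"].tolist())  # ['Bob', 'Alice', 'Charlie', 'Dana']
```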
Aggregations and Grouping
Aggregation Functions
| Function | Description | Example |
|---|---|---|
| df.sum() | Sum of values | df["Age"].sum() |
| df.mean() | Mean of values | df["Age"].mean() |
| df.median() | Median of values | df["Age"].median() |
| df.min() | Minimum value | df["Age"].min() |
| df.max() | Maximum value | df["Age"].max() |
| df.count() | Count of non-NA/null values | df["Age"].count() |
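By default these aggregations skip missing values, which is why `count()` can be smaller than the number of rows. The sketch below deliberately leaves one age missing (a hypothetical value, for illustration) and also uses `agg()` to compute several statistics in one call.

```python
import pandas as pd

# One Age is deliberately missing; pandas stores it as NaN and the column becomes float
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, None]})

ages = df["Age"]
print(ages.sum())              # 55.0
print(ages.mean())             # 27.5 (NaN is skipped, so this is 55 / 2)
print(ages.median())           # 27.5
print(ages.min(), ages.max())  # 25.0 30.0
print(ages.count())            # 2 (non-missing values only)

# Several statistics at once, returned as a Series
print(ages.agg(["sum", "mean", "count"]))
```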
Group Data
df.groupby("column_name").sum() # Group by one column and sum each group
df.groupby(["col1", "col2"]).mean(numeric_only=True) # Multi-column grouping; numeric_only skips text columns
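`groupby` splits the rows by the key column(s), applies the aggregation to each group, and combines the results. Selecting the value column before aggregating (as below) keeps text columns out of the calculation. The sales data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales data for illustration
sales = pd.DataFrame(
    {
        "Region": ["East", "West", "East", "West", "East"],
        "Rep": ["Ann", "Ben", "Ann", "Cal", "Dee"],
        "Amount": [100, 200, 150, 50, 300],
    }
)

# One key -> one row per region
totals = sales.groupby("Region")["Amount"].sum()

# Two keys -> a MultiIndex of (Region, Rep)
per_rep = sales.groupby(["Region", "Rep"])["Amount"].mean()

# Several statistics per group
summary = sales.groupby("Region")["Amount"].agg(["sum", "mean", "count"])

print(totals)   # East 550, West 250
print(per_rep)
print(summary)
```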
Merging, Joining, and Concatenation
| Function/Method | Description | Example |
|---|---|---|
| pd.concat() | Concatenate DataFrames | pd.concat([df1, df2]) |
| pd.merge() | Merge DataFrames on common columns | pd.merge(df1, df2, on="column_name") |
| df.join() | Join DataFrames by index | df1.join(df2, how="inner") |
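The three tools differ in how rows are lined up: `concat` stacks them, `merge` matches them on column values, and `join` matches them on the index. The sketch below uses two small hypothetical tables that share an `ID` column, where ID 3 has no salary and ID 4 has no name.

```python
import pandas as pd

# Hypothetical tables sharing an "ID" column
people = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
salaries = pd.DataFrame({"ID": [1, 2, 4], "Salary": [50000, 60000, 70000]})

# concat stacks rows; ignore_index=True renumbers the result 0..n-1
more_people = pd.DataFrame({"ID": [4], "Name": ["Dana"]})
stacked = pd.concat([people, more_people], ignore_index=True)

# merge matches on a column; how="inner" (the default) keeps only IDs in both tables
inner = pd.merge(people, salaries, on="ID")
# how="left" keeps every row of the left table; unmatched salaries become NaN
left = pd.merge(people, salaries, on="ID", how="left")

# join aligns on the index, so move ID into the index first
joined = people.set_index("ID").join(salaries.set_index("ID"), how="inner")

print(inner)
print(left)
```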
Handling Missing Data
| Function/Method | Description | Example |
|---|---|---|
| df.dropna() | Remove missing data | df.dropna() |
| df.fillna() | Fill missing data | df.fillna(0) |
| df.interpolate() | Fill missing data using interpolation | df.interpolate(method='linear') |
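The three approaches give quite different results on the same input. In this sketch (hypothetical data with one gap in each column), `dropna()` discards any row containing a gap, `fillna(0)` substitutes a constant, and linear interpolation estimates each gap from its neighbours. The gaps are placed between valid values so the interpolated result is unambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame(
    {"Age": [25.0, np.nan, 35.0, 40.0], "Score": [1.0, 2.0, np.nan, 4.0]}
)

print(df.isna().sum())  # number of missing values per column

dropped = df.dropna()                           # keep only complete rows
filled = df.fillna(0)                           # replace NaN with a constant
interpolated = df.interpolate(method="linear")  # estimate from neighbouring rows

print(dropped.index.tolist())           # [0, 3]
print(filled["Age"].tolist())           # [25.0, 0.0, 35.0, 40.0]
print(interpolated["Age"].tolist())     # [25.0, 30.0, 35.0, 40.0]
print(interpolated["Score"].tolist())   # [1.0, 2.0, 3.0, 4.0]
```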
Saving and Exporting Data
| File Type | Method | Example |
|---|---|---|
| CSV | df.to_csv() | df.to_csv("output.csv", index=False) |
| Excel | df.to_excel() | df.to_excel("output.xlsx", index=False) |
| JSON | df.to_json() | df.to_json("output.json") |
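A round trip through a temporary directory is a simple way to check what an export actually writes. The sketch below covers CSV and JSON only; Excel export additionally needs an engine such as openpyxl installed, so it is left out here. It also shows why `index=False` is common: writing the index adds an extra column that comes back as `Unnamed: 0` when read with `pd.read_csv`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "output.csv")
    indexed_path = os.path.join(tmp, "with_index.csv")
    json_path = os.path.join(tmp, "output.json")

    df.to_csv(csv_path, index=False)  # omit the 0..n-1 index column
    df.to_csv(indexed_path)           # default: the index is written too
    df.to_json(json_path)

    from_csv = pd.read_csv(csv_path)
    from_indexed = pd.read_csv(indexed_path)
    from_json = pd.read_json(json_path)

print(list(from_csv.columns))      # ['Name', 'Age']
print(list(from_indexed.columns))  # ['Unnamed: 0', 'Name', 'Age']
print(from_json)
```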
DataFrame Visualization
| Plot Type | Method | Example |
|---|---|---|
| Line Plot | df.plot.line() | df["Age"].plot.line() |
| Bar Plot | df.plot.bar() | df["Age"].plot.bar() |
| Scatter Plot | df.plot.scatter() | df.plot.scatter(x="Age", y="Salary") |
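Pandas plotting is built on Matplotlib, which must be installed separately. The sketch below draws the three plot types onto explicit axes so they do not overwrite each other, uses the non-interactive "Agg" backend so it runs without a display, and saves the figure to an in-memory buffer. The `Salary` values are hypothetical, since the scatter example needs a second numeric column; in an interactive session you would call `plt.show()` instead of saving.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

# "Salary" is a hypothetical column; the scatter plot needs two numeric columns
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "Salary": [50000, 62000, 71000]}
)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

df["Age"].plot.line(ax=axes[0], title="Line")
df.plot.bar(x="Name", y="Age", ax=axes[1], legend=False, title="Bar")
df.plot.scatter(x="Age", y="Salary", ax=axes[2], title="Scatter")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write PNG bytes to memory instead of a file
plt.close(fig)
```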
Learn More
For more tutorials, hands-on exercises, and detailed explanations, visit The Coding College. Keep this reference handy to accelerate your Pandas journey! 🚀