Pandas DataFrame Reference

Welcome to The Coding College! This reference guide covers the most commonly used attributes, functions, and methods for working with Pandas DataFrames. Bookmark this page for quick access while coding!

What is a DataFrame?

A DataFrame is a 2-dimensional labeled data structure in Pandas, similar to a spreadsheet or SQL table. It is one of the most widely used tools for data analysis and manipulation in Python.

Creating a DataFrame

  1. From a Dictionary
import pandas as pd

data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)
  2. From a List of Lists
data = [["Alice", 25], ["Bob", 30], ["Charlie", 35]]
df = pd.DataFrame(data, columns=["Name", "Age"])
  3. From a CSV File
df = pd.read_csv("file.csv")

Basic Attributes

| Attribute | Description | Example output |
|---|---|---|
| df.shape | Returns the dimensions of the DataFrame as a tuple | (rows, columns) |
| df.columns | Returns the column labels | Index(['col1', 'col2']) |
| df.index | Returns the row labels (index) | RangeIndex(start=0, stop=5, step=1) |
| df.dtypes | Returns the data type of each column | object, int64, float64 |
| df.size | Total number of elements | rows * columns |
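
As a quick illustration of the attributes above, here is a minimal sketch that reuses the sample data from the "Creating a DataFrame" section. The values in the comments assume exactly that three-row frame.

```python
import pandas as pd

# Sample data reused from the "Creating a DataFrame" section
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(list(df.columns))  # ['Name', 'Age']
print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(df.dtypes)         # one dtype per column; Age is int64 here
print(df.size)           # 6, i.e. rows * columns
```

Note that these are attributes, not methods, so they are accessed without parentheses.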

Viewing and Inspecting Data

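| Function/Method | Description | Example |
|---|---|---|
| df.head(n) | Returns the first n rows (default 5) | df.head(3) |
| df.tail(n) | Returns the last n rows (default 5) | df.tail(2) |
| df.info() | Prints a concise summary: column names, non-null counts, dtypes, and memory usage | df.info() |
| df.describe() | Summary statistics for numeric columns | df.describe() |
| df.sample(n) | Randomly selects n rows (default 1) | df.sample(3) |
| df.isnull() | Returns a boolean DataFrame that is True where a value is missing | df.isnull() |
| df.notnull() | Returns a boolean DataFrame that is True where a value is present | df.notnull() |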
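
The inspection methods are often combined, for example to count missing values per column. The sketch below assumes a small frame with one missing age; the "Dana" row is a hypothetical addition for illustration.

```python
import pandas as pd

# Sample frame with one missing Age value ("Dana" is a made-up row)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, None, 35, 40],
})

first_two = df.head(2)                  # first 2 rows: Alice and Bob
missing_per_column = df.isnull().sum()  # number of missing values in each column
stats = df.describe()                   # count, mean, std, min, quartiles, max for Age

print(missing_per_column)
print(stats)
```

Chaining `.sum()` after `df.isnull()` works because True counts as 1 and False as 0.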

Selecting and Filtering Data

Select Columns

df["column_name"]           # Single column as Series
df[["col1", "col2"]]        # Multiple columns as DataFrame

Select Rows

df.loc[0]                  # Row whose index label is 0
df.iloc[0]                 # First row, by integer position
df.loc[1:3]                # Rows with labels 1 through 3 (end label included)
df.iloc[1:3]               # Rows at positions 1 and 2 (end position excluded)

Filtering Rows

df[df["Age"] > 30]         # Rows where Age > 30
df[(df["Age"] > 25) & (df["Name"] == "Alice")]  # Multiple conditions
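
The difference between label-based and position-based slicing is easy to miss, so here is a small sketch that makes it visible. It assumes a five-row frame with the default integer index; the "Eve" row is hypothetical.

```python
import pandas as pd

# Five-row sample frame with the default index 0..4 ("Eve" is a made-up row)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

by_label = df.loc[1:3]      # labels 1, 2, 3 -> 3 rows (end label included)
by_position = df.iloc[1:3]  # positions 1, 2 -> 2 rows (end position excluded)

over_30 = df[df["Age"] > 30]                       # Charlie, David, Eve
between = df[(df["Age"] > 25) & (df["Age"] < 45)]  # Bob, Charlie, David

print(len(by_label), len(by_position))
print(over_30["Name"].tolist())
print(between["Name"].tolist())
```

When combining conditions, each one needs its own parentheses, and `&` / `|` are used instead of Python's `and` / `or`.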

Adding, Updating, and Removing Data

Add Columns

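df["New_Column"] = [1, 2, 3]       # Add a new column (list length must match the number of rows)
df["Updated_Age"] = df["Age"] + 5  # Add a column derived from an existing one
df["Age"] = df["Age"] + 5          # Update an existing column in place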

Remove Columns

df.drop("column_name", axis=1, inplace=True)  # Drop a single column
df.drop(["col1", "col2"], axis=1, inplace=True)  # Drop multiple columns

Add Rows

new_row = pd.DataFrame({"Name": ["David"], "Age": [40]})
df = pd.concat([df, new_row], ignore_index=True)

Remove Rows

df.drop(0, axis=0, inplace=True)  # Drop the row whose index label is 0
df.drop(df[df["Age"] < 30].index, axis=0, inplace=True)  # Drop rows matching a condition
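
Putting the add and remove operations together, the following sketch runs them end to end on the sample data. It reassigns the result instead of using inplace=True, which is one common style rather than the only option; the column name Age_In_5_Years is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Add a row by concatenating a one-row DataFrame
new_row = pd.DataFrame({"Name": ["David"], "Age": [40]})
df = pd.concat([df, new_row], ignore_index=True)  # now 4 rows, index 0..3

# Add a derived column, then remove it again
df["Age_In_5_Years"] = df["Age"] + 5
df = df.drop(columns=["Age_In_5_Years"])

# Remove rows where Age < 30 and renumber the index
df = df.drop(df[df["Age"] < 30].index).reset_index(drop=True)

print(df)
```

The last step could equally be written as a filter, `df = df[df["Age"] >= 30]`, which keeps the rows you want instead of dropping the ones you do not.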

Sorting Data

| Function/Method | Description | Example |
|---|---|---|
| df.sort_values() | Sort by column(s), ascending by default | df.sort_values("Age") |
| df.sort_index() | Sort by index | df.sort_index() |
| ascending=False | Parameter that switches the sort to descending order | df.sort_values("Age", ascending=False) |
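
Sorting by several columns with a different direction for each is a common follow-up. The sketch below assumes sample ages chosen to contain ties, so the second sort key has a visible effect.

```python
import pandas as pd

# Ages are made up so that two pairs of rows tie on Age
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [30, 25, 30, 25],
})

# Age descending, then Name ascending to break ties
by_age_then_name = df.sort_values(["Age", "Name"], ascending=[False, True])

# sort_index() puts the rows back in their original index order
restored = by_age_then_name.sort_index()

print(by_age_then_name["Name"].tolist())  # ['Alice', 'Charlie', 'Bob', 'David']
```

Both methods return a new DataFrame and leave df unchanged unless the result is assigned back.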

Aggregations and Grouping

Aggregation Functions

| Function | Description | Example |
|---|---|---|
| df.sum() | Sum of values | df["Age"].sum() |
| df.mean() | Mean of values | df["Age"].mean() |
| df.median() | Median of values | df["Age"].median() |
| df.min() | Minimum value | df["Age"].min() |
| df.max() | Maximum value | df["Age"].max() |
| df.count() | Count of non-NA/null values | df["Age"].count() |

Group Data

df.groupby("column_name").sum()  # Group by column and calculate sum
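df.groupby(["col1", "col2"]).mean(numeric_only=True)  # Multi-column grouping; numeric_only=True skips non-numeric columns, which otherwise raise an error in pandas 2.0+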
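
To make grouping concrete, here is a minimal sketch on a small frame. The "Team" column and its values are hypothetical additions to the sample data.

```python
import pandas as pd

# "Team" is a made-up grouping column for illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
})

# Select the column first, then aggregate: one mean per team
mean_age = df.groupby("Team")["Age"].mean()

# Several aggregations at once, with named output columns
summary = df.groupby("Team").agg(
    total_age=("Age", "sum"),
    members=("Name", "count"),
)

print(mean_age)
print(summary)
```

Selecting the column before aggregating avoids applying a numeric function to text columns such as Name.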

Merging, Joining, and Concatenation

| Function/Method | Description | Example |
|---|---|---|
| pd.concat() | Concatenate DataFrames | pd.concat([df1, df2]) |
| pd.merge() | Merge DataFrames on common columns | pd.merge(df1, df2, on="column_name") |
| df.join() | Join DataFrames by index | df1.join(df2, how="inner") |
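
The three functions above differ mainly in what they align on. The sketch below shows each one on two small frames; the salary figures and the "David" row are hypothetical.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
# Salary values are made up for illustration
salaries = pd.DataFrame({"Name": ["Alice", "Bob", "David"],
                         "Salary": [50000, 60000, 70000]})

# merge: align on a shared column; the default is an inner join
inner = pd.merge(people, salaries, on="Name")            # Alice, Bob
left = pd.merge(people, salaries, on="Name", how="left") # Charlie kept, Salary is NaN

# join: align on the index, so move Name into the index first
joined = people.set_index("Name").join(salaries.set_index("Name"), how="inner")

# concat: stack frames on top of each other
stacked = pd.concat([people, people], ignore_index=True) # 6 rows

print(inner)
print(left)
```

A left merge is a reasonable choice when every row of the first frame must survive, even without a match.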

Handling Missing Data

| Function/Method | Description | Example |
|---|---|---|
| df.dropna() | Remove rows that contain missing data | df.dropna() |
| df.fillna() | Fill missing data with a given value | df.fillna(0) |
| df.interpolate() | Fill missing data using interpolation | df.interpolate(method='linear') |
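
Here is a short sketch comparing the three strategies on the same column, assuming a single missing value between two known ones.

```python
import pandas as pd

# One missing value between 25 and 35
df = pd.DataFrame({"Age": [25.0, None, 35.0]})

dropped = df.dropna()                          # row with the missing value removed
filled = df.fillna(0)                          # missing value replaced by 0
interpolated = df.interpolate(method="linear") # missing value estimated from neighbours

print(dropped["Age"].tolist())       # [25.0, 35.0]
print(filled["Age"].tolist())        # [25.0, 0.0, 35.0]
print(interpolated["Age"].tolist())  # [25.0, 30.0, 35.0]
```

All three return new DataFrames; which one is appropriate depends on whether a placeholder or an estimate would distort later calculations.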

Saving and Exporting Data

| File Type | Method | Example |
|---|---|---|
| CSV | df.to_csv() | df.to_csv("output.csv", index=False) |
| Excel | df.to_excel() | df.to_excel("output.xlsx", index=False) |
| JSON | df.to_json() | df.to_json("output.json") |
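
To check that an export preserves the data, a round trip can be tried without touching the disk. This sketch writes to an in-memory buffer instead of a file, which is an assumption made to keep the example self-contained; a file path works the same way.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# CSV round trip through an in-memory text buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
restored = pd.read_csv(io.StringIO(buffer.getvalue()))

# JSON as a string: with no path given, to_json returns the text
json_text = df.to_json(orient="records")
records = json.loads(json_text)

print(restored)
print(records[0])  # {'Name': 'Alice', 'Age': 25}
```

Without index=False, the CSV would gain an extra unnamed column holding the old index when read back.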

DataFrame Visualization

| Plot Type | Method | Example |
|---|---|---|
| Line Plot | df.plot.line() | df["Age"].plot.line() |
| Bar Plot | df.plot.bar() | df["Age"].plot.bar() |
| Scatter Plot | df.plot.scatter() | df.plot.scatter(x="Age", y="Salary") |

Learn More

For more tutorials, hands-on exercises, and detailed explanations, visit The Coding College. Keep this reference handy to accelerate your Pandas journey! 🚀
