Welcome to The Coding College! When dealing with datasets where the target variable depends on multiple factors, Multiple Regression comes into play. It’s a core technique in predictive analytics, offering a way to understand and quantify relationships between variables.
In this guide, you’ll learn what Multiple Regression is, its applications, and how to implement it in Python.
What Is Multiple Regression?
Multiple Regression is a type of regression analysis used to predict the value of a dependent variable (yy) based on two or more independent variables (x1,x2,…,xnx_1, x_2, …, x_n). It extends Simple Linear Regression by incorporating multiple predictors.

Why Use Multiple Regression?
- Improved Accuracy: Models relationships involving multiple factors.
- Variable Insights: Quantifies the influence of each independent variable.
- Widely Applicable: Used in business, healthcare, engineering, and more.
Applications of Multiple Regression
1. Predictive Modeling
- Finance: Predicting stock prices based on market indicators.
- Real Estate: Estimating house prices using size, location, and amenities.
2. Data Analysis
- Identifying key factors affecting customer behavior or sales trends.
3. Research
- Studying relationships between medical conditions and risk factors.
Implementing Multiple Regression in Python
1. Using Scikit-Learn
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Sample dataset
data = {
"Size (sq.ft)": [500, 1000, 1500, 2000, 2500],
"Bedrooms": [1, 2, 3, 3, 4],
"Price (in $)": [150000, 250000, 350000, 450000, 550000],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Features and target variable
X = df[["Size (sq.ft)", "Bedrooms"]]
y = df["Price (in $)"]
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the Multiple Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")
2. Manual Implementation Using Numpy
Understand the math behind Multiple Regression:
# Sample data
X = np.array([[500, 1], [1000, 2], [1500, 3], [2000, 3], [2500, 4]])
y = np.array([150000, 250000, 350000, 450000, 550000])
# Add a column of ones for the intercept
X = np.hstack((np.ones((X.shape[0], 1)), X))
# Calculate coefficients using the Normal Equation
coefficients = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(f"Coefficients: {coefficients}")
Visualizing Multiple Regression
Scatter Plot with 2D Projection
While visualizing multiple variables is challenging, you can use pairwise projections for insight:
import matplotlib.pyplot as plt
# Scatter plot for one feature
plt.scatter(df["Size (sq.ft)"], df["Price (in $)"], color="blue", label="Size vs Price")
plt.title("Size vs Price")
plt.xlabel("Size (sq.ft)")
plt.ylabel("Price (in $)")
plt.legend()
plt.show()
Comparing Linear and Multiple Regression
Aspect | Linear Regression | Multiple Regression |
---|---|---|
Number of Features | One | Two or more |
Complexity | Low | Moderate |
Real-World Use Cases | Limited | Extensive |
Exercises
Exercise 1: Build a Multiple Regression Model
Create a model to predict car prices using features like engine size, horsepower, and weight.
Exercise 2: Feature Impact Analysis
Analyze which independent variable has the most significant impact on the dependent variable.
Exercise 3: Evaluate Model Performance
Use Scikit-Learn to compute the Mean Absolute Error (MAE) for a multiple regression model.
Why Learn Multiple Regression with The Coding College?
At The Coding College, we simplify complex concepts like Multiple Regression through clear explanations and practical examples. Whether you’re a beginner or an advanced learner, our tutorials are crafted to empower you.
Conclusion
Multiple Regression is a powerful tool in Machine Learning, allowing you to model complex relationships involving multiple factors. By mastering it, you can tackle diverse real-world problems with confidence.