Machine Learning - Multiple Regression

Welcome to The Coding College! When dealing with datasets where the target variable depends on multiple factors, Multiple Regression comes into play. It’s a core technique in predictive analytics, offering a way to understand and quantify relationships between variables.

In this guide, you’ll learn what Multiple Regression is, its applications, and how to implement it in Python.

What Is Multiple Regression?

Multiple Regression is a type of regression analysis used to predict the value of a dependent variable (yy) based on two or more independent variables (x1,x2,…,xnx_1, x_2, …, x_n). It extends Simple Linear Regression by incorporating multiple predictors.

Why Use Multiple Regression?

Improved Accuracy: Models relationships involving multiple factors.
Variable Insights: Quantifies the influence of each independent variable.
Widely Applicable: Used in business, healthcare, engineering, and more.

Applications of Multiple Regression

1. Predictive Modeling

Finance: Predicting stock prices based on market indicators.
Real Estate: Estimating house prices using size, location, and amenities.

2. Data Analysis

Identifying key factors affecting customer behavior or sales trends.

3. Research

Studying relationships between medical conditions and risk factors.

Implementing Multiple Regression in Python

1. Using Scikit-Learn

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset
data = {
    "Size (sq.ft)": [500, 1000, 1500, 2000, 2500],
    "Bedrooms": [1, 2, 3, 3, 4],
    "Price (in $)": [150000, 250000, 350000, 450000, 550000],
}

# Create a DataFrame
df = pd.DataFrame(data)

# Features and target variable
X = df[["Size (sq.ft)", "Bedrooms"]]
y = df["Price (in $)"]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Multiple Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")

2. Manual Implementation Using Numpy

Understand the math behind Multiple Regression:

# Sample data
X = np.array([[500, 1], [1000, 2], [1500, 3], [2000, 3], [2500, 4]])
y = np.array([150000, 250000, 350000, 450000, 550000])

# Add a column of ones for the intercept
X = np.hstack((np.ones((X.shape[0], 1)), X))

# Calculate coefficients using the Normal Equation
coefficients = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(f"Coefficients: {coefficients}")

Visualizing Multiple Regression

Scatter Plot with 2D Projection

While visualizing multiple variables is challenging, you can use pairwise projections for insight:

import matplotlib.pyplot as plt

# Scatter plot for one feature
plt.scatter(df["Size (sq.ft)"], df["Price (in $)"], color="blue", label="Size vs Price")
plt.title("Size vs Price")
plt.xlabel("Size (sq.ft)")
plt.ylabel("Price (in $)")
plt.legend()
plt.show()

Comparing Linear and Multiple Regression

Aspect	Linear Regression	Multiple Regression
Number of Features	One	Two or more
Complexity	Low	Moderate
Real-World Use Cases	Limited	Extensive

Exercises

Exercise 1: Build a Multiple Regression Model

Create a model to predict car prices using features like engine size, horsepower, and weight.

Exercise 2: Feature Impact Analysis

Analyze which independent variable has the most significant impact on the dependent variable.

Exercise 3: Evaluate Model Performance

Use Scikit-Learn to compute the Mean Absolute Error (MAE) for a multiple regression model.

Why Learn Multiple Regression with The Coding College?

At The Coding College, we simplify complex concepts like Multiple Regression through clear explanations and practical examples. Whether you’re a beginner or an advanced learner, our tutorials are crafted to empower you.

Conclusion

Multiple Regression is a powerful tool in Machine Learning, allowing you to model complex relationships involving multiple factors. By mastering it, you can tackle diverse real-world problems with confidence.

Machine Learning – Multiple Regression