Logistic Regression is one of the most fundamental algorithms in machine learning for classification problems. Despite its name, Logistic Regression is not a regression algorithm but a classification technique.
In this tutorial on The Coding College, we’ll explore Logistic Regression, its mathematical foundation, implementation in Python, and its applications.
What Is Logistic Regression?
Logistic Regression is a supervised learning algorithm used to predict categorical outcomes based on input variables. The algorithm works by estimating probabilities using a logistic function (sigmoid function).
Key Characteristics:
- Binary Classification: Commonly used for problems with two classes (e.g., spam vs. not spam).
- Probabilistic Output: Outputs probabilities, which can be converted into class predictions.
Logistic Function (Sigmoid Function):

The sigmoid function maps any real number into a range between 0 and 1, making it ideal for probability prediction.
How Logistic Regression Works
- Input Features: Use input variables (features) to compute a weighted sum (zz).
- Apply Sigmoid Function: Convert zz into a probability using the sigmoid function.
- Classify: Assign a class label based on a threshold (e.g., 0.5).
Logistic Regression vs Linear Regression
Aspect | Logistic Regression | Linear Regression |
---|---|---|
Output | Probability (0-1) | Continuous values |
Type of Problem | Classification | Regression |
Assumption of Linearity | Linear relationship in log-odds | Linear relationship between input and output |
Implementing Logistic Regression in Python
Example: Predicting Diabetes
We’ll use the Pima Indians Diabetes dataset to demonstrate Logistic Regression.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigree", "Age", "Outcome"]
data = pd.read_csv(url, header=None, names=columns)
# Split into features and target
X = data.iloc[:, :-1]
y = data["Outcome"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train Logistic Regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
Evaluating the Model
- Accuracy Score:
print("Accuracy:", accuracy_score(y_test, y_pred))
- Confusion Matrix:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
- Classification Report:
print("Classification Report:\n", classification_report(y_test, y_pred))
Real-World Applications
- Medical Diagnosis: Predicting diseases based on patient data.
- Spam Detection: Identifying spam emails.
- Credit Scoring: Determining loan eligibility.
- Customer Retention: Predicting whether a customer will churn.
Advantages and Disadvantages
Advantages:
- Simple and Interpretable: Easy to understand and explain.
- Efficient: Works well with small to medium-sized datasets.
- Probabilistic Predictions: Provides confidence scores for predictions.
Disadvantages:
- Linearity Assumption: Assumes a linear relationship between features and log-odds.
- Sensitive to Outliers: Outliers can affect model performance.
- Limited to Binary Classification: Requires extensions for multi-class problems (e.g., one-vs-rest).
Exercises
Exercise 1: Logistic Regression on Titanic Dataset
Use the Titanic dataset to predict survival. Visualize the coefficients to interpret feature importance.
Exercise 2: Multi-Class Classification
Extend Logistic Regression for multi-class classification using the Iris dataset. Evaluate the model’s performance.
Exercise 3: Threshold Adjustment
Experiment with different classification thresholds (e.g., 0.4, 0.6) and observe the impact on precision and recall.
Why Learn at The Coding College?
At The Coding College, we make machine learning concepts like Logistic Regression accessible to everyone. Our hands-on tutorials and exercises are designed to help you apply these algorithms in real-world scenarios.
Conclusion
Logistic Regression is a foundational algorithm in machine learning, offering simplicity and effectiveness for classification tasks. By understanding its workings and applications, you can confidently tackle a variety of binary classification problems.