Machine Learning - Mean, Median, and Mode

Welcome to The Coding College! If you’re diving into Machine Learning (ML), understanding statistical concepts like Mean, Median, and Mode is essential. These measures of central tendency are foundational for data analysis, helping you understand and summarize data effectively.

What Are Mean, Median, and Mode?

These three concepts summarize datasets by representing their central value:

1. Mean

Definition: The average of all values in a dataset.
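Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sum of all values divided by the number of values, $n$).

Use Case: Useful when the values are spread roughly symmetrically around the center and there are no extreme outliers.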

2. Median

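Definition: The middle value when a dataset is sorted in ascending order. If the dataset has an even number of values, the median is the average of the two middle values.
Use Case: Preferred when data contains outliers, because a few extreme values barely move it.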
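To make the outlier point concrete, here is a minimal sketch using Python's built-in statistics module. The salary figures are made up for illustration and are not from any real dataset.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands), invented for this illustration.
salaries = [30, 32, 35, 38, 40]

# The same data with one extreme value added.
salaries_with_outlier = salaries + [500]

print(f"Without outlier: mean={mean(salaries)}, median={median(salaries)}")
print(f"With outlier:    mean={mean(salaries_with_outlier)}, "
      f"median={median(salaries_with_outlier)}")
```

With the single extreme value added, the mean jumps from 35 to 112.5, while the median only moves from 35 to 36.5.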

3. Mode

Definition: The most frequently occurring value in a dataset.
Use Case: Ideal for categorical data, like favorite colors or preferred products.
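For categorical values, where a mean or median makes no sense, the mode can be found by counting. The sketch below uses collections.Counter from the standard library; the survey responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses, invented for this illustration.
colors = ["blue", "green", "blue", "red", "blue", "green"]

# Count how often each category appears.
counts = Counter(colors)

# most_common(1) returns a list with the single (value, count) pair
# that has the highest count.
mode_color, mode_count = counts.most_common(1)[0]
print(f"Mode: {mode_color} (appears {mode_count} times)")
```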

Why Are These Important in Machine Learning?

Data Preprocessing: Understand the dataset’s distribution before building ML models.
Handling Missing Data: Replace missing values with the mean, median, or mode.
Feature Engineering: Create features based on these metrics to enhance model performance.
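As one possible illustration of the feature engineering point, the sketch below uses pandas to add a per-group mean and each row's distance from it. The DataFrame, its column names, and the values are all assumptions made up for this example.

```python
import pandas as pd

# Hypothetical purchase data; column names and values are illustrative only.
df = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 30.0, 5.0, 5.0, 20.0],
})

# Feature 1: each customer's mean purchase amount, repeated on every row
# belonging to that customer.
df["customer_mean"] = df.groupby("customer")["amount"].transform("mean")

# Feature 2: how far each purchase is from that customer's mean.
df["amount_minus_mean"] = df["amount"] - df["customer_mean"]

print(df)
```

Whether features like these actually help depends on the model and the data, so treat them as candidates to evaluate rather than guaranteed improvements.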

Calculating Mean, Median, and Mode in Python

Let’s calculate these values for a sample dataset.

Example Dataset

data = [10, 20, 20, 40, 60]

1. Calculating Mean

mean = sum(data) / len(data)  
print(f"Mean: {mean}")

Output:

Mean: 30.0

2. Calculating Median

data.sort()  # Ensure the data is sorted  
n = len(data)  
median = data[n // 2] if n % 2 != 0 else (data[n // 2 - 1] + data[n // 2]) / 2  
print(f"Median: {median}")

Output:

Median: 20
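The sample dataset has an odd number of values, so only one branch of the median logic runs. The sketch below wraps the same logic in a helper (manual_median is a name chosen here for illustration) and checks the even-length case against statistics.median from the standard library.

```python
from statistics import median

def manual_median(values):
    """Return the median without modifying the input list."""
    ordered = sorted(values)  # sorted() returns a new list
    n = len(ordered)
    mid = n // 2
    if n % 2 != 0:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical even-length dataset for illustration.
even_data = [10, 20, 40, 60]

print(f"Manual median:     {manual_median(even_data)}")
print(f"statistics.median: {median(even_data)}")
```

Both calls give 30.0 here, the average of the two middle values 20 and 40.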

3. Calculating Mode

Using Python’s statistics module:

from statistics import mode  

mode_value = mode(data)  
print(f"Mode: {mode_value}")

Output:

Mode: 20

Using Libraries for Efficiency

Libraries like NumPy and Pandas simplify these calculations:

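import numpy as np
import pandas as pd

data = [10, 20, 20, 40, 60]

# Mean
mean = np.mean(data)

# Median
median = np.median(data)

# Mode: Series.mode() returns every mode in sorted order; take the first.
# Named mode_value so it does not shadow statistics.mode imported earlier.
mode_value = pd.Series(data).mode()[0]

print(f"Mean: {mean}, Median: {median}, Mode: {mode_value}")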

Practical Applications

1. Replacing Missing Values

When datasets have missing entries, you can replace them with:

Mean: For numerical data with no significant outliers.
Median: For numerical data with extreme outliers.
Mode: For categorical data.

Example:

import numpy as np  

data_with_nan = [10, np.nan, 20, 30, 40]
mean_value = float(np.nanmean(data_with_nan))  # mean of the non-missing values, computed once
mean_imputed = [mean_value if np.isnan(x) else x for x in data_with_nan]
print(f"Data after Mean Imputation: {mean_imputed}")
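Median and mode imputation follow the same pattern. The sketch below uses pandas fillna on a small DataFrame; the column names and values are hypothetical and chosen so that the numeric column has an outlier and the categorical column has a single clear mode.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({
    "income": [30.0, np.nan, 35.0, 40.0, 500.0],   # 500.0 is an outlier
    "color": ["red", "blue", None, "red", "green"],
})

# Numeric column with an outlier: fill gaps with the median.
# Series.median() skips missing values by default.
df["income"] = df["income"].fillna(df["income"].median())

# Categorical column: fill gaps with the most frequent value.
# Series.mode() returns all modes in sorted order, so take the first.
df["color"] = df["color"].fillna(df["color"].mode()[0])

print(df)
```

Here the missing income becomes 37.5 (the median of the four known values) instead of being pulled toward 500, and the missing color becomes "red".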

Limitations

Mean: Sensitive to outliers, which can skew results.
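Median: Depends only on the middle of the sorted data, so it says little about the spread or the tails of the distribution.
Mode: May not be unique (several values can tie for most frequent) and is uninformative when every value occurs only once.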
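The mode limitation is easy to check with statistics.multimode, available in the standard library from Python 3.8, which returns every value tied for most frequent. The two small datasets below are made up for illustration.

```python
from statistics import multimode

# Two values tie for most frequent.
bimodal = [1, 1, 2, 2, 3]

# Every value appears exactly once.
all_unique = [4, 5, 6]

# multimode returns a list of all most-frequent values,
# in the order they first appear in the data.
bimodal_modes = multimode(bimodal)
unique_modes = multimode(all_unique)

print(f"Modes of {bimodal}: {bimodal_modes}")
print(f"Modes of {all_unique}: {unique_modes}")
```

The first call returns two modes, and the second returns every value, which is a sign that the mode carries no useful information for that dataset.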

Practice Exercises

Exercise 1: Calculate Mean, Median, and Mode

For the dataset [12, 15, 12, 18, 20, 15, 25], calculate the mean, median, and mode using Python.

Exercise 2: Handle Missing Data

Replace missing values in the dataset [5, NaN, 15, NaN, 25] with the mean and median.

Exercise 3: Analyze Categorical Data

For the dataset ['red', 'blue', 'red', 'green', 'red', 'blue'], determine the mode and its frequency.

Why Choose The Coding College?

At The Coding College, we focus on making foundational concepts like Mean, Median, and Mode easy to understand. These basics are crucial for data science and machine learning enthusiasts aiming to master the field.

Conclusion

Understanding Mean, Median, and Mode is a stepping stone to mastering data analysis in Machine Learning. By leveraging these statistical tools, you can gain insights into your data, handle missing values, and improve your model’s accuracy.
