Matplotlib Histograms

Welcome to The Coding College, where we simplify programming concepts for everyone! In this tutorial, we’ll dive into Matplotlib histograms, a fundamental tool for visualizing the distribution of numerical data. Histograms show how often values fall into each range, which makes the spread, skew, and outliers of a dataset easy to spot.

What Is a Histogram?

A histogram groups continuous data into intervals (bins) and draws one bar per bin, with the bar's height equal to the number of data points that fall in that bin. Unlike a bar chart, which compares separate categories, a histogram shows the distribution of quantitative data, so its bars sit side by side along a numeric axis.
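To make the binning step concrete, here is a minimal sketch that counts values per bin with NumPy's np.histogram, without drawing anything. It uses the same sample data as the examples that follow; the variable names are just for illustration.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# Split the range of the data (4 to 15) into 5 equal-width bins
# and count how many values fall into each one.
counts, edges = np.histogram(data, bins=5)

# Print each bin's left edge, right edge, and count.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```

For this data the five counts are 6, 4, 3, 2, and 3. These are the bar heights a five-bin histogram of the same data would show.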

Creating a Basic Histogram

To create a histogram in Matplotlib, use the plt.hist() function.

Example: Basic Histogram

import matplotlib.pyplot as plt  

# Sample data
data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]  

# Create histogram
plt.hist(data, bins=5, color="blue", edgecolor="black")  
plt.title("Basic Histogram")  
plt.xlabel("Data Intervals")  
plt.ylabel("Frequency")  
plt.show()  

Output: A histogram with five equal-width bins spanning 4 to 15 (the minimum and maximum of the data), showing how many values fall in each interval.

Adjusting the Number of Bins

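The bins parameter sets how many equal-width intervals the data range is split into (the default is 10). It also accepts a sequence of bin edges. More bins show finer detail but leave fewer data points in each bin.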

plt.hist(data, bins=10, color="green", edgecolor="black")  
plt.title("Histogram with 10 Bins")  
plt.show()  
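Besides an integer, bins also accepts a list of bin edges, which is useful when you want round-number boundaries. The sketch below assumes the same sample data as above; the edges 4, 8, 12, and 16 are chosen only for illustration.

```python
import matplotlib.pyplot as plt

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# Explicit edges: the bins cover [4, 8), [8, 12), and [12, 16].
# Only the last bin includes its right edge.
counts, edges, _ = plt.hist(data, bins=[4, 8, 12, 16], color="green", edgecolor="black")
plt.title("Histogram with Explicit Bin Edges")
plt.show()
```

With explicit edges the bins do not have to be the same width, although equal widths are usually easier to read.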

Customizing Histograms

1. Changing Bar Color

Set the color of bars using the color parameter:

plt.hist(data, bins=5, color="orange", edgecolor="black")  
plt.title("Histogram with Custom Bar Color")  
plt.show()  

2. Adding Transparency

Control bar transparency with the alpha parameter, where 0 is fully transparent and 1 is fully opaque:

plt.hist(data, bins=5, color="blue", alpha=0.7, edgecolor="black")  
plt.title("Histogram with Transparency")  
plt.show()  

3. Cumulative Histogram

To display cumulative frequencies, set cumulative=True:

plt.hist(data, bins=5, cumulative=True, color="purple", edgecolor="black")  
plt.title("Cumulative Histogram")  
plt.xlabel("Data Intervals")  
plt.ylabel("Cumulative Frequency")  
plt.show()  
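Each bar in a cumulative histogram is the count for its own bin plus the counts of all bins to its left. As a rough check, the same running total can be computed by hand with NumPy (same sample data assumed):

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# Per-bin counts, then a running total from left to right.
counts, edges = np.histogram(data, bins=5)
running_total = np.cumsum(counts)

print(running_total)
```

The last value of the running total equals the number of data points, so the final bar of a cumulative histogram always reaches the full dataset size.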

Comparing Two Datasets

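You can compare two datasets by drawing both histograms on the same axes and using the alpha parameter so overlapping bars stay visible. Give both calls the same range (or the same bin edges); otherwise each dataset gets its own bin boundaries and the bars cannot be compared directly.

data1 = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8]
data2 = [5, 6, 7, 5, 8, 9, 10, 7, 6, 9]

# Use one shared range so both histograms have identical bin edges
shared_range = (min(data1 + data2), max(data1 + data2))

plt.hist(data1, bins=5, range=shared_range, alpha=0.7, label="Dataset 1", color="blue", edgecolor="black")
plt.hist(data2, bins=5, range=shared_range, alpha=0.7, label="Dataset 2", color="green", edgecolor="black")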

plt.title("Comparing Two Datasets")  
plt.legend()  
plt.show()  
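Another option, sketched here with the same two datasets, is to pass both to a single plt.hist() call. Matplotlib then bins them with one shared set of edges and draws the bars side by side instead of on top of each other.

```python
import matplotlib.pyplot as plt

data1 = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8]
data2 = [5, 6, 7, 5, 8, 9, 10, 7, 6, 9]

# Passing a list of datasets gives one group of bars per bin,
# with one label and one color per dataset.
counts, edges, _ = plt.hist(
    [data1, data2],
    bins=5,
    label=["Dataset 1", "Dataset 2"],
    color=["blue", "green"],
    edgecolor="black",
)
plt.title("Side-by-Side Comparison")
plt.legend()
plt.show()
```

Side-by-side bars avoid the blended colors of overlapping transparent bars, at the cost of narrower bars in each bin.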

Normalizing the Histogram

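Set density=True to plot a probability density instead of raw counts. Each bar's height is scaled so that the total area of the bars (height times bin width) equals 1. The heights themselves are not probabilities and do not have to sum to 1: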

plt.hist(data, bins=5, density=True, color="teal", edgecolor="black")  
plt.title("Normalized Histogram")  
plt.xlabel("Data Intervals")  
plt.ylabel("Probability Density")  
plt.show()  
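The difference between a density and a proportion is easy to check numerically. The sketch below uses np.histogram on the same sample data. The weights approach in the second half is one common way to get bar heights that sum to 1; it is shown as an alternative, not as what density=True does.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# density=True: heights are scaled so that the total bar AREA is 1.
density, edges = np.histogram(data, bins=5, density=True)
widths = np.diff(edges)
area = (density * widths).sum()
print(area)  # approximately 1.0

# Alternative: weight every point by 1/N so the bar HEIGHTS sum to 1.
weights = np.ones(len(data)) / len(data)
proportions, _ = np.histogram(data, bins=5, weights=weights)
print(proportions.sum())  # approximately 1.0
```

plt.hist() accepts the same weights argument, so the second approach also works when you want the y-axis to read as "fraction of the data".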

Annotating the Histogram

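plt.hist() returns the bin counts and the bin edges, which you can use to label each bar with its count:

counts, edges, _ = plt.hist(data, bins=5, color="lightblue", edgecolor="black")

# Write each count at the horizontal center of its bin, just below the top of the bar
for i in range(len(counts)):
    center = (edges[i] + edges[i + 1]) / 2
    plt.text(center, counts[i] - 0.5, str(int(counts[i])), ha="center")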

plt.title("Annotated Histogram")  
plt.xlabel("Data Intervals")  
plt.ylabel("Frequency")  
plt.show()  
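If your installed Matplotlib is version 3.4 or newer (an assumption worth checking), plt.bar_label() may be a simpler way to get the same labels. This sketch passes it the bar container that plt.hist() returns as its third value, using the same sample data.

```python
import matplotlib.pyplot as plt

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# The third return value holds the drawn bars.
counts, edges, bars = plt.hist(data, bins=5, color="lightblue", edgecolor="black")

# Label every bar with its height, placed just above the bar.
labels = plt.bar_label(bars)

plt.title("Annotated Histogram (bar_label)")
plt.xlabel("Data Intervals")
plt.ylabel("Frequency")
plt.show()
```

This avoids computing text positions by hand. The manual plt.text() loop is still useful when you need full control over where each label goes.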

Practice Exercises

Exercise 1: Custom Bins

Create a histogram with custom bins [0, 5, 10, 15, 20]. Add labels and customize colors.

Exercise 2: Overlay Datasets

Compare three datasets in one histogram. Use transparency to distinguish overlapping bars.

Exercise 3: Probability Histogram

Generate a histogram with normalized frequencies and overlay a line plot showing the distribution curve.

Common Issues and Solutions

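  1. Bins Are Too Wide or Narrow
    • Cause: The bins value does not suit the data. Too few bins hide the shape of the distribution; too many leave noisy, nearly empty bars.
    • Solution: Try several bin counts, or pass explicit bin edges, until the shape of the data is clear.
  2. Bars Hide Each Other in a Comparison
    • Cause: Both histograms are drawn fully opaque because the alpha parameter was omitted.
    • Solution: Set alpha below 1 for overlapping datasets.
  3. Frequency Values Misleading
    • Cause: The datasets differ in size, so their raw counts are not directly comparable.
    • Solution: Normalize with density=True when comparing the shapes of distributions.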

Why Choose The Coding College?

At The Coding College, we focus on practical, user-friendly tutorials. Learning histograms in Matplotlib equips you with the skills to analyze and present data effectively, a key asset in programming and data science.

Conclusion

Histograms are a fundamental tool for understanding data distributions. By mastering their creation and customization, you can make data analysis more insightful and engaging.
