Welcome to The Coding College, where we simplify programming concepts for everyone! In this tutorial, we’ll dive into Matplotlib Histograms—a fundamental tool for visualizing the distribution of numerical data. Histograms are widely used in data analysis to understand frequency distributions, making them essential for data-driven decisions.
What Is a Histogram?
A histogram is a chart that groups numerical data into intervals (or bins) and uses bars to show how many data points fall within each bin. Unlike ordinary bar charts, which compare categories, histograms are used for quantitative data.
Creating a Basic Histogram
To create a histogram in Matplotlib, use the plt.hist() function.
Example: Basic Histogram
import matplotlib.pyplot as plt
# Sample data
data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]
# Create histogram
plt.hist(data, bins=5, color="blue", edgecolor="black")
plt.title("Basic Histogram")
plt.xlabel("Data Intervals")
plt.ylabel("Frequency")
plt.show()
Output: A histogram showing how the data is distributed across intervals.
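You can compute the numbers behind the bars without drawing anything. The sketch below uses NumPy's np.histogram, which returns the count for each bin and the bin edges. With bins=5, these should match the counts and edges that plt.hist() returns for the same data. Every bin is half-open except the last one, which also includes its right edge.

```python
import numpy as np

# Same sample data as the basic example above
data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# Counts per bin plus the 6 edges of 5 equal-width bins spanning min(data)..max(data)
counts, edges = np.histogram(data, bins=5)

for i, count in enumerate(counts):
    # Bins are [left, right) except the last one, which is [left, right]
    closing = "]" if i == len(counts) - 1 else ")"
    print(f"[{edges[i]:.1f}, {edges[i + 1]:.1f}{closing}: {count}")
```

For this data, the bins hold 6, 4, 3, 2, and 3 values, which add up to the 18 values in data.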
Adjusting the Number of Bins
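The bins parameter controls the number of intervals. It also accepts a sequence of bin edges (such as [0, 5, 10, 15, 20]) or the name of a binning strategy such as "auto".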
plt.hist(data, bins=10, color="green", edgecolor="black")
plt.title("Histogram with 10 Bins")
plt.show()
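The bins argument of plt.hist() follows the same conventions as NumPy's np.histogram. The sketch below therefore uses np.histogram to show, without opening a plot window, how an integer and an explicit list of edges change the intervals.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# An integer requests that many equal-width bins between min(data) and max(data)
counts_10, edges_10 = np.histogram(data, bins=10)
print(f"10 bins of width {edges_10[1] - edges_10[0]:.2f}")

# A sequence supplies the bin edges directly, so bins can start and end anywhere
counts_custom, edges_custom = np.histogram(data, bins=[0, 5, 10, 15, 20])
print("Custom edges:", edges_custom.tolist())
print("Counts:", counts_custom.tolist())
```

Passing bins=[0, 5, 10, 15, 20] to plt.hist() draws those same four intervals.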
Customizing Histograms
1. Changing Bar Color
Set the color of the bars using the color parameter:
plt.hist(data, bins=5, color="orange", edgecolor="black")
plt.title("Histogram with Custom Bar Color")
plt.show()
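The color parameter is not limited to color names. In the sketch below, the bar color is a hex string and the edge color is an RGB tuple with components from 0 to 1. Both are standard Matplotlib color formats. The example then reads the stored face color back from the first bar patch that plt.hist() returns.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

# Hex string for the bar color, RGB tuple (black) for the edge color
counts, edges, patches = plt.hist(data, bins=5, color="#ff7f0e", edgecolor=(0.0, 0.0, 0.0))
plt.title("Histogram with a Hex Color")
plt.show()

# Each bar is a Rectangle patch whose face color is stored as an RGBA tuple
print(patches[0].get_facecolor())
print(mcolors.to_rgba("#ff7f0e"))
```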
2. Adding Transparency
Control bar transparency with the alpha parameter, which ranges from 0 (fully transparent) to 1 (fully opaque):
plt.hist(data, bins=5, color="blue", alpha=0.7, edgecolor="black")
plt.title("Histogram with Transparency")
plt.show()
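Conceptually, alpha controls how much of the bar color replaces whatever lies behind it. The sketch below applies the standard "over" compositing rule, shown = alpha * bar + (1 - alpha) * background, to plain RGB tuples. It illustrates the idea only and does not call Matplotlib's renderer. The pure blue and green values are illustrative assumptions.

```python
def blend_over(foreground, background, alpha):
    """Return the RGB color seen when `foreground` with opacity `alpha`
    is drawn over an opaque `background` (both tuples in the 0..1 range)."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(foreground, background))

white = (1.0, 1.0, 1.0)  # typical axes background
blue = (0.0, 0.0, 1.0)   # pure RGB blue (illustrative value)
green = (0.0, 1.0, 0.0)  # pure RGB green (illustrative value)

# A blue bar with alpha=0.7 on a white background appears light blue
one_bar = blend_over(blue, white, 0.7)
print(tuple(round(c, 2) for c in one_bar))  # (0.3, 0.3, 1.0)

# A green bar with alpha=0.7 drawn over that blue bar still lets some blue through
overlap = blend_over(green, one_bar, 0.7)
print(tuple(round(c, 2) for c in overlap))  # (0.09, 0.79, 0.3)
```

This mixing is why semi-transparent bars from different datasets remain distinguishable where they overlap.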
3. Cumulative Histogram
To display cumulative frequencies, set cumulative=True. Each bar then shows the count for its own bin plus the counts of all bins to its left:
plt.hist(data, bins=5, cumulative=True, color="purple", edgecolor="black")
plt.title("Cumulative Histogram")
plt.xlabel("Data Intervals")
plt.ylabel("Cumulative Frequency")
plt.show()
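A cumulative histogram is a running total of the ordinary bin counts. The sketch below reproduces those bar heights by applying np.cumsum to NumPy's per-bin counts. With the same data and bins=5, they should match the heights that plt.hist(..., cumulative=True) draws.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

counts, edges = np.histogram(data, bins=5)

# Each cumulative bar = its own count plus the counts of all bins to its left
cumulative_counts = np.cumsum(counts)
print(cumulative_counts.tolist())  # [6, 10, 13, 15, 18]
```

The last value always equals the total number of data points (18 here).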
Comparing Two Datasets
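You can compare two datasets in a single histogram by calling plt.hist() once per dataset and using the alpha parameter so that overlapping bars remain visible. Give both calls the same bin edges. With bins=5 alone, each call computes edges from its own data range, so the bars of the two datasets would not line up.
import numpy as np

data1 = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8]
data2 = [5, 6, 7, 5, 8, 9, 10, 7, 6, 9]
# Compute one set of 5 equal-width bin edges covering both datasets
shared_bins = np.histogram_bin_edges(data1 + data2, bins=5)
plt.hist(data1, bins=shared_bins, alpha=0.7, label="Dataset 1", color="blue", edgecolor="black")
plt.hist(data2, bins=shared_bins, alpha=0.7, label="Dataset 2", color="green", edgecolor="black")
plt.title("Comparing Two Datasets")
plt.legend()
plt.show()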
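If overlapping bars are hard to read, another option is to pass both datasets to a single plt.hist() call as a list. Matplotlib then bins them with one shared set of edges. With the default histtype="bar", it draws the bars of each dataset side by side within every bin. The sketch below is self-contained and uses the same two datasets.

```python
import matplotlib.pyplot as plt

data1 = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8]
data2 = [5, 6, 7, 5, 8, 9, 10, 7, 6, 9]

# A list of datasets gets shared bin edges; colors and labels are given per dataset
counts, edges, _ = plt.hist(
    [data1, data2],
    bins=5,
    color=["blue", "green"],
    label=["Dataset 1", "Dataset 2"],
    edgecolor="black",
)
plt.title("Side-by-Side Histogram")
plt.xlabel("Data Intervals")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```

Here counts holds one array of bin counts per dataset, and edges is the single shared set of bin edges.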
Normalizing the Histogram
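Set density=True to normalize the histogram so that the total area of the bars equals 1. Each bar's height is then a probability density (the bin's count divided by the total number of values and by the bin width) rather than a probability: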
plt.hist(data, bins=5, density=True, color="teal", edgecolor="black")
plt.title("Normalized Histogram")
plt.xlabel("Data Intervals")
plt.ylabel("Probability Density")
plt.show()
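To check what density=True computes, the sketch below compares NumPy's normalized heights with the raw counts. NumPy's np.histogram accepts the same density flag. Multiplying each height by its bin width recovers the fraction of the data in that bin, and those fractions add up to 1.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

counts, edges = np.histogram(data, bins=5)
densities, _ = np.histogram(data, bins=5, density=True)
widths = np.diff(edges)

# density = count / (total number of values * bin width)
print(np.allclose(densities, counts / (len(data) * widths)))  # True

# Height times width gives the fraction of the data in each bin
fractions = densities * widths
print(fractions.round(3).tolist())

# The bar areas add up to 1
print(np.isclose(fractions.sum(), 1.0))  # True
```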
Annotating the Histogram
Use the counts and bin edges returned by plt.hist() to label each bar with its frequency:
counts, edges, _ = plt.hist(data, bins=5, color="lightblue", edgecolor="black")
# Write each bar's count just below the top of the bar, centered horizontally
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    plt.text((left + right) / 2, count - 0.5, str(int(count)), ha="center")
plt.title("Annotated Histogram")
plt.xlabel("Data Intervals")
plt.ylabel("Frequency")
plt.show()
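The third value returned by plt.hist() holds the bar patches, in the same order as the counts. You can also use them to highlight a specific bin. The sketch below recolors the bar with the highest count. Choosing the tallest bin is just an illustrative criterion.

```python
import matplotlib.pyplot as plt
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

counts, edges, patches = plt.hist(data, bins=5, color="lightblue", edgecolor="black")

# patches contains one Rectangle per bar, aligned with counts
tallest = int(np.argmax(counts))
patches[tallest].set_facecolor("orange")

plt.title("Histogram with the Most Frequent Bin Highlighted")
plt.xlabel("Data Intervals")
plt.ylabel("Frequency")
plt.show()
```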
Practice Exercises
Exercise 1: Custom Bins
Create a histogram with custom bins [0, 5, 10, 15, 20]. Add labels and customize colors.
Exercise 2: Overlay Datasets
Compare three datasets in one histogram. Use transparency to distinguish overlapping bars.
Exercise 3: Probability Histogram
Generate a histogram with normalized frequencies and overlay a line plot showing the distribution curve.
Common Issues and Solutions
- Bins Are Too Wide or Narrow
  - Cause: An unsuitable bins value.
  - Solution: Experiment with different bin counts, or pass explicit bin edges, to fit your data.
- Bars Overlap in Comparison
  - Cause: Omitted alpha parameter, so the last dataset drawn hides the bars beneath it.
  - Solution: Use transparency (for example, alpha=0.7) for overlapping datasets.
- Frequency Values Are Misleading
  - Cause: Comparing datasets of different sizes by their raw counts.
  - Solution: Normalize with density=True when comparing probability distributions.
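You can also let NumPy suggest a bin count instead of guessing. The sketch below asks np.histogram_bin_edges how many bins a few of its built-in strategies would use for the sample data. Because plt.hist() accepts the same strategy names for its bins argument, you can pass any of them straight to the plot.

```python
import numpy as np

data = [7, 8, 5, 6, 4, 10, 12, 14, 13, 8, 9, 6, 7, 5, 6, 10, 11, 15]

bin_counts = {}
for strategy in ["auto", "sturges", "fd", "sqrt"]:
    # Each strategy derives a bin width from the data; the edges follow from it
    edges = np.histogram_bin_edges(data, bins=strategy)
    bin_counts[strategy] = len(edges) - 1
    print(f"{strategy:>8}: {bin_counts[strategy]} bins")
```

For example, plt.hist(data, bins="auto") uses whichever bin count the "auto" strategy selects.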
Why Choose The Coding College?
At The Coding College, we focus on practical, user-friendly tutorials. Learning histograms in Matplotlib equips you with the skills to analyze and present data effectively, a key asset in programming and data science.
Conclusion
Histograms are a fundamental tool for understanding data distributions. By mastering their creation and customization, you can make data analysis more insightful and engaging.