Question

How can I create box plots with jitter in Matplotlib?

Answer and Explanation

Overlaying jittered points on a box plot in Matplotlib shows the individual observations alongside the summary statistics, and the random horizontal offset reduces overplotting. Here is a step-by-step explanation of how to do it:

1. Import the necessary libraries: Begin by importing matplotlib.pyplot for plotting and numpy for data generation if needed.

2. Prepare the data: Organize your data as a separate array or list for each category or group you want to visualize. You can use NumPy to create a simulated dataset to work with:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data1 = np.random.normal(100, 15, 200)
data2 = np.random.normal(130, 20, 250)
data3 = np.random.normal(80, 10, 180)
data = [data1, data2, data3]

3. Create the box plot: Pass the list of datasets to plt.boxplot(). By default it draws one box per dataset at x-positions 1, 2, 3, and so on, each summarizing the median, quartiles, and whiskers of that dataset.

4. Implement jitter using a scatter plot: Overlay the individual observations as a scatter plot, adding a small random horizontal offset ("jitter") to each point's x-position. Without the offset, every point in a category would sit on the same vertical line, and overlapping points would be hard to distinguish.

To do this, build an array of x-positions for each category, with one entry per observation. Each entry is the box's position (1, 2, 3, ...) plus a small random offset. Then scatter the observations at those positions on top of the box plots.

plt.boxplot(data)  # one box per dataset, drawn at x = 1, 2, 3
for i, dat in enumerate(data):
    # Random horizontal offsets in [-0.1, 0.1), centered on the box
    jitter_val = 0.2 * (np.random.rand(len(dat)) - 0.5)
    # Box i sits at x = i + 1; shift every point by its own offset
    positions = np.full(len(dat), i + 1) + jitter_val
    plt.scatter(positions, dat, alpha=0.6, s=3, color='red')  # overlay the observations

plt.xticks(range(1, len(data) + 1), ["Category A", "Category B", "Category C"])
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Box Plot with Jitter")
plt.grid(True)
plt.show()

5. Set labels and title: Label the x-axis ticks with the category names so each box plot can be identified, and add axis labels and a title for clarity.

6. Show the plot: End the script with plt.show() so the figure is displayed.
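The jitter from step 4 can also be isolated into a small helper, which makes the offset logic easier to check on its own. The following is only a sketch: the function name `jittered_positions` and its `width` parameter are illustrative choices, not part of Matplotlib, and it assumes a uniform offset centered on the box position.

```python
import numpy as np


def jittered_positions(center, n_points, width=0.2, rng=None):
    """Return n_points x-coordinates spread uniformly around `center`.

    `width` is the total horizontal spread, so the offsets fall in
    [-width / 2, width / 2). All names here are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(-width / 2, width / 2, size=n_points)
    return center + offsets


# Example: five x-coordinates for the box drawn at position 2
rng = np.random.default_rng(42)
x = jittered_positions(center=2, n_points=5, rng=rng)
```

Passing a seeded generator makes the jitter reproducible between runs, and a smaller `width` keeps the points closer to the center line of the box.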

Full Code Example:

import matplotlib.pyplot as plt
import numpy as np

# Simulated data: three groups with different means and spreads
np.random.seed(42)
data1 = np.random.normal(100, 15, 200)
data2 = np.random.normal(130, 20, 250)
data3 = np.random.normal(80, 10, 180)
data = [data1, data2, data3]

plt.boxplot(data)  # one box per dataset, drawn at x = 1, 2, 3
for i, dat in enumerate(data):
    # Random horizontal offsets in [-0.1, 0.1), centered on the box
    jitter_val = 0.2 * (np.random.rand(len(dat)) - 0.5)
    # Box i sits at x = i + 1; shift every point by its own offset
    positions = np.full(len(dat), i + 1) + jitter_val
    plt.scatter(positions, dat, alpha=0.6, s=3, color='red')

plt.xticks(range(1, len(data) + 1), ["Category A", "Category B", "Category C"])
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Box Plot with Jitter")
plt.grid(True)
plt.show()

With this code, the individual observations are visible on top of each box plot, and the jitter keeps points with similar values from hiding one another.
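One optional refinement: because the scatter layer already shows every observation, the box plot's own outlier markers end up drawn twice. The sketch below wraps the same approach in a function that uses Matplotlib's object-oriented interface and passes `showfliers=False` to hide those markers. The function name `boxplot_with_jitter` and its parameters are hypothetical, chosen only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def boxplot_with_jitter(data, labels, width=0.2, seed=0):
    """Draw box plots with jittered points on top; returns (fig, ax)."""
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots()
    # showfliers=False: the scatter layer already shows every point,
    # so the box plot's own outlier markers would be duplicates.
    ax.boxplot(data, showfliers=False)
    for i, values in enumerate(data, start=1):
        # Box i is drawn at x = i; offset each point within +/- width / 2
        x = i + rng.uniform(-width / 2, width / 2, size=len(values))
        ax.scatter(x, values, alpha=0.6, s=3, color="red", zorder=3)
    ax.set_xticks(range(1, len(data) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Values")
    return fig, ax
```

Calling `fig, ax = boxplot_with_jitter(data, ["Category A", "Category B", "Category C"])` followed by `plt.show()` displays the figure. Returning `fig` and `ax` leaves room to add a title or grid afterwards.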
