How to use geom_boxplot() in R

Learn how to use geom_boxplot() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 21, 2026

Introduction

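The geom_boxplot() function in ggplot2 creates box-and-whisker plots that display the distribution of a continuous variable across categories. Each box spans the first to third quartile with a line at the median, the whiskers reach the most extreme values within 1.5 times the interquartile range of the box, and anything beyond that is drawn as an individual outlier point. This makes box plots well suited to comparing distributions between groups and spotting outliers.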

Getting Started

library(tidyverse)       # loads ggplot2 and dplyr, among other packages
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Usage

The Problem

You want to visualize how penguin body mass varies across different species. A box plot will show you the median, quartiles, and potential outliers for each species.

Step 1: Create a Simple Box Plot

We’ll start with the most basic box plot using penguin data.

ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

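This creates a box plot showing the body mass distribution for each penguin species, with the median drawn as a horizontal line inside each box. ggplot2 also warns that it removed two rows, because two penguins have no recorded body mass.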
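To see the numbers behind the boxes, the statistics can be recomputed by hand. The following is a minimal sketch, assuming that quantile() with its default settings matches the hinges ggplot2 draws; the column names (q1, lower_fence, and so on) are illustrative and not part of ggplot2.

```r
library(dplyr)
library(palmerpenguins)

# Per-species summary of the statistics a box plot displays.
penguin_stats <- penguins |>
  filter(!is.na(body_mass_g)) |>   # drop penguins with no recorded body mass
  group_by(species) |>
  summarise(
    n_penguins  = n(),
    q1          = quantile(body_mass_g, 0.25),
    med         = median(body_mass_g),
    q3          = quantile(body_mass_g, 0.75),
    iqr         = q3 - q1,
    # Points beyond these fences are the ones drawn as outliers
    lower_fence = q1 - 1.5 * iqr,
    upper_fence = q3 + 1.5 * iqr,
    n_outliers  = sum(body_mass_g < lower_fence | body_mass_g > upper_fence)
  )

penguin_stats
```

Comparing the med column with the plot is a quick way to confirm you are reading the boxes correctly.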

Step 2: Add Color to Distinguish Groups

Let’s make the plot more visually appealing by adding colors for each species.

ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot() +
  scale_fill_viridis_d()

Now each species has a distinct color, making it easier to distinguish between the three penguin species in our dataset.

Step 3: Customize the Appearance

We’ll add proper labels and remove the redundant legend since species are labeled on the x-axis.

ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot() +
  scale_fill_viridis_d() +
  labs(title = "Penguin Body Mass by Species",
       x = "Species", y = "Body Mass (g)") +
  theme(legend.position = "none")

Figure: Box plot of penguin body mass by species (palmerpenguins dataset), drawn with geom_boxplot() and viridis fill colors.

The plot now has clear labels and a clean appearance without the unnecessary legend.
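As an alternative to theme(legend.position = "none"), the legend can be switched off for the box plot layer itself. This is a sketch of that variant; it also passes na.rm = TRUE so that rows with missing body mass are dropped silently rather than with a warning.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  # show.legend = FALSE hides the legend for this layer only;
  # na.rm = TRUE removes missing values without a warning
  geom_boxplot(show.legend = FALSE, na.rm = TRUE) +
  scale_fill_viridis_d() +
  labs(title = "Penguin Body Mass by Species",
       x = "Species", y = "Body Mass (g)")

p
```

The layer-level switch is convenient when a plot has several layers and only some of them should contribute to the legend.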

Example 2: Practical Application

The Problem

You’re analyzing car performance data and want to compare fuel efficiency (mpg) across different numbers of cylinders. You also want to overlay individual data points to see the actual distribution and identify specific outliers.

Step 1: Prepare the Data

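First, let’s convert cylinders to a factor. cyl is stored as a number, so ggplot2 would treat it as a continuous x value and draw a single box instead of one box per cylinder count.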

mtcars_clean <- mtcars |>
  mutate(cyl_factor = factor(cyl)) |>
  select(mpg, cyl_factor, wt)

This creates a clean dataset with cylinders as a categorical variable, which works better with box plots.
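Before plotting, it can help to check the result of the conversion. The sketch below rebuilds mtcars_clean so it runs on its own, then inspects the factor levels and the number of cars per group.

```r
library(dplyr)

mtcars_clean <- mtcars |>
  mutate(cyl_factor = factor(cyl)) |>
  select(mpg, cyl_factor, wt)

# Which categories will appear on the x-axis?
levels(mtcars_clean$cyl_factor)

# How many cars fall into each cylinder group?
group_sizes <- count(mtcars_clean, cyl_factor)
group_sizes
```

The groups are small (11, 7, and 14 cars), which is worth keeping in mind when reading the quartiles and notches later on.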

Step 2: Create Box Plot with Data Points

We’ll create a box plot and overlay the actual data points using geom_jitter().

ggplot(mtcars_clean, aes(x = cyl_factor, y = mpg)) +
  geom_boxplot(alpha = 0.7, fill = "lightblue") +
  geom_jitter(width = 0.2, alpha = 0.6, color = "darkred")

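The semi-transparent box plots show the summary statistics, while the jittered points reveal the individual observations and how many cars are in each group. Note that any outlier appears twice: once as the box plot’s own outlier marker and once as a jittered point. Setting outlier.shape = NA in geom_boxplot() hides the first copy.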
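When raw points are overlaid on a box plot, two defaults are worth adjusting. The following sketch is one possible variant of the plot, not the only way to do it: it hides the box plot's own outlier markers so no observation is drawn twice, and it sets height = 0 so that geom_jitter() only spreads points horizontally and leaves the mpg values untouched.

```r
library(ggplot2)
library(dplyr)

mtcars_clean <- mtcars |>
  mutate(cyl_factor = factor(cyl))

p <- ggplot(mtcars_clean, aes(x = cyl_factor, y = mpg)) +
  # outlier.shape = NA suppresses the box plot's own outlier markers
  geom_boxplot(outlier.shape = NA, alpha = 0.7, fill = "lightblue") +
  # height = 0 keeps each point at its true mpg value; only x is jittered
  geom_jitter(width = 0.2, height = 0, alpha = 0.6, color = "darkred")

p
```

Because jitter is random, the horizontal positions change on every run; call set.seed() beforehand if you need a reproducible figure.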

Step 3: Add Statistical Notches

Let’s add notches to help compare medians between groups statistically.

ggplot(mtcars_clean, aes(x = cyl_factor, y = mpg, fill = cyl_factor)) +
  geom_boxplot(notch = TRUE, alpha = 0.8) +
  geom_jitter(width = 0.2, alpha = 0.6) +
  scale_fill_brewer(palette = "Set2")

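Notches provide a rough visual check for comparing medians: each notch extends 1.58 * IQR / sqrt(n) on either side of the median, which gives an approximate 95% confidence interval. If the notches of two boxes don’t overlap, that is evidence that the medians differ, though it is not a formal test. With small groups like these, a notch can extend past the box, in which case ggplot2 prints a message saying the notch went outside the hinges.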
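The notch limits can also be recomputed by hand to check the overlap numerically. This is a minimal sketch, assuming the median ± 1.58 * IQR / sqrt(n) rule and R's default quantile method; the column names are illustrative.

```r
library(dplyr)

notch_limits <- mtcars |>
  mutate(cyl_factor = factor(cyl)) |>
  group_by(cyl_factor) |>
  summarise(
    n_cars      = n(),
    med         = median(mpg),
    half_width  = 1.58 * IQR(mpg) / sqrt(n_cars),
    notch_lower = med - half_width,
    notch_upper = med + half_width
  ) |>
  arrange(med) |>
  # TRUE when a notch reaches into the notch of the group
  # with the next-higher median; NA for the last group
  mutate(overlaps_next = notch_upper >= lead(notch_lower))

notch_limits
```

For mtcars, none of the neighbouring intervals overlap, which matches what the notched plot shows.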

Step 4: Final Polish

We’ll add professional labels and formatting for a publication-ready plot.

ggplot(mtcars_clean, aes(x = cyl_factor, y = mpg, fill = cyl_factor)) +
  geom_boxplot(notch = TRUE, alpha = 0.8) +
  geom_jitter(width = 0.2, alpha = 0.6) +
  scale_fill_brewer(palette = "Set2", name = "Cylinders") +
  labs(title = "Fuel Efficiency by Engine Configuration",
       subtitle = "Notches show an approximate 95% confidence interval around the median",
       x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal()

Figure: Notched box plot of mtcars fuel efficiency by number of cylinders, drawn with geom_boxplot() and overlaid with jittered points.

The final plot shows that cars with fewer cylinders tend to have better fuel efficiency, and the non-overlapping notches suggest that the group medians genuinely differ.
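To use the finished plot in a report, it can be written to a file with ggsave(). The sketch below is one way to do that; the file name, dimensions, and resolution are placeholder choices, and the file is written to a temporary directory so nothing is overwritten.

```r
library(ggplot2)
library(dplyr)

mtcars_clean <- mtcars |>
  mutate(cyl_factor = factor(cyl))

final_plot <- ggplot(mtcars_clean, aes(x = cyl_factor, y = mpg, fill = cyl_factor)) +
  geom_boxplot(notch = TRUE, alpha = 0.8) +
  geom_jitter(width = 0.2, alpha = 0.6) +
  scale_fill_brewer(palette = "Set2", name = "Cylinders") +
  labs(title = "Fuel Efficiency by Engine Configuration",
       x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal()

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "mpg_by_cylinders.png")

# width and height are in inches by default; dpi controls the resolution
ggsave(out_file, plot = final_plot, width = 7, height = 5, dpi = 300)
```

Passing the plot object explicitly avoids relying on whichever plot happened to be displayed last.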

Summary

  • Use geom_boxplot() to compare distributions across categorical groups
  • Add fill aesthetic to distinguish groups with colors
  • Include notch = TRUE for a rough visual comparison of medians between groups
  • Combine with geom_jitter() to show individual data points alongside summaries
  • Always add clear labels and consider removing redundant legends for cleaner presentation