How to Perform One-Way ANOVA in R

statistics

ANOVA

Complete guide to one-way ANOVA in R using aov(). Learn to compare means across groups with worked examples, post-hoc tests, and result interpretation.

Published

February 21, 2026

Introduction

One-way ANOVA (Analysis of Variance) tests whether the means of three or more groups differ. The null hypothesis is that all group means are equal; a significant result means at least one group mean differs from the others. Use this test when you have one categorical predictor variable and one continuous outcome variable, and you want to compare group means.

When to use one-way ANOVA:
- Comparing means across 3+ independent groups
- Testing if a categorical factor affects a continuous outcome
- Analyzing experimental designs with one treatment factor
- Extending the two-sample t-test to multiple groups

The Math Behind ANOVA

ANOVA partitions total variance into between-group and within-group components:

F = (Between-group variance) / (Within-group variance)
   = (SSB / dfB) / (SSW / dfW)
   = MSB / MSW

Where:
- SSB (Sum of Squares Between) = Σnᵢ(x̄ᵢ - x̄)²: how much the group means differ from the overall mean
- SSW (Sum of Squares Within) = ΣΣ(xᵢⱼ - x̄ᵢ)²: how much observations vary within their own groups
- dfB = k - 1 (k = number of groups)
- dfW = N - k (N = total observations)

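A large F-ratio indicates that group means differ more than expected from random variation within groups. Under the null hypothesis, F follows an F distribution with dfB and dfW degrees of freedom, which is where the p-value comes from.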
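To make the formula concrete, here is a minimal sketch that computes F by hand and compares it with aov(). The `scores` data frame and its values are hypothetical, chosen only so the arithmetic is easy to follow; nothing here comes from the datasets used later.

```r
# Toy data (hypothetical values, for illustration only)
scores <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  value = c(4, 5, 6, 5,  7, 8, 6, 7,  10, 9, 11, 10)
)

grand_mean  <- mean(scores$value)
group_means <- tapply(scores$value, scores$group, mean)
group_sizes <- tapply(scores$value, scores$group, length)

# Sums of squares, following the definitions of SSB and SSW
ssb <- sum(group_sizes * (group_means - grand_mean)^2)
ssw <- sum((scores$value - group_means[as.character(scores$group)])^2)

# Degrees of freedom: k - 1 and N - k
df_b <- nlevels(scores$group) - 1
df_w <- nrow(scores) - nlevels(scores$group)

# F = MSB / MSW, and the upper-tail p-value from the F distribution
f_manual <- (ssb / df_b) / (ssw / df_w)
p_manual <- pf(f_manual, df_b, df_w, lower.tail = FALSE)

# The same F as reported by aov()
f_aov <- summary(aov(value ~ group, data = scores))[[1]][["F value"]][1]
c(manual = f_manual, aov = f_aov)
```

For these toy values SSB is about 50.67 and SSW is 6, so F = (50.67 / 2) / (6 / 9) = 38, and the manual result should match aov().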

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test if penguin body mass differs significantly across the three species (Adelie, Chinstrap, and Gentoo). This is a classic one-way ANOVA scenario with species as our grouping variable.

Step 1: Explore the Data

First, let’s examine the structure of the two columns we need.

# View the data structure
penguins |>
  select(species, body_mass_g) |>
  glimpse()

This shows that we have a factor variable (species) and a numeric variable (body_mass_g), which is the structure one-way ANOVA needs: a categorical grouping variable and a continuous outcome.

Step 2: Calculate Group Statistics

Let’s calculate summary statistics to understand differences between groups.

# Get descriptive statistics by species
summary_stats <- penguins |>
  group_by(species) |>
  summarise(
    count = sum(!is.na(body_mass_g)),  # count only rows with a recorded body mass
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    sd_mass = sd(body_mass_g, na.rm = TRUE)
  )
print(summary_stats)

The output shows different sample sizes and mean body masses across species, suggesting potential differences.

Step 3: Visualize Group Differences

Create a boxplot to visually assess differences between groups.

# Create boxplot to visualize differences
penguins |>
  filter(!is.na(body_mass_g)) |>
  ggplot(aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot(alpha = 0.8) +
  labs(title = "Penguin Body Mass by Species",
       x = "Species", y = "Body Mass (g)") +
  theme(legend.position = "none")

Figure: Boxplot of Adelie, Chinstrap, and Gentoo penguin body mass in grams.

The boxplot reveals clear visual differences in body mass distributions across the three species.

Step 4: Perform One-Way ANOVA

Now we’ll conduct the actual ANOVA test using the aov() function.

# Run one-way ANOVA
anova_result <- aov(body_mass_g ~ species, data = penguins)
summary(anova_result)

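The summary() output lists the degrees of freedom, sums of squares, and mean squares for species and for the residuals, followed by the F-statistic and its p-value (the Pr(>F) column). A p-value below your significance level (commonly 0.05) indicates that at least one species mean differs from the others. By default, aov() drops rows with a missing body_mass_g, and the summary reports how many observations were removed.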
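If you need the F-statistic and p-value as numbers, for a report for example, they can be pulled out of the summary object. The sketch below uses R's built-in PlantGrowth dataset as a stand-in so that it runs without extra packages; the same indexing should apply to `anova_result`.

```r
# PlantGrowth ships with R; it stands in for the penguins data here
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]

# Row 1 is the grouping factor, row 2 is the residuals
f_value    <- anova_table[["F value"]][1]
p_value    <- anova_table[["Pr(>F)"]][1]
df_between <- as.integer(anova_table[["Df"]][1])
df_within  <- as.integer(anova_table[["Df"]][2])

cat(sprintf("F(%d, %d) = %.2f, p = %.4f\n",
            df_between, df_within, f_value, p_value))
```

Indexing by position avoids depending on the row names of the table, which are padded with trailing spaces.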

Example 2: Practical Application

The Problem

A car manufacturer wants to determine if fuel efficiency (mpg) differs significantly between cars with different numbers of cylinders. We’ll use the mtcars dataset to analyze whether 4-cylinder, 6-cylinder, and 8-cylinder cars have different average fuel efficiency.

Step 1: Prepare the Data

We need to convert the cylinder variable to a factor and examine our data.

# Prepare data and convert cyl to factor
mtcars_clean <- mtcars |>
  mutate(cyl_factor = as.factor(cyl)) |>
  select(mpg, cyl_factor)

# Check the data
head(mtcars_clean)

Converting cylinders to a factor ensures R treats it as a categorical grouping variable rather than numeric.
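The following sketch shows what goes wrong if the conversion is skipped. It fits the model both ways on the built-in mtcars data; the object names are illustrative only.

```r
# cyl left as a number: R fits a straight-line trend (1 degree of freedom)
fit_numeric <- aov(mpg ~ cyl, data = mtcars)

# cyl as a factor: R compares three group means (k - 1 = 2 degrees of freedom)
fit_factor <- aov(mpg ~ factor(cyl), data = mtcars)

df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]

c(numeric = df_numeric, factor = df_factor)
```

A between-group df of 1 for a variable with three levels is a quick sign that the grouping variable was treated as numeric.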

Step 2: Check Assumptions

ANOVA assumes independent observations, approximately normal distributions within groups, and equal variances across groups. Here we check normality graphically.

# Check normality with Q-Q plots by group
mtcars_clean |>
  ggplot(aes(sample = mpg)) +
  stat_qq() + stat_qq_line() +
  facet_wrap(~cyl_factor) +
  labs(title = "Q-Q Plots by Cylinder Group")

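The Q-Q plots help us assess whether the data within each group follow a normal distribution: points that fall close to the line are consistent with normality. With only 11, 7, and 14 cars in the 4-, 6-, and 8-cylinder groups, some scatter around the line is expected.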
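With groups this small, one common alternative is to check the model residuals, which pool all observations after removing each group's mean. This is a minimal sketch of that approach, not the only valid check:

```r
# Fit the one-way model on the built-in mtcars data
fit <- aov(mpg ~ factor(cyl), data = mtcars)

# Residuals = each observation minus its own group mean
res <- residuals(fit)

# Shapiro-Wilk test on the pooled residuals
sw <- shapiro.test(res)
sw$p.value
```

A small p-value suggests the residuals depart from normality. The test has little power in small samples, so it is best read alongside the Q-Q plots.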

Step 3: Perform ANOVA with Post-hoc Tests

Run the ANOVA and follow up with pairwise comparisons.

# Run ANOVA
car_anova <- aov(mpg ~ cyl_factor, data = mtcars_clean)
summary(car_anova)

# Perform Tukey's HSD for pairwise comparisons
TukeyHSD(car_anova)

Tukey’s HSD test tells us which specific pairs of groups differ significantly from each other.
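The object returned by TukeyHSD() can also be inspected programmatically. The sketch below rebuilds the model with base R so it is self-contained, then lists the pairs whose adjusted p-value falls below 0.05 (an assumed threshold):

```r
# Recreate the factor column and model from this example
cars <- transform(mtcars, cyl_factor = factor(cyl))
fit  <- aov(mpg ~ cyl_factor, data = cars)

# TukeyHSD() returns a list with one matrix per model term
tukey_matrix <- TukeyHSD(fit)$cyl_factor

# Columns: diff (difference in means), lwr/upr (confidence bounds), p adj
colnames(tukey_matrix)

# Row names identify each pair, e.g. "8-4" is mean(8-cyl) minus mean(4-cyl)
significant_pairs <- rownames(tukey_matrix)[tukey_matrix[, "p adj"] < 0.05]
significant_pairs
```

A negative `diff` for "8-4" means the 8-cylinder group has the lower mean mpg.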

Step 4: Interpret and Visualize Results

Create a comprehensive visualization of the results.

# Create detailed boxplot with mean points
mtcars_clean |>
  ggplot(aes(x = cyl_factor, y = mpg, fill = cyl_factor)) +
  geom_boxplot(alpha = 0.7) +
  stat_summary(fun = mean, geom = "point",
               color = "red", size = 3) +
  labs(title = "Fuel Efficiency by Cylinder Count",
       x = "Number of Cylinders", y = "Miles per Gallon") +
  theme(legend.position = "none")

Figure: Boxplot of mtcars miles per gallon by cylinder count (4, 6, and 8), with a red point marking each group mean.

This visualization combines boxplots with mean points to clearly show both distributions and central tendencies.

Assumptions and Limitations

Key assumptions of one-way ANOVA:

Independence: Observations are independent within and across groups
Normality: Data within each group should be approximately normally distributed
Homogeneity of variance: Variance should be similar across all groups

Test for equal variances (Levene’s test):

library(car)
leveneTest(body_mass_g ~ species, data = penguins)

If p < 0.05, there is evidence that the group variances differ. In that case, use Welch’s ANOVA instead:

# Welch's ANOVA - doesn't assume equal variances
oneway.test(body_mass_g ~ species, data = penguins, var.equal = FALSE)
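To see how the two versions relate, this sketch runs oneway.test() both ways on the built-in mtcars data. With var.equal = TRUE it should reproduce the F-statistic from aov(); with var.equal = FALSE it applies the Welch correction.

```r
# Classic (equal-variance) and Welch versions of the same test
classic <- oneway.test(mpg ~ factor(cyl), data = mtcars, var.equal = TRUE)
welch   <- oneway.test(mpg ~ factor(cyl), data = mtcars, var.equal = FALSE)

# F-statistic from aov() for comparison
aov_f <- summary(aov(mpg ~ factor(cyl), data = mtcars))[[1]][["F value"]][1]

unname(classic$statistic)  # matches aov_f
unname(welch$statistic)    # Welch F-statistic
welch$parameter            # numerator df and adjusted denominator df
welch$p.value
```

The Welch test adjusts the denominator degrees of freedom, which is why they are usually not a whole number.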

Test for normality (Shapiro-Wilk test by group):

penguins |>
  group_by(species) |>
  summarise(
    shapiro_p = shapiro.test(body_mass_g)$p.value
  )

When assumptions are violated:

Non-normality: ANOVA is fairly robust to moderate non-normality with large samples (a common rule of thumb is n > 30 per group). Otherwise, consider the Kruskal-Wallis test
Unequal variances: Use Welch’s ANOVA via oneway.test(var.equal = FALSE)
Non-independence: Use repeated measures ANOVA or mixed models

# Non-parametric alternative (Kruskal-Wallis)
kruskal.test(body_mass_g ~ species, data = penguins)

Common Mistakes

1. Not doing post-hoc tests after significant ANOVA

A significant ANOVA tells you that at least one group differs, but not which ones. Follow up with pairwise comparisons:

# Tukey's HSD for pairwise comparisons
TukeyHSD(anova_result)

# Or pairwise t-tests with correction
pairwise.t.test(penguins$body_mass_g, penguins$species, p.adjust.method = "bonferroni")

2. Using multiple t-tests instead of ANOVA

Running many t-tests inflates the Type I error rate. With 3 groups, you’d run 3 pairwise comparisons; at α = 0.05 each, the family-wise error rate rises to about 14% (1 - 0.95³) instead of 5%, treating the tests as independent.
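The inflation grows quickly with the number of groups. This short calculation assumes independent tests at α = 0.05, which is a simplification, and the variable names are illustrative:

```r
alpha <- 0.05
n_groups <- 3:6

# Number of pairwise comparisons among k groups: k choose 2
n_comparisons <- choose(n_groups, 2)

# Probability of at least one false positive across all comparisons
fwer <- 1 - (1 - alpha)^n_comparisons

data.frame(groups = n_groups,
           comparisons = n_comparisons,
           fwer = round(fwer, 3))
```

With 3 groups this gives 1 - 0.95^3 ≈ 0.143. With 6 groups there are 15 comparisons and the rate exceeds 50%.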

3. Ignoring the homogeneity of variance assumption

# Check variance ratios
penguins |>
  group_by(species) |>
  summarise(variance = var(body_mass_g, na.rm = TRUE))

As a rough rule of thumb, if the largest variance is more than 4× the smallest, consider Welch’s ANOVA.
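The ratio itself takes one more line to compute. This sketch uses the built-in mtcars data from Example 2 so that it runs without extra packages; the 4× cutoff is the rule of thumb mentioned here, not a formal test.

```r
# Variance of mpg within each cylinder group
group_vars <- tapply(mtcars$mpg, mtcars$cyl, var)

# Ratio of the largest to the smallest group variance
variance_ratio <- max(group_vars) / min(group_vars)

group_vars
variance_ratio

# Flag the rule-of-thumb threshold
variance_ratio > 4
```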

4. Not reporting effect size

Statistical significance doesn’t indicate practical importance. Report eta-squared (η²):

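# Calculate eta-squared by hand: 1 - SS_within / SS_total
ss_within <- sum(residuals(anova_result)^2)                   # residual (within-group) sum of squares
mass <- penguins$body_mass_g[!is.na(penguins$body_mass_g)]    # same rows that aov() used
ss_total <- sum((mass - mean(mass))^2)                        # total sum of squares
eta_sq <- 1 - (ss_within / ss_total)

# Or use effectsize package
library(effectsize)
eta_squared(anova_result)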

Small: η² = 0.01
Medium: η² = 0.06
Large: η² = 0.14
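Eta-squared can also be read straight off the ANOVA table as SSB / (SSB + SSW). The sketch below does this for the built-in mtcars data and checks it against R² from the equivalent linear model. The two coincide for a one-way design.

```r
fit <- aov(mpg ~ factor(cyl), data = mtcars)

# "Sum Sq" column: first entry is between-group, second is within-group
ss <- summary(fit)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

# R-squared from the same model fitted with lm()
r_squared <- summary(lm(mpg ~ factor(cyl), data = mtcars))$r.squared

c(eta_squared = eta_sq, r_squared = r_squared)
```

Read this way, η² is the proportion of total variance in the outcome that is explained by group membership.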

Summary

One-way ANOVA compares means across three or more independent groups using the aov() function
Always explore your data first with summary statistics and visualizations before running the test
Check ANOVA assumptions: independence of observations, normality within groups, and equal variances across groups
Use TukeyHSD() for post-hoc pairwise comparisons when ANOVA is significant
Report effect size (eta-squared) alongside p-values
