How to Perform a Two-Way ANOVA in R

statistics
ANOVA two-way
Learn how to perform a two-way ANOVA in R. Step-by-step statistical tutorial with examples.
Published

February 21, 2026

Introduction

Two-way ANOVA tests whether two categorical variables (factors) each affect a continuous outcome variable, and whether the effect of one factor depends on the level of the other (their interaction). Use it when you want to examine how two grouping variables jointly influence your response variable.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test if penguin body mass differs by both species and sex, and whether there’s an interaction between these factors.

Step 1: Explore the Data

First, let’s examine our variables and check for missing values.

penguins |>
  select(body_mass_g, species, sex) |>
  summary()

# Remove missing values
penguins_clean <- penguins |>
  filter(!is.na(body_mass_g), !is.na(sex))

We now have a clean dataset with 333 penguins across 3 species and 2 sexes.

Step 2: Visualize the Data

Create a plot to visualize potential differences between groups.

penguins_clean |>
  ggplot(aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot(alpha = 0.85) +
  labs(title = "Penguin Body Mass by Species and Sex",
       x = "Species", y = "Body Mass (g)")

Figure: Grouped boxplot of penguin body mass (g) by species (Adelie, Chinstrap, Gentoo) and sex.

The plot suggests both species and sex affect body mass, with possible interactions.

Step 3: Run Two-Way ANOVA

Perform the ANOVA test including the interaction term.

# Fit the two-way ANOVA model
anova_model <- aov(body_mass_g ~ species * sex, data = penguins_clean)

# View results
summary(anova_model)

The results show significant main effects for both species and sex, plus a significant interaction.
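Another way to look at the interaction is to compare the full model against an additive model that leaves it out. The sketch below assumes the same cleaned penguin data as above (rebuilt here in base R so the block runs on its own); the names additive_model, full_model, and interaction_test are illustrative.

```r
library(palmerpenguins)

# Same cleaning step as above, written in base R so this block is self-contained
penguins_clean <- subset(penguins, !is.na(body_mass_g) & !is.na(sex))

# Additive model: main effects only
additive_model <- aov(body_mass_g ~ species + sex, data = penguins_clean)

# Full model: main effects plus the species:sex interaction
full_model <- aov(body_mass_g ~ species * sex, data = penguins_clean)

# F-test for the term that differs between the two models (the interaction)
interaction_test <- anova(additive_model, full_model)
interaction_test
```

A small p-value in the second row indicates that adding the interaction improves the fit, which should agree with the species:sex line of summary(anova_model).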

Step 4: Check Model Assumptions

Verify that our data meets ANOVA assumptions.

# Check residuals
par(mfrow = c(2, 2))
plot(anova_model)
par(mfrow = c(1, 1))

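In the diagnostic plots, look for residuals spread evenly around zero with similar scatter across the fitted values (equal variances) and points that follow the line in the Q-Q plot (approximately normal residuals). For this model, both assumptions appear reasonably met.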
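If you want a numerical check alongside the plots, base R offers formal tests. The following is a minimal sketch, not a required step: it applies shapiro.test() to the model residuals and bartlett.test() across the six species-by-sex cells. The names normality_check and variance_check are illustrative.

```r
library(palmerpenguins)

# Rebuild the data and model so this block runs on its own
penguins_clean <- subset(penguins, !is.na(body_mass_g) & !is.na(sex))
anova_model <- aov(body_mass_g ~ species * sex, data = penguins_clean)

# Shapiro-Wilk test on the residuals (null hypothesis: residuals are normal)
normality_check <- shapiro.test(residuals(anova_model))

# Bartlett test across the species-by-sex cells (null hypothesis: equal variances)
cell <- interaction(penguins_clean$species, penguins_clean$sex)
variance_check <- bartlett.test(penguins_clean$body_mass_g, cell)

normality_check
variance_check
```

Treat these tests as a supplement to the plots. With several hundred observations they can flag departures too small to matter, and Bartlett's test is itself sensitive to non-normality.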

Example 2: Practical Application

The Problem

A researcher wants to determine if car fuel efficiency (mpg) is affected by both transmission type (automatic vs manual) and number of cylinders. They also want to know if these factors interact with each other.

Step 1: Prepare the Data

Convert variables to appropriate types and explore the structure.

# Prepare mtcars data
mtcars_clean <- mtcars |>
  mutate(
    transmission = factor(am, labels = c("automatic", "manual")),
    cylinders = factor(cyl)
  )

head(mtcars_clean[c("mpg", "transmission", "cylinders")])

We’ve converted transmission and cylinders to factors for proper ANOVA analysis.

Step 2: Explore Group Means

Calculate descriptive statistics for each combination of factors.

mtcars_clean |>
  group_by(transmission, cylinders) |>
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n(),
    .groups = "drop"
  )

Manual transmissions generally show higher mpg, and fewer cylinders associate with better fuel efficiency.
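The n column in the summary above also tells us whether the design is balanced, which matters later when we interpret the ANOVA table. A short sketch of that check in base R follows; cell_counts and is_balanced are illustrative names.

```r
# Rebuild the prepared data in base R so this block runs on its own
mtcars_clean <- transform(
  mtcars,
  transmission = factor(am, labels = c("automatic", "manual")),
  cylinders = factor(cyl)
)

# Number of cars in each transmission-by-cylinders cell
cell_counts <- xtabs(~ transmission + cylinders, data = mtcars_clean)
cell_counts

# A design is balanced only when every cell has the same count
is_balanced <- length(unique(as.vector(cell_counts))) == 1
is_balanced
```

The cells in mtcars are of unequal size, so the design is unbalanced and the order of terms in the model formula can change the main-effect tests.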

Step 3: Visualize the Interaction

Create a plot to examine potential interactions between factors.

mtcars_clean |>
  ggplot(aes(x = cylinders, y = mpg, color = transmission)) +
  geom_point(size = 3, alpha = 0.7,
             position = position_jitter(width = 0.1, height = 0)) +
  stat_summary(fun = mean, geom = "line",
               aes(group = transmission), linewidth = 1.2) +
  labs(title = "MPG by Cylinders and Transmission Type",
       x = "Number of Cylinders", y = "Miles per Gallon")

Figure: Interaction plot of mean miles per gallon across four, six, and eight cylinders, with one line each for automatic and manual transmissions. Roughly parallel lines suggest no interaction; diverging or crossing lines suggest one.

The plot suggests transmission effects may vary across cylinder groups.

Step 4: Conduct Two-Way ANOVA

Run the complete analysis including interaction effects.

# Fit the model
car_anova <- aov(mpg ~ transmission * cylinders, data = mtcars_clean)

# Display results
summary(car_anova)

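With the sequential (Type I) sums of squares that summary() reports for this model, both transmission (entered first) and cylinders show significant main effects, and the interaction is not significant. Because the cell counts in mtcars are unequal, the transmission result depends on term order: once adjusted for cylinders, its effect is much weaker and hovers around the 0.05 threshold.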
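Because mtcars has unequal cell counts, the sequential tests from summary() depend on the order in which the factors enter the formula. The sketch below, using only base R, refits the model with the factors swapped and then uses drop1() on the additive model to test each main effect adjusted for the other. The object names are illustrative.

```r
# Rebuild the prepared data in base R so this block runs on its own
mtcars_clean <- transform(
  mtcars,
  transmission = factor(am, labels = c("automatic", "manual")),
  cylinders = factor(cyl)
)

# Sequential (Type I) tests with each ordering of the main effects
transmission_first <- aov(mpg ~ transmission * cylinders, data = mtcars_clean)
cylinders_first <- aov(mpg ~ cylinders * transmission, data = mtcars_clean)
summary(transmission_first)
summary(cylinders_first)

# Each main effect adjusted for the other, from the additive model
additive_fit <- lm(mpg ~ transmission + cylinders, data = mtcars_clean)
adjusted_tests <- drop1(additive_fit, test = "F")
adjusted_tests
```

If the orderings disagree about a main effect, as they can in an unbalanced design, report the adjusted test or state clearly which order you used.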

Step 5: Post-Hoc Analysis

Since cylinders has more than two levels, a significant main effect does not tell us which levels differ, so we perform pairwise comparisons.

# Tukey's HSD for multiple comparisons
TukeyHSD(car_anova, "cylinders")

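The output lists each pair of cylinder groups with the estimated difference in mean mpg, a 95% confidence interval, and an adjusted p-value. Pairs whose interval excludes zero differ significantly.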
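For reporting, it can help to pull the Tukey results into a data frame and filter them. This is a minimal sketch that assumes the same model as above; pairwise and significant_pairs are illustrative names.

```r
# Rebuild the data and model in base R so this block runs on its own
mtcars_clean <- transform(
  mtcars,
  transmission = factor(am, labels = c("automatic", "manual")),
  cylinders = factor(cyl)
)
car_anova <- aov(mpg ~ transmission * cylinders, data = mtcars_clean)

# TukeyHSD() returns a list with one matrix per requested factor
tukey_result <- TukeyHSD(car_anova, "cylinders")

# Convert the matrix to a data frame and keep the comparison labels
pairwise <- as.data.frame(tukey_result$cylinders)
pairwise$comparison <- rownames(pairwise)

# Keep only the pairs whose adjusted p-value is below 0.05
significant_pairs <- pairwise[pairwise[["p adj"]] < 0.05, ]
significant_pairs
```

Calling plot(tukey_result) draws the same confidence intervals, which is often easier to read than the table.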

Summary

  • Two-way ANOVA examines effects of two categorical variables on a continuous outcome
  • Include interaction terms using * to test if factor effects depend on each other
  • Always check model assumptions using diagnostic plots before interpreting results
  • Use post-hoc tests like TukeyHSD for factors with more than two levels
  • Visualize your data first to understand potential patterns and interactions
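As a quick illustration of the * shorthand mentioned above, we can ask R which terms a formula expands to. The variable names y, A, and B are placeholders, not columns from either dataset.

```r
# A * B is shorthand for both main effects plus their interaction
full_terms <- attr(terms(y ~ A * B), "term.labels")
full_terms

# Using + alone fits main effects only, with no interaction term
additive_terms <- attr(terms(y ~ A + B), "term.labels")
additive_terms
```

The first call lists A, B, and A:B, so y ~ A * B and y ~ A + B + A:B describe the same model.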