Introduction to Mixed Effects Models in R

Categories: statistics, mixed effects models

Learn the basics of mixed effects models in R using the sleepstudy dataset. Understand when and why to use them for longitudinal data, with step-by-step examples using lme4.

Published: February 22, 2026

Introduction

Mixed effects models (also called multilevel models or hierarchical models) are essential when your data has repeated measurements from the same subjects. For example:

  • Reaction times measured daily during a sleep study
  • Patient health scores tracked over multiple visits
  • Student test scores across semesters
  • Plant growth measured weekly

Regular regression assumes all observations are independent. But when you measure the same person multiple times, those measurements are correlated - they share that person’s baseline characteristics. Ignoring this leads to incorrect conclusions.

The Sleep Study Dataset

The sleepstudy dataset from the lme4 package is a classic example of longitudinal data. Researchers measured reaction times of 18 subjects over 10 days of sleep deprivation. The data comes from a study by Gregory Belenky and colleagues on the effects of sleep deprivation on cognitive performance.

# Load required packages
library(tidyverse)   # https://www.tidyverse.org/
library(lme4)        # https://cran.r-project.org/package=lme4
library(broom.mixed) # https://cran.r-project.org/package=broom.mixed

# Look at the data
head(sleepstudy, 12)
   Reaction Days Subject
1  249.5600    0     308
2  258.7047    1     308
3  250.8006    2     308
4  321.4398    3     308
5  356.8519    4     308
6  414.6901    5     308
7  382.2038    6     308
8  290.1486    7     308
9  430.5853    8     308
10 466.3535    9     308
11 222.7339    0     309
12 205.2658    1     309

The data structure:

  • Reaction: Response time in milliseconds (outcome variable)
  • Days: Days of sleep deprivation (0-9)
  • Subject: Subject identifier (18 subjects)

# Check the structure
sleepstudy |>
  group_by(Subject) |>
  summarise(
    n_measurements = n(),
    first_day = min(Days),
    last_day = max(Days)
  ) |>
  head()
# A tibble: 6 × 4
  Subject n_measurements first_day last_day
  <fct>            <int>     <dbl>    <dbl>
1 308                 10         0        9
2 309                 10         0        9
3 310                 10         0        9
4 330                 10         0        9
5 331                 10         0        9
6 332                 10         0        9

Each subject has 10 measurements - one per day. This is longitudinal data: the same people measured repeatedly over time.

Dataset Summary

Variable  Type                 Description                    Values
Reaction  Numeric (outcome)    Response time in milliseconds  194-466 ms
Days      Numeric (predictor)  Days of sleep deprivation      0, 1, 2, … 9
Subject   Factor (grouping)    Unique subject identifier      18 subjects

Repeated Measures Structure

Level                     Count  Description
Subjects                  18     Independent individuals in the study
Measurements per subject  10     One measurement per day (Days 0-9)
Total observations        180    18 subjects × 10 days

Important: Rows ≠ Independent Observations

We have 180 rows but only 18 independent units. Regular regression would treat this as 180 independent observations - that’s wrong! The 10 measurements from Subject 308 are more similar to each other than to measurements from Subject 309.

Why Mixed Effects Models?

The Problem: Individual Differences

Let’s visualize how reaction time changes with sleep deprivation:

# Plot individual trajectories
ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
  geom_line(aes(group = Subject), alpha = 0.3) +
  geom_point(aes(color = Subject), alpha = 0.6) +
  geom_smooth(method = "lm", color = "black", linewidth = 1.5, se = FALSE) +
  labs(
    title = "Reaction Time Over Days of Sleep Deprivation",
    subtitle = "Each gray line is one subject; black line is overall average",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Figure 1: Reaction time over days of sleep deprivation. Each gray line connects one subject's measurements; the black line is the overall linear trend. Individual patterns vary widely: some subjects show steep increases while others remain relatively flat.

This plot reveals two important things:

  1. Overall trend: Reaction times increase with more sleep deprivation (people get slower)
  2. Individual differences: People start at different baselines AND deteriorate at different rates

Some subjects start slow and get much worse. Others start fast and barely change. A single regression line misses all this individual variation.

Regular Regression: The Wrong Approach

# Fit regular linear regression (WRONG for repeated measures)
model_wrong <- lm(Reaction ~ Days, data = sleepstudy)
summary(model_wrong)

Call:
lm(formula = Reaction ~ Days, data = sleepstudy)

Residuals:
     Min       1Q   Median       3Q      Max 
-110.848  -27.483    1.546   26.142  139.953 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  251.405      6.610  38.033  < 2e-16 ***
Days          10.467      1.238   8.454 9.89e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.71 on 178 degrees of freedom
Multiple R-squared:  0.2865,    Adjusted R-squared:  0.2825 
F-statistic: 71.46 on 1 and 178 DF,  p-value: 9.894e-15

This model treats all 180 observations as independent. But they’re not - they come from only 18 people! The standard errors are wrong because we’re pretending we have 180 independent data points when we really have 18 subjects measured 10 times each.

Warning: Don’t Do This!

Using lm() on repeated measures data inflates your sample size artificially, leading to p-values that are too small and confidence intervals that are too narrow. Your “significant” findings may be false positives.

Example 1: Random Intercepts Model

The Idea

A random intercepts model allows each subject to have their own baseline reaction time, while estimating a common slope for how everyone deteriorates with sleep deprivation.

Think of it as: “Everyone slows down by the same amount per day, but some people start faster than others.”

Tip: Start Simple

Always begin with random intercepts only. Add random slopes later if needed. This approach is more likely to converge and easier to interpret.

Fitting the Model

# Fit mixed effects model with random intercept per subject
model_intercepts <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(model_intercepts)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1786.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2257 -0.5529  0.0109  0.5188  4.2506 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378.2   37.12   
 Residual              960.5   30.99   
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) 251.4051     9.7467   25.79
Days         10.4673     0.8042   13.02

Correlation of Fixed Effects:
     (Intr)
Days -0.371

Breaking down the formula Reaction ~ Days + (1 | Subject):

  • Reaction ~ Days: Fixed effect - days of sleep deprivation predicts reaction time
  • (1 | Subject): Random effect - allow the intercept to vary by subject

Understanding the Output

Fixed Effects:

fixef(model_intercepts)
(Intercept)        Days 
  251.40510    10.46729 

  • Intercept (251.4): Average reaction time at Day 0 across all subjects
  • Days (10.5): For each additional day, reaction time increases by ~10.5 ms on average

Random Effects:

# See each subject's deviation from the average intercept
ranef(model_intercepts)$Subject |> head()
    (Intercept)
308   40.783710
309  -77.849554
310  -63.108567
330    4.406442
331   10.216189
332    8.221238

Positive values = subject starts slower than average. Negative values = subject starts faster than average.
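
A useful cross-check: coef() returns each subject's complete intercept, which is simply the fixed intercept plus that subject's random deviation (for Subject 308: 251.41 + 40.78 ≈ 292.19):

# Per-subject intercepts = fixed intercept + random deviation
coef(model_intercepts)$Subject |> head(3)
    (Intercept)     Days
308    292.1888 10.46729
309    173.5556 10.46729
310    188.2965 10.46729

Note that the Days column is identical for every subject - this model lets only the intercept vary.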

Visualizing Random Intercepts

# Get predictions
sleepstudy_pred <- sleepstudy |>
  mutate(
    pred_wrong = predict(model_wrong),
    pred_intercepts = predict(model_intercepts)
  )

# Plot random intercepts model
ggplot(sleepstudy_pred, aes(x = Days, y = Reaction)) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = pred_intercepts, group = Subject, color = Subject)) +
  labs(
    title = "Random Intercepts Model",
    subtitle = "Same slope for everyone, different starting points",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Figure 2: Random intercepts model predictions. The lines are parallel: every subject shares the same slope but starts from a different baseline reaction time.

Each subject gets their own intercept (starting point), but all lines are parallel - they share the same slope.

Example 2: Random Slopes Model

The Problem with Random Intercepts Only

Look back at the individual trajectories plot. Some subjects have steep slopes (deteriorate quickly), others have flat slopes (barely affected by sleep deprivation). The random intercepts model forces everyone to have the same slope - that’s too restrictive.

Fitting Random Slopes

# Fit model with random intercepts AND random slopes
model_slopes <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
summary(model_slopes)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.10   24.741       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825  36.838
Days          10.467      1.546   6.771

Correlation of Fixed Effects:
     (Intr)
Days -0.138

The formula (1 + Days | Subject) means:

  • 1: Allow intercepts to vary by subject
  • Days: Allow the slope of Days to vary by subject too

Understanding Random Slopes Output

Fixed Effects (the “average” person):

fixef(model_slopes)
(Intercept)        Days 
  251.40510    10.46729 

Random Effects (individual deviations):

# Each subject's intercept AND slope
coef(model_slopes)$Subject |> head()
    (Intercept)      Days
308    253.6637 19.666262
309    211.0064  1.847605
310    212.4447  5.018429
330    275.0957  5.652936
331    273.6654  7.397374
332    260.4447 10.195109

Now each subject has their own intercept AND their own slope. Subject 308 has a slope of ~19.7 (deteriorates quickly), while Subject 309 has a slope of ~1.8 (barely affected).

Visualizing Random Slopes

# Get predictions from random slopes model
sleepstudy_pred <- sleepstudy_pred |>
  mutate(pred_slopes = predict(model_slopes))

# Plot random slopes model
ggplot(sleepstudy_pred, aes(x = Days, y = Reaction)) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = pred_slopes, group = Subject, color = Subject)) +
  labs(
    title = "Random Slopes Model",
    subtitle = "Each subject has their own intercept AND slope",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Figure 3: Random slopes model predictions. The lines are no longer parallel: some subjects show steep increases in reaction time over days while others show minimal change.

Now the lines are no longer parallel - each subject has their own trajectory. This better captures the reality of individual differences in sleep deprivation effects.
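
It is also worth comparing uncertainty estimates between the approaches. The mixed model's standard error for Days reflects having only 18 subjects, while lm() pretends there are 180 independent points (both models were fit earlier; the values appear in their summaries above):

# Standard error of the Days effect under each model
coef(summary(model_wrong))["Days", "Std. Error"]   # ~1.24 (too optimistic)
coef(summary(model_slopes))["Days", "Std. Error"]  # ~1.55 (accounts for subjects)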

Example 3: Model Comparison

We’ve now fit three different models to the same data:

  1. Regular regression - ignores that subjects are measured repeatedly
  2. Random intercepts - allows different starting points, same slope for all
  3. Random slopes - allows different starting points AND different slopes

But which model should we actually use? This is a critical question because:

  • Too simple (regular regression): Violates assumptions, gives wrong standard errors
  • Too complex (random slopes when not needed): Wastes degrees of freedom, may not converge
  • Just right: Captures the true data structure without over-fitting

We’ll use three approaches to decide:

Method                 Question It Answers
AIC                    Which model balances fit and complexity best?
Likelihood Ratio Test  Is the more complex model significantly better?
Residual Analysis      Does the model capture the data structure?

Which Model is Best?

Let’s compare all three approaches:

# Compare models using AIC (lower is better)
AIC(model_wrong, model_intercepts, model_slopes)
                 df      AIC
model_wrong       3 1906.293
model_intercepts  4 1794.465
model_slopes      6 1755.628

The random slopes model has the lowest AIC, indicating the best balance of fit and complexity among the three.

Likelihood Ratio Test

# Formal test: does adding random slopes improve the model?
anova(model_intercepts, model_slopes)
Data: sleepstudy
Models:
model_intercepts: Reaction ~ Days + (1 | Subject)
model_slopes: Reaction ~ Days + (1 + Days | Subject)
                 npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
model_intercepts    4 1802.1 1814.8 -897.04   1794.1                         
model_slopes        6 1763.9 1783.1 -875.97   1751.9 42.139  2  7.072e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significant p-value confirms that allowing slopes to vary by subject significantly improves the model.

Note: Interpreting the Likelihood Ratio Test

The test compares nested models (simpler vs more complex). A significant p-value means the extra complexity is justified - the random slopes capture real variation that random intercepts alone miss.
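
One detail hidden in this comparison: because both models were fit with REML, anova() silently refits them with maximum likelihood first - which is why its AIC values differ slightly from the AIC() table above. To compare the REML fits directly, pass refit = FALSE:

# Compare the REML fits without the ML refit
anova(model_intercepts, model_slopes, refit = FALSE)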

Comparing Residuals

# Add residuals
sleepstudy_pred <- sleepstudy_pred |>
  mutate(
    resid_wrong = residuals(model_wrong),
    resid_slopes = residuals(model_slopes)
  )

# Plot residuals by subject for wrong model
ggplot(sleepstudy_pred, aes(x = Subject, y = resid_wrong)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(
    title = "Regular Regression Residuals by Subject",
    subtitle = "Systematic patterns indicate model is missing subject-level effects",
    y = "Residual"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 4: Regular regression residuals grouped by subject. Several subjects' residuals sit consistently above or below zero, indicating the model misses subject-level patterns.

The regular regression shows systematic patterns - some subjects are consistently over or under-predicted. The model is missing the individual differences.

# Plot residuals for mixed model
ggplot(sleepstudy_pred, aes(x = Subject, y = resid_slopes)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(
    title = "Mixed Effects Model Residuals by Subject",
    subtitle = "Residuals centered around zero for all subjects",
    y = "Residual"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 5: Mixed effects model residuals grouped by subject. Every subject's residuals are centered around zero, indicating individual differences are properly accounted for.

The mixed effects model residuals are centered around zero for each subject - we’ve properly accounted for individual differences.

Example 4: Making Predictions

Subject-Specific Predictions

One advantage of mixed models is making predictions for specific subjects:

# Predict for a specific subject on a new day
new_data <- data.frame(Days = c(10, 11, 12), Subject = "308")
predict(model_slopes, newdata = new_data)
       1        2        3 
450.3263 469.9925 489.6588 

Population-Level Predictions

For a “typical” new subject, use re.form = NA to ignore random effects:

# Predict for an average/new subject
new_data_pop <- data.frame(Days = c(10, 11, 12))
predict(model_slopes, newdata = new_data_pop, re.form = NA)
       1        2        3 
356.0780 366.5453 377.0125 

Tidying Results with broom.mixed

The broom.mixed package makes it easy to extract model results in tidy data frames - perfect for reporting or further analysis.

Tidy Fixed Effects

# Get fixed effects as a tidy data frame
tidy(model_slopes, effects = "fixed")
# A tibble: 2 × 5
  effect term        estimate std.error statistic
  <chr>  <chr>          <dbl>     <dbl>     <dbl>
1 fixed  (Intercept)    251.       6.82     36.8 
2 fixed  Days            10.5      1.55      6.77

This gives you estimates, standard errors, and test statistics in a format ready for tables or plotting; add conf.int = TRUE to the tidy() call if you also want confidence intervals.

Tidy Random Effects

# Get random effects (subject-level deviations)
tidy(model_slopes, effects = "ran_vals") |> head(10)
# A tibble: 10 × 6
   effect   group   level term        estimate std.error
   <chr>    <chr>   <chr> <chr>          <dbl>     <dbl>
 1 ran_vals Subject 308   (Intercept)    2.26       12.1
 2 ran_vals Subject 309   (Intercept)  -40.4        12.1
 3 ran_vals Subject 310   (Intercept)  -39.0        12.1
 4 ran_vals Subject 330   (Intercept)   23.7        12.1
 5 ran_vals Subject 331   (Intercept)   22.3        12.1
 6 ran_vals Subject 332   (Intercept)    9.04       12.1
 7 ran_vals Subject 333   (Intercept)   16.8        12.1
 8 ran_vals Subject 334   (Intercept)   -7.23       12.1
 9 ran_vals Subject 335   (Intercept)   -0.334      12.1
10 ran_vals Subject 337   (Intercept)   34.9        12.1

Model Summary Statistics

# Get model-level statistics (AIC, BIC, etc.)
glance(model_slopes)
# A tibble: 1 × 7
   nobs sigma logLik   AIC   BIC REMLcrit df.residual
  <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int>
1   180  25.6  -872. 1756. 1775.    1744.         174

Augmented Data with Predictions

# Add predictions and residuals to your data
augment(model_slopes) |> head()
# A tibble: 6 × 14
  Reaction  Days Subject .fitted .resid   .hat .cooksd .fixed   .mu .offset
     <dbl> <dbl> <fct>     <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>   <dbl>
1     250.     0 308        254.  -4.10 0.229  0.00496   251.  254.       0
2     259.     1 308        273. -14.6  0.170  0.0402    262.  273.       0
3     251.     2 308        293. -42.2  0.127  0.226     272.  293.       0
4     321.     3 308        313.   8.78 0.101  0.00731   283.  313.       0
5     357.     4 308        332.  24.5  0.0910 0.0506    293.  332.       0
6     415.     5 308        352.  62.7  0.0981 0.362     304.  352.       0
# ℹ 4 more variables: .sqrtXwt <dbl>, .sqrtrwt <dbl>, .weights <dbl>,
#   .wtres <dbl>

The .fitted column contains predictions, and .resid contains residuals - useful for diagnostics or visualization.

Tip: Why Use broom.mixed?

Base R’s summary() prints results to the console. broom.mixed returns data frames you can filter, join, or pass to ggplot2. This is especially useful when comparing multiple models or creating publication-ready tables.
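
As a small example of that workflow, here is one way to build a compact coefficient table with confidence intervals from the random slopes model fit above:

# Tidy fixed effects with Wald confidence intervals
tidy(model_slopes, effects = "fixed", conf.int = TRUE) |>
  select(term, estimate, conf.low, conf.high)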

When to Use Mixed Effects Models

Situation          Example                                   Random Effect
Longitudinal data  Same people measured over time            Subject
Repeated measures  Multiple trials per person                Subject
Clustered data     Students within schools                   School
Nested designs     Patients within hospitals within regions  Hospital, Region
Crossed designs    Subjects rating multiple items            Subject, Item

Common Mistakes to Avoid

1. Treating repeated measures as independent (see linear regression basics for when simple regression IS appropriate)

# WRONG: Ignores that measurements come from the same subjects
lm(Reaction ~ Days, data = sleepstudy)

# RIGHT: Accounts for within-subject correlation
lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

2. Starting with too complex a model

Start with random intercepts only. Add random slopes if:

  • Theory suggests the effect varies across subjects
  • You have enough data (5+ observations per subject)
  • The simpler model shows poor fit

3. Ignoring convergence warnings

If you see warnings, simplify your model:

# Too complex - may not converge
lmer(y ~ x1 + x2 + x3 + (1 + x1 + x2 + x3 | Subject), data = df)

# Simpler - more likely to converge
lmer(y ~ x1 + x2 + x3 + (1 + x1 | Subject), data = df)

Quick Reference: Formula Syntax

Note: lme4 Formula Cheatsheet

Formula                                Meaning                            Use When
y ~ x + (1 | group)                    Random intercepts only             Groups differ in baseline
y ~ x + (1 + x | group)                Random intercepts + slopes         Groups differ in baseline AND effect of x
y ~ x + (0 + x | group)                Random slopes only (no intercept)  Rare; baseline is fixed but slopes vary
y ~ x + (1 | group1) + (1 | group2)    Crossed random effects             Two grouping factors (e.g., subjects AND items)
y ~ x + (1 | group1/group2)            Nested random effects              group2 is nested within group1
y ~ x + (1 | group) + (0 + x | group)  Uncorrelated random effects        When correlated version won’t converge

Key functions:

lmer(formula, data)      # Fit linear mixed model
summary(model)           # View full output
fixef(model)             # Extract fixed effects
ranef(model)             # Extract random effects (deviations)
coef(model)              # Extract coefficients per group
predict(model)           # Get fitted values
AIC(model1, model2)      # Compare model fit
anova(model1, model2)    # Likelihood ratio test

Summary

  • Regular regression treats all observations as independent - wrong for repeated measures
  • Mixed effects models account for the correlation within subjects
  • Random intercepts (1 | Subject): Subjects differ in their baseline
  • Random slopes (1 + x | Subject): Subjects differ in how they respond to x
  • The sleepstudy data shows both: people start at different baselines AND deteriorate at different rates
  • Use lmer() from lme4 for continuous outcomes
  • Compare models with AIC() and anova()
  • Always visualize individual trajectories to understand your data

Frequently Asked Questions

Q: What’s the difference between mixed effects models and repeated measures ANOVA?

Both handle repeated measurements, but mixed effects models are more flexible. They handle unbalanced designs (missing data), continuous predictors, and can model how the effect itself varies across subjects. Repeated measures ANOVA is a special case of mixed models.

Q: How many subjects do I need for a mixed effects model?

A common rule of thumb is at least 5-10 groups (subjects) for random intercepts, and more for random slopes. With too few groups, random effects estimates become unstable. If you have only 3-4 subjects, consider treating subject as a fixed effect instead.

Q: How many measurements per subject do I need?

For random intercepts only, even 2-3 measurements per subject can work. For random slopes, you need enough observations per subject to estimate individual slopes - typically 4-5+ time points. In the sleepstudy data, 10 measurements per subject allows reliable estimation of both intercepts and slopes. More observations per subject = better individual-level estimates; more subjects = better population-level estimates.

Q: My model won’t converge. What should I do?

Convergence issues usually mean your model is too complex for your data. Try:

  1. Remove random slopes and keep only random intercepts
  2. Remove correlation between random effects: (1 | Subject) + (0 + Days | Subject)
  3. Scale your predictors using scale()
  4. Check for outliers or data entry errors
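
For option 2, lme4 also provides the double-bar shorthand, which fits the same uncorrelated random effects more compactly:

# (1 + Days || Subject) expands to (1 | Subject) + (0 + Days | Subject)
lmer(Reaction ~ Days + (1 + Days || Subject), data = sleepstudy)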

Q: Should I use lme4 or nlme for mixed models?

lme4 is generally preferred for modern analyses. It’s faster, handles crossed random effects better, and integrates well with the tidyverse. Use nlme if you need specific correlation structures or heterogeneous variances.

Q: How do I get p-values for fixed effects?

lme4 deliberately doesn’t provide p-values because calculating degrees of freedom for mixed models is complex. Options include:

  • Use the lmerTest package which adds p-values via Satterthwaite approximation
  • Use likelihood ratio tests with anova()
  • Use confidence intervals: if 95% CI doesn’t include zero, effect is “significant”
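
For the confidence-interval route, confint() works directly on the fitted model. The Wald method shown here is fast; the profile method (the default) is slower but more accurate:

# 95% Wald confidence interval for the Days effect
confint(model_slopes, parm = "Days", method = "Wald")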

Q: Can I use mixed models for binary outcomes?

Yes! Use glmer() instead of lmer() with family = binomial. For example: glmer(success ~ treatment + (1 | subject), family = binomial, data = df). See our logistic regression guide for background on binary outcomes.

