How to use summarise() in R

Published: February 20, 2026

R Tutorial: dplyr::summarise()

Introduction

The summarise() function from the dplyr package computes summary statistics from a data frame. It collapses many rows into a single summary row (or one row per group when the data is grouped) by applying aggregate functions such as mean(), sum(), n(), or max() to columns. This makes it essential whenever you need descriptive statistics, report tables, or comparisons across subsets of your data. summarise() is part of the tidyverse and works seamlessly with other dplyr functions like group_by() and filter(). It is particularly powerful when combined with grouping, because a single call then calculates the same statistics for every group.

Syntax

summarise(.data, ..., .by = NULL, .groups = NULL)

Key Arguments:

- .data: A data frame or tibble to summarise
- ...: Name-value pairs of summary functions (e.g., mean_height = mean(height))
- .by: Optional grouping columns (alternative to using group_by())
- .groups: How to handle the grouping structure of the output ("drop_last", "drop", "keep", "rowwise")
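To make the difference between group_by() and .by concrete, here is a minimal sketch. It assumes dplyr 1.1.0 or later (where .by is available), and the scores tibble with its team and score columns is made-up illustration data, not part of the penguins dataset.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
scores <- tibble(
  team  = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 25)
)

# Persistent grouping: group first, then drop the groups explicitly
via_group_by <- scores |>
  group_by(team) |>
  summarise(mean_score = mean(score), n = n(), .groups = "drop")

# Per-operation grouping: .by groups the data for this one call only
via_by <- scores |>
  summarise(mean_score = mean(score), n = n(), .by = team)

via_by
```

Both pipelines should give the same two-row result here. One difference to keep in mind: group_by() sorts the output by the grouping keys, while .by keeps groups in order of first appearance, so the row order can differ on unsorted data.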

Example 1: Basic Usage

Let’s start with a simple example using the palmerpenguins dataset:

library(tidyverse)
library(palmerpenguins)

# Basic summary statistics
penguins |>
  summarise(
    count = n(),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    max_body_mass = max(body_mass_g, na.rm = TRUE),
    min_flipper_length = min(flipper_length_mm, na.rm = TRUE)
  )
# A tibble: 1 × 4
  count avg_bill_length max_body_mass min_flipper_length
  <int>           <dbl>         <int>              <int>
1   344            43.9          6300                172

This example demonstrates the basic functionality of summarise(). We created four summary statistics: total count of observations using n(), average bill length, maximum body mass, and minimum flipper length. The na.rm = TRUE argument handles missing values by excluding them from calculations.
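If the effect of na.rm is hard to see on a full dataset, a tiny example may help. The readings tibble below is invented for illustration; it holds four values, one of them missing.

```r
library(dplyr)

# Hypothetical measurements with one missing value
readings <- tibble(value = c(2, 4, NA, 6))

na_demo <- readings |>
  summarise(
    n_rows       = n(),                       # counts every row, including the NA
    n_missing    = sum(is.na(value)),         # number of missing values
    mean_default = mean(value),               # NA, because one input is NA
    mean_na_rm   = mean(value, na.rm = TRUE)  # mean of 2, 4 and 6
  )

na_demo
```

Note that n() counts rows regardless of missing values, so reporting the number of missing values alongside the mean is a cheap way to show how many observations the statistic is really based on.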

Example 2: Practical Application

Here’s a more practical example that combines summarise() with group_by() to analyze penguin species:

# Species comparison with grouped summaries
species_summary <- penguins |>
  group_by(species, island) |>
  summarise(
    penguin_count = n(),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
    avg_body_mass = mean(body_mass_g, na.rm = TRUE),
    mass_sd = sd(body_mass_g, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(avg_body_mass))

species_summary
# A tibble: 5 × 7
  species   island    penguin_count avg_bill_length avg_bill_depth avg_body_mass mass_sd
  <fct>     <fct>             <int>           <dbl>          <dbl>         <dbl>   <dbl>
1 Gentoo    Biscoe              124            47.5           15.0         5076.    504.
2 Chinstrap Dream                68            48.8           18.4         3733.    384.
3 Adelie    Biscoe               44            39.0           18.4         3710.    488.
4 Adelie    Torgersen            52            39.0           18.4         3706.    445.
5 Adelie    Dream                56            38.5           18.3         3688.    455.

This example shows how summarise() works with grouped data to produce one summary row per species-island combination. Setting .groups = "drop" returns an ungrouped tibble, and arrange() then sorts it by average body mass, showing that Gentoo penguins are the heaviest on average.

Example 3: Advanced Usage

More advanced summaries can combine conditional counts, derived percentages, and list columns:

# Advanced summarise with conditional logic and multiple functions
advanced_summary <- penguins |>
  group_by(species) |>
  summarise(
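    total_count = n(),
    # Rows where both measurements are present
    complete_cases = sum(!is.na(bill_length_mm) & !is.na(body_mass_g)),
    # Conditional count: summing a logical vector counts its TRUE values
    heavy_penguins = sum(body_mass_g > 4000, na.rm = TRUE),
    # Later expressions can reuse summaries defined earlier in the same call
    pct_heavy = round(heavy_penguins / complete_cases * 100, 1),
    bill_length_range = max(bill_length_mm, na.rm = TRUE) - min(bill_length_mm, na.rm = TRUE),
    # Wrap the five quantiles in list() so they fit into one cell per group
    mass_quartiles = list(quantile(body_mass_g, na.rm = TRUE)),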
    .groups = "drop"
  )

# Extract quartiles for one species
advanced_summary$mass_quartiles[[1]]  # Adelie quartiles
  0%  25%  50%  75% 100% 
2850 3350 3700 4000 4775 

This advanced example demonstrates conditional counting, percentage calculations, range calculations, and storing complex objects like quartiles in list columns.
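Once quantiles sit in a list column, you will usually want to pull individual values back out. The sketch below shows one way to do that with base R's vapply(); the weights tibble and its group and mass columns are hypothetical illustration data.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
weights <- tibble(
  group = c("x", "x", "x", "y", "y", "y"),
  mass  = c(1, 2, 3, 10, 20, 30)
)

quartile_summary <- weights |>
  group_by(group) |>
  summarise(mass_quartiles = list(quantile(mass)), .groups = "drop")

# Each list element is a named numeric vector, so one quantile per group
# can be extracted by name
with_median <- quartile_summary |>
  mutate(median_mass = vapply(mass_quartiles, function(q) q[["50%"]], numeric(1)))

with_median
```

Extracting by name ("50%") rather than by position keeps the code readable and still works if you later pass a different probs argument to quantile() that includes the median.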

Common Mistakes

1. Forgetting na.rm = TRUE with missing data:

# Wrong - will return NA if any missing values exist
penguins |> summarise(avg_mass = mean(body_mass_g))

# Correct - handles missing values
penguins |> summarise(avg_mass = mean(body_mass_g, na.rm = TRUE))

2. Not handling grouping properly:

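# Without .groups, only the last grouping variable (sex) is dropped,
# so the result is still grouped by species
penguins |>
  group_by(species, sex) |>
  summarise(count = n())  # Prints an informational message about the output grouping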

# Better - explicitly control grouping
penguins |> 
  group_by(species, sex) |>
  summarise(count = n(), .groups = "drop")
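To see what each .groups option actually leaves behind, you can inspect the result with group_vars(). This is a minimal sketch on made-up data; the visits tibble and its site, year, and count columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
visits <- tibble(
  site  = c("n", "n", "n", "s", "s"),
  year  = c(2020, 2020, 2021, 2020, 2021),
  count = c(3, 4, 5, 6, 7)
)

grouped <- visits |> group_by(site, year)

# Default behaviour: only the last grouping variable (year) is dropped
default_out <- grouped |> summarise(total = sum(count))
group_vars(default_out)   # "site"

# "drop": the result is an ordinary ungrouped tibble
dropped_out <- grouped |> summarise(total = sum(count), .groups = "drop")
group_vars(dropped_out)   # character(0)

# "keep": the result keeps the same grouping as the input
kept_out <- grouped |> summarise(total = sum(count), .groups = "keep")
group_vars(kept_out)      # "site" "year"
```

Leftover grouping matters because every later verb, such as mutate() or slice_max(), operates within those groups, which is a common source of surprising results further down a pipeline.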

3. Using summarise() when you need mutate():

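# Wrong - summarise() expects one value per group; this returns one row per penguin,
# drops every other column, and triggers a deprecation warning in dplyr 1.1.0 and later
penguins |> summarise(bill_ratio = bill_length_mm / bill_depth_mm)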

# Correct - mutate() adds new columns while keeping all rows
penguins |> mutate(bill_ratio = bill_length_mm / bill_depth_mm)
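A quick way to decide between the two verbs is to ask how many rows you expect back. The sketch below compares row counts on a small made-up tibble; boxes, width, and height are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
boxes <- tibble(
  width  = c(2, 4, 6),
  height = c(1, 2, 3)
)

# Row-wise transformation: one output value per input row, so use mutate()
with_ratio <- boxes |> mutate(ratio = width / height)
nrow(with_ratio)   # 3: all rows and original columns are kept

# Aggregation: many rows collapse into one value, so use summarise()
mean_ratio <- boxes |> summarise(mean_ratio = mean(width / height))
nrow(mean_ratio)   # 1: a single summary row
```

If you genuinely need a per-row calculation followed by an aggregate, wrapping the expression in an aggregate function inside summarise(), as in the second call, does both in one step.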