How to use summarise() in R

Learn how to use summarise() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 20, 2026

Introduction

The summarise() function in R’s dplyr package collapses many rows of a data frame into summary statistics. It returns one row for the whole data frame, or one row per group when the data is grouped, with one column for each statistic you request. Whether you’re calculating means, counts, standard deviations, or custom metrics, the pattern is the same.

You’ll use summarise() whenever you need to compute aggregate statistics from your data. Common scenarios include calculating average sales by region, counting observations in different categories, finding the maximum values in groups, or creating custom summary metrics. It’s particularly powerful when combined with group_by() to create summaries for different subsets of your data, making it essential for exploratory data analysis and reporting.

Getting Started

First, let’s load the required packages. We’ll use the tidyverse for data manipulation and the palmerpenguins dataset for our examples.

library(tidyverse)
library(palmerpenguins)

# Take a look at our data
head(penguins)

Example 1: Basic Usage

Let’s start with simple summary statistics using the penguins dataset. On ungrouped data, summarise() returns a single row with one column for each named expression you pass it.

# Basic summary statistics
penguins |>
  summarise(
    count = n(),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    avg_body_mass = mean(body_mass_g, na.rm = TRUE),
    max_flipper_length = max(flipper_length_mm, na.rm = TRUE)
  )

You can also use summarise() with a single statistic:

# Single summary statistic
penguins |>
  summarise(total_penguins = n())

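The na.rm = TRUE argument is important when working with real data that may contain missing values. Without it, functions such as mean(), max(), and sd() return NA as soon as a single input value is NA. Note that n() takes no arguments and counts every row, including rows with missing values.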
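
To see the effect of na.rm in isolation, here is a minimal sketch on a made-up three-row tibble. The weights data frame and its mass_g column are hypothetical and exist only for this illustration; they are not part of palmerpenguins.

```r
library(dplyr)

# Hypothetical toy data: three measurements, one of them missing
weights <- tibble(mass_g = c(3500, NA, 4500))

na_demo <- weights |>
  summarise(
    n_rows       = n(),                        # counts all rows, including the NA row
    n_missing    = sum(is.na(mass_g)),         # number of missing values
    mean_default = mean(mass_g),               # NA, because one input is NA
    mean_na_rm   = mean(mass_g, na.rm = TRUE)  # mean of the two known values
  )

na_demo
```

Counting the missing values alongside the mean, as n_missing does here, is one way to keep track of how many observations a summary actually rests on.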

Example 2: Practical Application

The real power of summarise() emerges when combined with group_by(). Let’s analyze penguin characteristics by species and island to understand patterns in the data.

# Grouped summary statistics
penguin_summary <- penguins |>
  group_by(species, island) |>
  summarise(
    count = n(),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
    avg_body_mass = mean(body_mass_g, na.rm = TRUE),
    sd_body_mass = sd(body_mass_g, na.rm = TRUE),
    .groups = "drop"
  )
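
The .groups = "drop" argument used above controls what grouping the result keeps. The sketch below compares it with the default behaviour on a small made-up tibble; sightings and its columns are hypothetical names chosen for the illustration.

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
sightings <- tibble(
  species = c("A", "A", "B", "B"),
  island  = c("X", "Y", "X", "X"),
  mass_g  = c(3000, 3200, 5000, 5400)
)

# Default: only the last grouping variable (island) is dropped,
# so the result is still grouped by species
still_grouped <- sightings |>
  group_by(species, island) |>
  summarise(avg_mass = mean(mass_g))

# .groups = "drop": the result carries no grouping at all
ungrouped <- sightings |>
  group_by(species, island) |>
  summarise(avg_mass = mean(mass_g), .groups = "drop")

group_vars(still_grouped)  # "species"
group_vars(ungrouped)      # character(0)
```

Leftover grouping matters because later verbs such as mutate() or slice() would silently operate per species on still_grouped, which is easy to overlook.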

We can also create more complex summaries with conditional logic and multiple summary functions:

# Advanced summary with conditional statistics
penguins |>
  group_by(species) |>
  summarise(
    total_count = n(),
    male_count = sum(sex == "male", na.rm = TRUE),
    female_count = sum(sex == "female", na.rm = TRUE),
    heavy_penguins = sum(body_mass_g > 4000, na.rm = TRUE),
    bill_length_range = max(bill_length_mm, na.rm = TRUE) - min(bill_length_mm, na.rm = TRUE),
    avg_bill_ratio = mean(bill_length_mm / bill_depth_mm, na.rm = TRUE),
    .groups = "drop"
  )
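
The conditional counts above work because R treats TRUE as 1 and FALSE as 0, so sum() of a comparison counts matching rows and mean() gives their proportion. A minimal sketch on made-up data follows; animals and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: sex is missing for one animal
animals <- tibble(
  sex    = c("male", "female", NA, "male"),
  mass_g = c(4200, 3600, 3900, 4500)
)

cond_demo <- animals |>
  summarise(
    male_count  = sum(sex == "male", na.rm = TRUE),  # NA comparison is skipped
    heavy_count = sum(mass_g > 4000),                # rows heavier than 4000 g
    heavy_share = mean(mass_g > 4000)                # proportion of such rows
  )

cond_demo
```

Here na.rm = TRUE is needed in male_count because comparing NA with "male" yields NA rather than FALSE.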

The penguins dataset already includes an integer year column, so we can group by it directly to create time-based summaries:

# Summary by year and species (year is an integer column in penguins)
penguins |>
  group_by(year, species) |>
  summarise(
    count = n(),
    avg_body_mass = mean(body_mass_g, na.rm = TRUE),
    median_flipper_length = median(flipper_length_mm, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(year, species)

Summary

The summarise() function is essential for data analysis in R, allowing you to transform detailed datasets into meaningful summary statistics. Key takeaways include: pass na.rm = TRUE to functions such as mean() and sd() when the data contains missing values, combine summarise() with group_by() to get one summary row per group, and use .groups = "drop" to return an ungrouped result and suppress the message dplyr prints about the remaining grouping structure.

Remember that summarise() reduces your data to summary rows, making it perfect for creating reports, identifying patterns, and preparing data for visualization. Master this function alongside group_by(), and you’ll have a powerful toolkit for exploratory data analysis.
