How to use summarize() in R

Learn how to use summarize() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 20, 2026

Introduction

The summarize() function in R’s dplyr package computes summary statistics from a data frame. It collapses the rows of the data frame, or the rows of each group in a grouped data frame, into a single row of computed values such as means, medians, counts, and standard deviations.

You’ll use summarize() when you need aggregate statistics for an entire dataset or for specific groups within it. Typical uses are exploratory data analysis, report tables, and preparing data for visualization. It combines with other dplyr verbs such as filter(), group_by(), and arrange(), so a single pipeline can turn detailed observations into a compact summary table.

Getting Started

First, let’s load the required packages. The tidyverse includes dplyr, and palmerpenguins provides the penguins dataset used in the examples. The examples use the native pipe (|>), which requires R 4.1 or later.

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. We’ll calculate basic summary statistics for penguin body mass.

# Basic summarize usage
penguins |>
  summarize(
    count = n(),
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    median_mass = median(body_mass_g, na.rm = TRUE),
    sd_mass = sd(body_mass_g, na.rm = TRUE)
  )

You can also summarize multiple variables at once:

# Summarizing multiple variables
penguins |>
  summarize(
    n_penguins = n(),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
    avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE)
  )

The na.rm = TRUE argument excludes missing values from each calculation. Without it, mean(), median(), and sd() return NA as soon as the column contains a single missing value, which is common in real-world data.
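
To see the effect in isolation, here is a minimal sketch on a made-up four-row tibble (the toy data and the column name mass are assumptions for illustration, not part of the penguins dataset). It also shows that n() counts rows, not non-missing values, so a separate count is needed if you want the number of observations that actually went into the mean.

```r
library(dplyr)

# Hypothetical toy data: four measurements, one of them missing
toy <- tibble(mass = c(10, 20, NA, 30))

na_demo <- toy |>
  summarize(
    n_rows     = n(),                      # counts every row, including the NA
    n_measured = sum(!is.na(mass)),        # counts only the non-missing values
    mean_raw   = mean(mass),               # NA, because one value is missing
    mean_clean = mean(mass, na.rm = TRUE)  # mean of 10, 20, and 30
  )

na_demo
```

Here n_rows is 4 while n_measured is 3, and only mean_clean returns a number (20).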

Example 2: Practical Application

Now let’s explore a more practical application by combining summarize() with group_by() to analyze differences between penguin species and islands.

# Grouped summary by species
species_summary <- penguins |>
  group_by(species) |>
  summarize(
    count = n(),
    avg_body_mass = mean(body_mass_g, na.rm = TRUE),
    avg_bill_length = mean(bill_length_mm, na.rm = TRUE),
    mass_sd = sd(body_mass_g, na.rm = TRUE),
    .groups = "drop"
  )
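
Because the result is assigned to species_summary, nothing is printed until you type the object name. To make the effect of group_by() concrete, here is a small sketch on made-up data (the toy tibble and its values are assumptions for illustration): the same summarize() call returns one row per group when the data is grouped and a single row when it is not.

```r
library(dplyr)

# Hypothetical toy data: two species with two and three individuals
toy <- tibble(
  species = c("A", "A", "B", "B", "B"),
  mass    = c(10, 20, 30, 40, 50)
)

# Grouped: one output row per species
grouped <- toy |>
  group_by(species) |>
  summarize(count = n(), avg_mass = mean(mass))

# Ungrouped: one output row for the whole table
overall <- toy |>
  summarize(count = n(), avg_mass = mean(mass))

grouped
overall
```

The grouped result has two rows (averages of 15 and 40), while the ungrouped result has one row with an average of 30.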

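For a more complex analysis, let’s examine the data by both species and sex. Note that mass_range is computed from max_mass and min_mass, which works because summarize() evaluates its arguments in order and later expressions can refer to summaries created earlier in the same call: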

# Multi-level grouping with filtering
detailed_summary <- penguins |>
  filter(!is.na(sex)) |>
  group_by(species, sex) |>
  summarize(
    n_observations = n(),
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    min_mass = min(body_mass_g, na.rm = TRUE),
    max_mass = max(body_mass_g, na.rm = TRUE),
    mass_range = max_mass - min_mass,
    .groups = "drop"
  ) |>
  arrange(species, desc(mean_mass))
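
The .groups = "drop" argument matters most when you group by more than one variable, as in the call above. The sketch below, on made-up data (the toy tibble is an assumption for illustration), compares it with .groups = "drop_last", which spells out what summarize() does by default: only the last grouping variable is removed, so the result stays grouped by the first one.

```r
library(dplyr)

# Hypothetical toy data: two species, two sexes
toy <- tibble(
  species = c("A", "A", "B", "B"),
  sex     = c("f", "m", "f", "m"),
  mass    = c(10, 20, 30, 40)
)

# "drop_last" mirrors the default: sex is dropped, species grouping remains
still_grouped <- toy |>
  group_by(species, sex) |>
  summarize(mean_mass = mean(mass), .groups = "drop_last")

# "drop" removes all grouping from the result
ungrouped <- toy |>
  group_by(species, sex) |>
  summarize(mean_mass = mean(mass), .groups = "drop")

group_vars(still_grouped)  # "species"
group_vars(ungrouped)      # character(0)

# The leftover grouping changes what later verbs do
per_species <- still_grouped |> summarize(n_rows = n())  # one row per species
whole_table <- ungrouped |> summarize(n_rows = n())      # a single row
```

Dropping the grouping explicitly avoids surprises like the last two lines, where the same follow-up call returns two rows in one case and one row in the other.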

You can also apply the same summary functions to several columns at once by using across() inside summarize():

# Using across() for multiple numeric columns
penguins |>
  group_by(island) |>
  summarize(
    across(c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           list(mean = ~mean(.x, na.rm = TRUE),
                sd = ~sd(.x, na.rm = TRUE)),
           .names = "{.col}_{.fn}"),
    sample_size = n(),
    .groups = "drop"
  )
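
The .names pattern controls how the output columns are labeled: {.col} is replaced by the input column name and {.fn} by the name given to each function in the list. A minimal sketch on made-up data (the toy tibble and its column names len and depth are assumptions for illustration) makes the resulting names easy to see:

```r
library(dplyr)

# Hypothetical toy data: two islands, two numeric measurements
toy <- tibble(
  island = c("X", "X", "Y", "Y"),
  len    = c(1, 3, 5, 7),
  depth  = c(2, 4, 6, 10)
)

wide <- toy |>
  group_by(island) |>
  summarize(
    across(c(len, depth),
           list(mean = ~mean(.x, na.rm = TRUE),
                sd   = ~sd(.x, na.rm = TRUE)),
           .names = "{.col}_{.fn}"),
    sample_size = n()
  )

names(wide)
# "island" "len_mean" "len_sd" "depth_mean" "depth_sd" "sample_size"
```

A selection helper such as where(is.numeric) can replace the explicit column list, but in the penguins data it would also pick up the year column, so listing the measurement columns by name is the safer choice there.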

Summary

The summarize() function is an essential tool for data analysis in R. Key takeaways include:

  • Use summarize() to calculate aggregate statistics and reduce your data to summary values
  • Pass na.rm = TRUE to functions like mean() and sd() whenever a column may contain missing values; otherwise the result is NA
  • Combine with group_by() to create summaries for different categories in your data
  • With several grouping variables, summarize() drops only the last one by default; .groups = "drop" returns a fully ungrouped result
  • Use across() within summarize() to apply the same functions to multiple columns efficiently
  • The native pipe (|>), available since R 4.1, lets you read a multi-step analysis from top to bottom

Whether you’re conducting exploratory data analysis or preparing summary reports, summarize() provides a clean, intuitive way to transform detailed observations into meaningful insights.