How to use lapply in R

Master lapply in R programming with clear examples. Complete tutorial covering syntax, use cases, and best practices.
Published

February 21, 2026

Introduction

The lapply() function applies a function to each element of a list or vector and returns the results as a list. It replaces many explicit for loops with a single call, which keeps repetitive code short and consistent. Use lapply() when you need to perform the same operation on every element and want the output to be a list of the same length as the input.
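As a minimal sketch of the call shape, lapply(X, FUN, ...) takes the input X, the function FUN, and optional extra arguments that are forwarded to FUN. The vector `values` and the squaring function below are illustrative only and are not used elsewhere in this tutorial.

```r
# Illustrative input: a plain numeric vector
values <- c(2, 4, 6)

# lapply() calls the function once per element and collects the results in a list
squares <- lapply(values, function(v) v^2)

# Inspect the structure: a list of three single numbers
str(squares)
```

Even though the input is a vector, the output is a list with one element per input value.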

Getting Started

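library(tidyverse)  # provides filter(), used in Example 2
data(mtcars)        # built-in dataset
data(penguins, package = "palmerpenguins")  # requires the palmerpenguins package to be installed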

Example 1: Basic Usage

The Problem

We want to apply the same summary function to several groups of numbers without writing separate code for each group. The same pattern carries over to the numeric columns of a data frame such as mtcars, because a data frame is a list of columns.

Step 1: Create a simple list

First, let’s create a basic list to understand how lapply() works.

# Create a simple numeric list
numbers <- list(
  group1 = c(1, 2, 3, 4, 5),
  group2 = c(10, 20, 30),
  group3 = c(100, 200, 300, 400)
)

This creates a list with three named elements, each a numeric vector of a different length.

Step 2: Apply a function to each element

Now we’ll use lapply() to calculate the mean of each list element.

# Calculate mean for each group
result <- lapply(numbers, mean)
print(result)

The lapply() function applies the mean() function to each element and returns a list containing the three calculated means.
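To make the shape of the result concrete, here is a short sketch that rebuilds the same list and shows how to read values back out. It uses only base R; unlist() is one possible way to flatten the result and is not required by lapply() itself.

```r
numbers <- list(
  group1 = c(1, 2, 3, 4, 5),
  group2 = c(10, 20, 30),
  group3 = c(100, 200, 300, 400)
)
result <- lapply(numbers, mean)

# Names from the input list carry over to the output
names(result)        # "group1" "group2" "group3"

# Pull out one element with $ or [[ ]]
result$group1        # 3

# Flatten to a named numeric vector when every element is a single number
unlist(result)
```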

Step 3: Apply built-in functions

Let’s apply different summary functions to see the versatility of lapply().

# Apply a different built-in function in each call
sum_result <- lapply(numbers, sum)
length_result <- lapply(numbers, length)
max_result <- lapply(numbers, max)

Each lapply() call returns a list with the same length and names as our input, but each element now holds a single calculated value instead of the original vector.
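The same pattern works on the columns of mtcars, since a data frame is a list of columns. The sketch below assumes we care about mpg, hp, and wt; that column choice is only an illustration.

```r
# A data frame is a list of columns, so lapply() iterates over columns
cols <- c("mpg", "hp", "wt")   # illustrative choice of numeric columns
col_means <- lapply(mtcars[cols], mean)

# One mean per selected column, named after the column
str(col_means)
```

Subsetting with mtcars[cols] keeps the data frame structure, so the column names become the names of the result.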

Example 2: Practical Application

The Problem

We want to analyze the penguins dataset by calculating summary statistics for bill length within each species. This is a common data analysis task: split the data into groups, then apply the same function to every group.

Step 1: Prepare the data

First, we’ll drop rows with a missing bill length, then split the remaining data by species to create one data frame per species.

# Remove rows where bill length is missing
penguins_clean <- penguins |>
  filter(!is.na(bill_length_mm))

# Split the cleaned data into a list of data frames, one per species
penguin_groups <- split(penguins_clean, penguins_clean$species)

This creates a list where each element is a data frame containing all rows for one penguin species.
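To see what split() produces without needing the palmerpenguins package, the same call can be tried on the built-in mtcars data. Grouping by cyl (number of cylinders) is an assumption made purely for illustration.

```r
# split() returns a named list of data frames, one per value of the grouping variable
car_groups <- split(mtcars, mtcars$cyl)

names(car_groups)            # "4" "6" "8"

# Each element is a data frame, so lapply() can work on it directly
lapply(car_groups, nrow)     # number of rows in each group
```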

Step 2: Extract the bill length column

We’ll create a small helper function that pulls the bill length measurements out of one species group, then apply it to every group.

# Extract bill lengths for each species
get_bill_length <- function(df) {
  return(df$bill_length_mm)
}

bill_lengths <- lapply(penguin_groups, get_bill_length)

Now we have a list containing bill length vectors for each species, ready for further analysis.
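A named helper is not the only option. The sketch below shows two equivalent forms, an anonymous function and the `[[` operator with the column name as an extra argument. The `toy_groups` list uses made-up values as a stand-in for penguin_groups so the example runs on its own.

```r
# Toy stand-in for penguin_groups: made-up values, for illustration only
toy_groups <- list(
  a = data.frame(bill_length_mm = c(39.1, 40.3)),
  b = data.frame(bill_length_mm = c(46.5, 50.0, 48.7))
)

# Named helper, same idea as get_bill_length() in this step
get_bill_length <- function(df) df$bill_length_mm
by_helper <- lapply(toy_groups, get_bill_length)

# Equivalent inline anonymous function
by_anonymous <- lapply(toy_groups, function(df) df$bill_length_mm)

# Equivalent: pass the extraction operator `[[` and the column name
by_operator <- lapply(toy_groups, `[[`, "bill_length_mm")

identical(by_helper, by_anonymous)   # TRUE
identical(by_helper, by_operator)    # TRUE
```

A named helper tends to read better when the logic is reused; the inline forms are handy for one-off extractions.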

Step 3: Calculate statistics

Let’s apply multiple statistical functions to analyze the bill length data.

# Calculate various statistics
mean_bills <- lapply(bill_lengths, mean, na.rm = TRUE)
median_bills <- lapply(bill_lengths, median, na.rm = TRUE)
sd_bills <- lapply(bill_lengths, sd, na.rm = TRUE)

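Each call returns a list with one value per species. The na.rm = TRUE argument is passed through lapply() to the statistical function. Because rows with missing bill lengths were already removed in Step 1, it acts here as a safeguard.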
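The effect of forwarding na.rm = TRUE is easier to see on data that still contains missing values. The `samples` list below is made up for illustration and is not taken from the penguins data.

```r
# Illustrative vectors that each contain a missing value
samples <- list(a = c(1, NA, 3), b = c(10, 20, NA))

# Without na.rm, any NA makes the mean NA
raw_means <- lapply(samples, mean)

# Arguments placed after the function are forwarded to it on every call
clean_means <- lapply(samples, mean, na.rm = TRUE)

raw_means$a     # NA
clean_means$a   # 2
```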

Step 4: Create custom analysis function

We can write a custom function to get comprehensive statistics in one step.

# Custom summary function
bill_summary <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE))
}

comprehensive_stats <- lapply(bill_lengths, bill_summary)

This approach returns a list where each element is a named numeric vector holding the mean, median, and standard deviation for one species.
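A list of named vectors can be awkward to read. One possible follow-up, sketched here with base R only, is to stack the vectors into a matrix with do.call(rbind, ...). The `toy_lengths` values are made up so the example is self-contained.

```r
# Made-up measurements standing in for bill_lengths, for illustration only
toy_lengths <- list(
  a = c(38, 40, 42),
  b = c(45, 47, 49, 51)
)

# Same summary function as in this step
bill_summary <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE))
}

stats_list <- lapply(toy_lengths, bill_summary)

# Stack the named vectors into a matrix: one row per group, one column per statistic
stats_table <- do.call(rbind, stats_list)
stats_table
```

This works because every call to bill_summary() returns a vector with the same length and names.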

Summary

  • lapply() applies functions to list or vector elements and always returns a list
  • It is usually shorter and easier to read than an equivalent for loop, because it handles output allocation and names for you; the benefit is clarity more than raw speed
  • You can use built-in functions like mean(), sum(), or create custom functions for complex operations
  • The function is particularly useful for grouped analysis and repetitive calculations across datasets
  • Remember to handle missing values with parameters like na.rm = TRUE when working with real data
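To make the comparison with loops concrete, here is a sketch of a for loop that builds the same list as a single lapply() call. The small `numbers` list mirrors the one from Example 1.

```r
numbers <- list(group1 = c(1, 2, 3, 4, 5), group2 = c(10, 20, 30))

# Loop version: pre-allocate the output, copy the names, then fill each slot
loop_result <- vector("list", length(numbers))
names(loop_result) <- names(numbers)
for (i in seq_along(numbers)) {
  loop_result[[i]] <- mean(numbers[[i]])
}

# lapply() version: one call, names carried over automatically
lapply_result <- lapply(numbers, mean)

identical(loop_result, lapply_result)   # TRUE
```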