How to use pull() in R

Published

February 21, 2026

1. Introduction

The dplyr::pull() function extracts a single column from a data frame as a vector. It does the same job as bracket notation or the $ operator, but because it is a function it fits naturally into the tidyverse workflow and pipe operations. It is particularly useful when you need a column as a vector for further analysis, or when you pass values to functions that expect vectors rather than data frames.

Use pull() when you want to extract the values of a data frame column while keeping the tidyverse coding style, especially at the end of a pipe chain. It’s part of the dplyr package, which is included in the tidyverse collection of packages, and it bridges data frame manipulation and vector-based operations in R.
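
To make the comparison with $ and [[ concrete, here is a minimal sketch. The scores data frame is a made-up toy example (an assumption for illustration, not part of the penguins data used later).

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
scores <- data.frame(id = c("a", "b", "c"), score = c(90, 85, 77))

# Three ways to get the same vector
via_dollar  <- scores$score
via_bracket <- scores[["score"]]
via_pull    <- scores |> pull(score)

identical(via_dollar, via_pull)   # TRUE
identical(via_bracket, via_pull)  # TRUE

# pull() can close a pipeline without breaking the chain
top_scores <- scores |>
  filter(score > 80) |>
  pull(score)
```

All three forms return the same numeric vector; the difference is that pull() reads as one more step in the pipe.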

2. Syntax

pull(.data, var = -1, name = NULL, ...)

Key arguments:

- .data: A data frame or tibble
- var: Variable to extract, given as a column name, a positive integer (position counting from the left), or a negative integer (position counting from the right). Defaults to -1 (last column)
- name: Optional column whose values are used to name the vector elements
- ...: Reserved for use by methods
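
The different ways of specifying var can be sketched as follows. The data frame df and its column names are hypothetical, chosen only to make the positions easy to follow.

```r
library(dplyr)

# Hypothetical three-column data frame
df <- data.frame(
  id    = 1:3,
  label = c("x", "y", "z"),
  flag  = c(TRUE, FALSE, TRUE)
)

by_default  <- pull(df)          # var = -1, so the last column (flag)
by_name     <- pull(df, id)      # by column name
from_left   <- pull(df, 2)       # second column from the left (label)
from_right  <- pull(df, -2)      # second column from the right (also label)
```

Here from_left and from_right happen to return the same column because the data frame has exactly three columns.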

3. Example 1: Basic Usage

library(tidyverse)
library(palmerpenguins)

# Extract bill_length_mm as a vector
bill_lengths <- penguins |> 
  pull(bill_length_mm)

# Display first few values
head(bill_lengths)
#> [1] 39.1 39.5 40.3   NA 36.7 39.3

# Check the class
class(bill_lengths)
#> [1] "numeric"

This example demonstrates the basic functionality of pull(). We extracted the bill_length_mm column from the penguins dataset as a numeric vector. Notice that pull() preserves the column's data type and keeps NA values in place, so the result can go straight into functions that expect a vector, as long as missing values are handled where needed.

4. Example 2: Practical Application

# Calculate species-specific body mass statistics
species_stats <- penguins |> 
  filter(!is.na(body_mass_g)) |> 
  group_by(species) |> 
  summarise(
    mean_mass = mean(body_mass_g),
    median_mass = median(body_mass_g),
    .groups = 'drop'
  )

# Extract mean masses as a vector named by species
mean_masses <- species_stats |>
  pull(mean_mass, name = species)

print(mean_masses)
#>    Adelie Chinstrap    Gentoo
#>  3700.662  3733.088  5076.016

# Use the named vector in further analysis
max_species <- names(mean_masses)[which.max(mean_masses)]
cat("Species with highest mean body mass:", max_species)
#> Species with highest mean body mass: Gentoo

This practical example shows how pull() with the name argument creates a named vector. Each mean stays attached to its species, so downstream operations such as which.max() can report the species name directly.
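
A named vector can also be used as a simple lookup table. The sketch below assumes a made-up summary table (stats, with hypothetical columns group and mean_value) rather than the penguins results.

```r
library(dplyr)

# Hypothetical summary table
stats <- data.frame(
  group      = c("low", "mid", "high"),
  mean_value = c(1.5, 4.0, 9.25)
)

means <- stats |> pull(mean_value, name = group)

# Look up one value by name
high_mean <- means[["high"]]

# Find which groups exceed a threshold
above_two <- names(means)[means > 2]

# Drop the names again when a plain vector is needed
plain <- unname(means)
```

Indexing with [[ returns the bare value, while single brackets would keep the name attached.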

5. Example 3: Advanced Usage

# Using pull() with column positions and computed columns
penguins_clean <- penguins |> 
  filter(!is.na(bill_length_mm), !is.na(bill_depth_mm))

# Extract last column by position
last_column <- penguins_clean |> 
  pull(-1)  # -1 refers to last column

head(last_column)
#> [1] 2007 2007 2007 2007 2007 2007

# Extract a column computed earlier in the same pipeline
bill_ratio <- penguins_clean |> 
  mutate(ratio = bill_length_mm / bill_depth_mm) |> 
  pull(ratio)

# pull() at the end of a pipeline returns the vector directly,
# so the filtered data frame never needs its own variable
adelie_masses <- penguins |>
  filter(species == "Adelie") |>
  pull(body_mass_g)

length(adelie_masses)
#> [1] 152

The advanced usage demonstrates pull()’s flexibility with column positions and computed columns. A negative index counts from the right, so pull(-1) extracts the last column, and pull() can also extract a column created by mutate() earlier in the same pipeline.
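
One caveat with positions is that they depend on column order. The sketch below, using a hypothetical two-column data frame, shows the same pull(-1) call returning different columns after a reorder, while pulling by name is unaffected.

```r
library(dplyr)

# Hypothetical data frame
df <- data.frame(id = 1:3, year = c(2007, 2008, 2009))

last_before <- df |> pull(-1)            # year is the last column

# Reordering the columns changes what "last" means
reordered  <- df |> select(year, id)
last_after <- reordered |> pull(-1)      # now id is the last column

# Pulling by name gives the same result either way
year_before <- df |> pull(year)
year_after  <- reordered |> pull(year)
```

For code that has to survive changes to the column layout, pulling by name is usually the safer choice.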

6. Common Mistakes

Mistake 1: Confusing pull() with select()

# Wrong - returns a data frame
penguins |> select(species)

# Correct - returns a vector
penguins |> pull(species)
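
The difference in return type can be checked directly. This is a minimal sketch on a made-up data frame (the values are illustrative, not taken from the penguins data).

```r
library(dplyr)

# Hypothetical data
df <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie"),
  mass    = c(3700, 5000, 3800)
)

as_df  <- df |> select(species)  # one-column data frame
as_vec <- df |> pull(species)    # character vector

is.data.frame(as_df)   # TRUE
is.data.frame(as_vec)  # FALSE
is.character(as_vec)   # TRUE

# Vector functions work on the pulled column as expected
n_species <- length(unique(as_vec))
```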

Mistake 2: Forgetting to handle NA values

# This might cause issues in downstream functions
masses <- penguins |> pull(body_mass_g)
mean(masses)  # Returns NA due to missing values

# Better approach
masses <- penguins |> 
  filter(!is.na(body_mass_g)) |> 
  pull(body_mass_g)
mean(masses)
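
An alternative to filtering first is to keep the NA values in the vector and handle them in the summary function with na.rm = TRUE. A minimal sketch, assuming a small made-up mass column:

```r
library(dplyr)

# Hypothetical data with one missing value
df <- data.frame(mass = c(3700, NA, 3300, 3500))

masses <- df |> pull(mass)

n_missing  <- sum(is.na(masses))         # count missing values first
mean_raw   <- mean(masses)               # NA, because of the missing value
mean_clean <- mean(masses, na.rm = TRUE) # 3500
```

Which approach fits better depends on whether later steps need the vector to stay the same length as the original data frame.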

Mistake 3: Incorrect use of the name argument

# Wrong - using a non-existent column for names (throws an error)
penguins |> pull(body_mass_g, name = nonexistent_column)

# Correct - using an existing column
penguins |> pull(body_mass_g, name = species)
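
When the name column is not known in advance, one defensive option is to check that it exists before calling pull(). The sketch below uses a hypothetical data frame, and missing_label is a deliberately non-existent column.

```r
library(dplyr)

# Hypothetical data
df <- data.frame(label = c("a", "b"), value = c(1, 2))

# A non-existent name column raises an error, caught here for illustration
failed <- tryCatch(
  df |> pull(value, name = missing_label),
  error = function(e) NULL
)
is.null(failed)  # TRUE

# Check the column exists, then pull with names
if ("label" %in% names(df)) {
  named_values <- df |> pull(value, name = label)
}
```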