How to use pull() in R

dplyr

dplyr pull()


Published February 21, 2026

1. Introduction

The dplyr::pull() function extracts a single column from a data frame as a vector. Unlike bracket notation or the $ operator, pull() fits naturally into tidyverse pipe chains. It is useful when you need a column as a plain vector for further analysis, or when you want to pass its values to functions that expect vectors rather than data frames.

Use pull() when you want a column's values while keeping the tidyverse coding style, especially as the last step of a pipe chain. It is part of dplyr, one of the core tidyverse packages, and it bridges data frame manipulation and vector-based operations in R.
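As a minimal sketch, the example below builds a small hypothetical data frame (df, with made-up values) and compares pull() with $ and [[ ]]. All three return the same plain vector. The difference is that pull() can sit at the end of a pipe.

```r
library(dplyr)

# Hypothetical data frame with made-up values, for illustration only
df <- data.frame(
  id    = 1:3,
  score = c(10.5, 12.0, 9.5)
)

# Base R extraction
via_dollar  <- df$score
via_bracket <- df[["score"]]

# dplyr extraction, written as the last step of a pipe
via_pull <- df |> pull(score)

identical(via_dollar, via_pull)   # TRUE
identical(via_bracket, via_pull)  # TRUE
```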

2. Syntax

```r
pull(.data, var = -1, name = NULL, ...)
```

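Key arguments:

- .data: A data frame or tibble.
- var: The column to extract, given as a bare column name, a string, or a position. Positive integers count from the left and negative integers from the right. Defaults to -1 (the last column).
- name: Optional column whose values are used as names for the returned vector. It is specified the same way as var.
- ...: Additional arguments passed to methods.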
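The sketch below shows the different ways to specify var and name. It uses a hypothetical three-column data frame; the column names label, value, and count are made up for this example.

```r
library(dplyr)

# Hypothetical data frame; column order is label, value, count
df <- data.frame(
  label = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5),
  count = c(10L, 20L, 30L)
)

by_name     <- df |> pull(value)    # bare column name
by_string   <- df |> pull("value")  # column name as a string
by_position <- df |> pull(2)        # position counted from the left
by_default  <- df |> pull()         # default var = -1: last column (count)

# name: use the label column as names for the extracted values
named_vals <- df |> pull(value, name = label)
```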

3. Example 1: Basic Usage

```r
library(tidyverse)
library(palmerpenguins)

# Extract bill_length_mm as a vector
bill_lengths <- penguins |>
  pull(bill_length_mm)

# Display first few values
head(bill_lengths)
```

```
[1] 39.1 39.5 40.3   NA 36.7 39.3
```

```r
# Check the class
class(bill_lengths)
```

```
[1] "numeric"
```

This example demonstrates the basic functionality of pull(). We extracted the bill_length_mm column from the penguins dataset as a numeric vector. pull() preserves the column's original type and keeps NA values. As a result, summary functions such as mean() return NA unless you remove the missing values first or pass na.rm = TRUE.
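Type preservation applies to other column types too. Here is a sketch with a hypothetical data frame containing a factor, an integer, and a character column (all values invented for illustration).

```r
library(dplyr)

# Hypothetical data frame with three different column types
df <- data.frame(
  group = factor(c("x", "y", "x")),
  year  = c(2007L, 2008L, 2009L),
  site  = c("north", "south", "north")
)

grp   <- df |> pull(group)  # stays a factor, levels included
yr    <- df |> pull(year)   # stays an integer vector
sites <- df |> pull(site)   # stays a character vector

class(grp)    # "factor"
levels(grp)   # "x" "y"
class(yr)     # "integer"
class(sites)  # "character"
```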

4. Example 2: Practical Application

```r
# Calculate species-specific body mass statistics
species_stats <- penguins |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  summarise(
    mean_mass = mean(body_mass_g),
    median_mass = median(body_mass_g),
    .groups = 'drop'
  )

# Extract mean masses as a vector named by species
mean_masses <- species_stats |>
  pull(mean_mass, name = species)

print(mean_masses)
```

```
   Adelie Chinstrap    Gentoo 
 3700.662  3733.088  5076.016
```

```r
# Use the named vector in further analysis
max_species <- names(mean_masses)[which.max(mean_masses)]
cat("Species with highest mean body mass:", max_species)
```

```
Species with highest mean body mass: Gentoo
```

This example shows how the name argument turns the result into a named vector. The names keep each mean body mass tied to its species. That makes lookups and downstream operations such as which.max() easier to read.
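The sketch below repeats the idea on a small hypothetical summary table. The mean_mass values are illustrative placeholders, not computed from penguins. It looks up a value by name. As an alternative, it builds the same named vector with tibble::deframe(), which uses the first column as names and the second as values.

```r
library(dplyr)
library(tibble)  # deframe() lives in tibble, which dplyr depends on

# Hypothetical summary table; values are illustrative only
stats <- data.frame(
  species   = c("Adelie", "Chinstrap", "Gentoo"),
  mean_mass = c(3700, 3733, 5076)
)

masses <- stats |> pull(mean_mass, name = species)

masses[["Gentoo"]]        # 5076, looked up by name
names(which.max(masses))  # "Gentoo"

# deframe(): first column becomes the names, second column the values
same <- deframe(stats)
all.equal(masses, same)   # TRUE
```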

5. Example 3: Advanced Usage

```r
# Using pull() with column positions and computed columns
penguins_clean <- penguins |>
  filter(!is.na(bill_length_mm), !is.na(bill_depth_mm))

# Extract the last column (year) by position
last_column <- penguins_clean |>
  pull(-1)  # negative positions count from the right; -1 is the last column

head(last_column)
```

```
[1] 2007 2007 2007 2007 2007 2007
```

```r
# Extract a column created earlier in the same pipeline
bill_ratio <- penguins_clean |>
  mutate(ratio = bill_length_mm / bill_depth_mm) |>
  pull(ratio)

# Combine filter() and pull() to get one species' values as a vector
adelie_masses <- penguins |>
  filter(species == "Adelie") |>
  pull(body_mass_g)

length(adelie_masses)
```

```
[1] 152
```

This example shows pull() working with column positions and computed columns. Negative positions count from the right, so -1 extracts the last column. pull() can also extract a column created earlier in the same pipeline, letting you compute a value and extract it in a single chain.
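Here is a minimal sketch of counting positions from the right and extracting a freshly computed column. It uses a hypothetical data frame whose columns are, in order, x, y, and label.

```r
library(dplyr)

# Hypothetical data frame; column order is x, y, label
df <- data.frame(
  x     = c(4, 9),
  y     = c(2, 3),
  label = c("p", "q")
)

last_col   <- df |> pull(-1)  # -1: last column (label)
second_col <- df |> pull(-2)  # -2: second column from the right (y)

# Compute a new column and extract it in the same pipeline
ratio <- df |>
  mutate(ratio = x / y) |>
  pull(ratio)

ratio  # 2 3
```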

6. Common Mistakes

Mistake 1: Confusing pull() with select()

```r
# Wrong if you need a vector - returns a one-column data frame
penguins |> select(species)

# Correct - returns a vector (here a factor, the type of species)
penguins |> pull(species)
```
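To check the difference directly, this sketch inspects what each call returns on a small hypothetical data frame.

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(
  species = c("Adelie", "Gentoo"),
  mass    = c(3700, 5076)
)

selected <- df |> select(species)  # a data frame with one column
pulled   <- df |> pull(species)    # a plain character vector

is.data.frame(selected)  # TRUE
is.data.frame(pulled)    # FALSE
is.character(pulled)     # TRUE
```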

Mistake 2: Forgetting to handle NA values

```r
# Missing values propagate into downstream functions
masses <- penguins |> pull(body_mass_g)
mean(masses)  # Returns NA because of missing values

# Better approach: drop missing values before pulling
masses <- penguins |>
  filter(!is.na(body_mass_g)) |>
  pull(body_mass_g)
mean(masses)
```
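Filtering is one option. Another is to keep every row and let the summary function drop the missing values. Here is a sketch using a hypothetical column of three values.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- data.frame(body_mass_g = c(3750, NA, 3800))

masses <- df |> pull(body_mass_g)

mean(masses)                # NA: the missing value propagates
mean(masses, na.rm = TRUE)  # 3775: NA dropped inside mean()
```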

Mistake 3: Incorrect use of the name argument

```r
# Wrong - using a non-existent column for names (raises an error)
penguins |> pull(body_mass_g, name = nonexistent_column)

# Correct - using an existing column
penguins |> pull(body_mass_g, name = species)
```
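You can observe the failure without stopping a script by wrapping the call in tryCatch(). In this sketch, which uses a hypothetical data frame df, pull() signals an error for the unknown column. By contrast, base R's $ on a data.frame quietly returns NULL for a missing column.

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(
  species = c("Adelie", "Gentoo"),
  mass    = c(3700, 5076)
)

# pull() raises an error for an unknown column; capture its message
msg <- tryCatch(
  df |> pull(mass, name = nonexistent_column),
  error = function(e) conditionMessage(e)
)
print(msg)

# Base R's $ returns NULL instead of failing
is.null(df$nonexistent_column)  # TRUE

# With an existing column the names are attached as expected
valid <- df |> pull(mass, name = species)
valid
```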

7. Related Functions

- select(): Selects columns but returns a data frame instead of a vector
- pluck() (purrr): Extracts elements from lists and other nested data, similar in spirit to pull() for data frames
- $ operator: Base R column extraction, but awkward to use inside a pipe chain
- [[ ]] operator: Base R bracket notation for column extraction
- deframe() (tibble): Converts a two-column data frame into a named vector