How to Use keep() and discard() in R
Filter list elements with purrr’s keep() and discard(). Learn to select or remove elements based on predicate functions.
Introduction
The keep() and discard() functions filter list elements based on a predicate (a function that returns TRUE or FALSE). They’re the purrr equivalent of filtering, but for lists instead of data frame rows.
- keep(): keeps elements where the predicate is TRUE
- discard(): removes elements where the predicate is TRUE (keeps those where it is FALSE)
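As a minimal sketch of how the two functions complement each other, the example below applies the same predicate with both. The names `values` and `is_even` are illustrative only and do not come from any package.

```r
library(purrr)

# Illustrative data and predicate (hypothetical names)
values <- list(1, 2, 3, 4, 5, 6)
is_even <- \(x) x %% 2 == 0

evens <- keep(values, is_even)     # elements where is_even() returns TRUE
odds  <- discard(values, is_even)  # elements where is_even() returns FALSE

# Each element ends up in exactly one of the two results
length(evens) + length(odds) == length(values)
```

Because the predicate returns a single TRUE or FALSE for every element here, the two results split the original list with no overlap.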
Getting Started
library(tidyverse)
library(palmerpenguins)
keep(): Select Elements
Basic usage
Keep elements that match a condition:
# Keep only even numbers
numbers <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
keep(numbers, \(x) x %% 2 == 0)
Keep by type
# Mixed list
mixed <- list(
  a = 1:3,
  b = "hello",
  c = 4:6,
  d = "world"
)
# Keep only numeric elements
keep(mixed, is.numeric)
Keep by length
# Lists of different lengths
data <- list(
  short = 1:2,
  medium = 1:5,
  long = 1:10,
  tiny = 1
)
# Keep elements with more than 3 items
keep(data, \(x) length(x) > 3)
discard(): Remove Elements
Remove elements matching a condition
numbers <- list(1, 2, 3, 4, 5, NA, 6, NA, 7)
# Remove NA values
discard(numbers, is.na)
Remove empty elements
data <- list(
  a = 1:3,
  b = character(0), # empty
  c = 4:6,
  d = NULL,         # NULL
  e = integer(0)    # empty
)
# Discard empty vectors
discard(data, \(x) length(x) == 0)
# Or use is_empty helper
discard(data, is_empty)
Practical Example: Data Frame Columns
Keep numeric columns
penguins |>
  keep(is.numeric)
Discard columns with missing values
# Remove columns that have ANY missing values
penguins |>
  discard(\(x) any(is.na(x)))
Keep columns above a threshold
# Keep numeric columns with mean > 100
penguins |>
  keep(is.numeric) |>
  keep(\(x) mean(x, na.rm = TRUE) > 100)
Working with Nested Lists
Filter nested data
# Nested list structure
nested_data <- list(
  group1 = list(n = 100, values = 1:100),
  group2 = list(n = 5, values = 1:5),
  group3 = list(n = 50, values = 1:50)
)
# Keep groups with n > 20
keep(nested_data, \(x) x$n > 20)
Filter model results
# Fit multiple models
models <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars),
  m3 = lm(mpg ~ cyl, data = mtcars)
)
# Keep models with R-squared > 0.8
keep(models, \(m) summary(m)$r.squared > 0.8)
compact(): Remove NULL and Empty Elements
compact() is a shortcut for a common operation, dropping NULL and zero-length elements:
# List with NULLs
data <- list(a = 1, b = NULL, c = 3, d = NULL, e = 5)
# Remove NULLs with compact()
compact(data)
# For this list (which has no other empty elements), equivalent to:
discard(data, is.null)
Combining with Other purrr Functions
Filter then transform
# Keep numeric, then calculate stats
penguins |>
  keep(is.numeric) |>
  map(\(x) list(
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE)
  ))
Conditional processing pipeline
# Process only valid elements
results <- list(
  success = list(status = "ok", value = 42),
  failure = list(status = "error", value = NULL),
  success2 = list(status = "ok", value = 100)
)
# Keep successful, extract values
results |>
  keep(\(x) x$status == "ok") |>
  map_dbl(\(x) x$value)
detect() and detect_index(): Find First Match
Find the first element matching a condition:
x <- c(3, 5, 8, 2, 9, 1)
# Find first element > 6
detect(x, \(i) i > 6) # 8
# Find its position
detect_index(x, \(i) i > 6) # 3
Find from the right
# Find last element > 6
detect(x, \(i) i > 6, .dir = "backward") # 9
detect_index(x, \(i) i > 6, .dir = "backward") # 5
Practical use: find first valid result
# Find first non-empty result
results <- list(NULL, character(0), "found it!", "also valid")
# length(NULL) is 0, so a separate NULL check is not needed
detect(results, \(x) length(x) > 0)
# "found it!"
head_while() and tail_while()
head_while() keeps elements from the start, and tail_while() keeps elements from the end, for as long as the predicate stays TRUE:
x <- c(1, 2, 3, 10, 11, 12, 4, 5)
# Keep from start while < 10
head_while(x, \(i) i < 10) # 1, 2, 3
# Keep from end while < 10
tail_while(x, \(i) i < 10) # 4, 5
Useful for sorted data
# Dates in order
dates <- as.Date(c("2024-01-01", "2024-01-15", "2024-02-01", "2024-03-01"))
# Get dates before February
head_while(dates, \(d) d < as.Date("2024-02-01"))
Base R Comparison
# Base R Filter()
Filter(is.numeric, penguins)
# purrr keep() - equivalent
keep(penguins, is.numeric)
# purrr also accepts formula shorthand; guard with is.numeric() so mean() only sees numeric columns
keep(penguins, ~ is.numeric(.x) && mean(.x, na.rm = TRUE) > 100)
# Base R equivalent with Filter()
Filter(\(x) is.numeric(x) && mean(x, na.rm = TRUE) > 100, penguins)
keep_at() and discard_at(): By Position or Name
Filter by position or name instead of a predicate:
data <- list(a = 1, b = 2, c = 3, d = 4, e = 5)
# Keep by name
keep_at(data, c("a", "c", "e"))
# Discard by position
discard_at(data, c(2, 4))
# Keep by pattern (pass a function that takes the vector of names)
keep_at(data, \(nm) startsWith(nm, "a"))
Common Mistakes
1. Confusing keep/discard with filter
# filter() is for data frames
penguins |> filter(species == "Adelie")
# keep() is for lists
list(1, 2, 3, 4) |> keep(\(x) x > 2)
2. Predicate must return single TRUE/FALSE
# This doesn't work - returns vector
# keep(list(1:3, 4:6), \(x) x > 2)
# This works - returns single logical
keep(list(1:3, 4:6), \(x) all(x > 2))
3. Not handling NA in predicates
data <- list(a = 1, b = NA, c = 3)
# For element b, x > 2 evaluates to NA rather than TRUE or FALSE
# keep(data, \(x) x > 2)
# Handle NAs explicitly
keep(data, \(x) !is.na(x) && x > 2)
Summary
| Function | Keeps Elements Where | Use Case |
|---|---|---|
| keep() | predicate is TRUE | Select matching elements |
| discard() | predicate is FALSE | Remove matching elements |
| compact() | element is not NULL or empty | Remove NULLs and empty elements |
| keep_at() | name/position matches | Select by name/position |
| discard_at() | name/position doesn't match | Remove by name/position |
- keep() and discard() are opposites
- Use compact() as a shortcut to remove NULLs
- Predicates must return a single TRUE or FALSE
- Chain with map() for filter-then-transform workflows
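
The points above can be combined in one short pipeline. This is only a sketch: the `raw` list and the threshold of 100 are made up for illustration.

```r
library(purrr)

# Hypothetical raw results: one NULL, one zero-length, one out of range
raw <- list(a = 12, b = NULL, c = numeric(0), d = 250, e = 7)

cleaned <- raw |>
  compact() |>           # drops b (NULL) and c (zero-length)
  keep(\(x) x < 100) |>  # drops d; the predicate returns a single TRUE/FALSE
  map_dbl(\(x) x * 2)    # transforms the elements that remain

cleaned
```

Running compact() first means the predicate passed to keep() never sees a NULL or zero-length element, so it does not need its own guard for them.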