dplyr between(): find if numerical values are within a range.
Introduction
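The between() function in dplyr is a convenient way to check whether numeric values fall within a specified range. The call between(x, left, right) is a shortcut for x >= left & x <= right. Both bounds are inclusive, and the result is TRUE, FALSE, or NA (for missing values) for each element of x. It's particularly useful for filtering data on numeric conditions, and it is easier to read than the equivalent pair of >= and <= comparisons.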
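A quick way to see what between() does is to run it on a small vector with values below, on, inside, and above the bounds. This is a minimal sketch using a made-up vector `x`, not the penguin data, and it assumes only that dplyr is installed:

```r
library(dplyr)

# Made-up vector: below, on, inside, on, and above the bounds, plus a missing value
x <- c(1, 2, 3.5, 4, 5, NA)

# Both endpoints count as "in range", so 2 and 4 give TRUE
in_range <- between(x, 2, 4)
print(in_range)  # FALSE TRUE TRUE TRUE FALSE NA

# The explicit comparison produces exactly the same logical vector
print(identical(in_range, x >= 2 & x <= 4))
```

A missing input produces NA rather than FALSE. This matters later, because filter() keeps only rows where the condition is TRUE.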
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We want to identify penguins whose body mass falls within a specific range. Let’s find penguins that weigh between 3500 and 4500 grams.
Step 1: Examine the data structure
First, let’s look at the penguin data to understand what we’re working with.
head(penguins)
summary(penguins$body_mass_g)
This shows the range, quartiles, and mean of the body mass variable, along with the number of missing values.
Step 2: Use between() to filter data
Now we’ll apply the between() function to filter penguins within our target weight range.
medium_penguins <- penguins |>
filter(between(body_mass_g, 3500, 4500)) |>
select(species, island, body_mass_g)
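head(medium_penguins)
between() returns TRUE for values within the inclusive range of 3500 to 4500 grams, so penguins weighing exactly 3500 g or 4500 g are kept. Penguins with a missing body mass get NA, and filter() drops those rows along with the FALSE ones.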
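To see which rows survive the filter, here is a small sketch on a hypothetical five-row tibble named `toy` that stands in for `penguins`. One of its masses is missing:

```r
library(dplyr)

# Hypothetical stand-in for the penguins data; the last mass is missing
toy <- tibble(
  id = 1:5,
  body_mass_g = c(3400, 3500, 4000, 4500, NA)
)

# between() gives FALSE, TRUE, TRUE, TRUE, NA here;
# filter() keeps only the TRUE rows, so ids 2, 3 and 4 remain
medium <- toy |>
  filter(between(body_mass_g, 3500, 4500))

print(medium)
```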
Step 3: Compare with traditional approach
Let’s see how this compares to the traditional method using comparison operators.
# Traditional approach
traditional <- penguins |>
filter(body_mass_g >= 3500 & body_mass_g <= 4500) |>
select(species, island, body_mass_g)
# Verify they're identical
identical(medium_penguins, traditional)
Both methods produce identical results, but between() is more concise and readable.
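The two approaches are interchangeable only when you want an inclusive range. between() has no option to exclude its endpoints, so an exclusive range still needs > and <. The sketch below uses a made-up vector `mass`. It also shows that listing the bounds in the wrong order silently matches nothing, because no value can be both at least 4500 and at most 3500:

```r
library(dplyr)

# Made-up masses sitting on and just inside the bounds
mass <- c(3500, 3501, 4000, 4499, 4500)

# between() always includes both endpoints
inclusive <- between(mass, 3500, 4500)

# An exclusive range has to be written with the explicit operators
exclusive <- mass > 3500 & mass < 4500

# Bounds in the wrong order: x >= 4500 & x <= 3500 is never TRUE
swapped <- between(mass, 4500, 3500)

print(data.frame(mass, inclusive, exclusive, swapped))
```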
Example 2: Practical Application
The Problem
A marine biologist wants to analyze penguins with moderate bill lengths for a specific study. They need to identify Adelie penguins whose bill length falls between 35 mm and 42 mm, then calculate summary statistics for this subset.
Step 1: Filter using multiple conditions
We’ll combine between() with other filtering conditions to get our target subset.
target_penguins <- penguins |>
filter(species == "Adelie",
between(bill_length_mm, 35, 42)) |>
drop_na(bill_length_mm)
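nrow(target_penguins)
This keeps Adelie penguins whose bill length is in the target range. Because filter() already discards rows where between() returns NA, drop_na(bill_length_mm) acts only as a safeguard here and does not remove any additional rows.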
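Separating conditions with commas inside filter() is the same as joining them with &. The sketch below checks this on a hypothetical tibble `toy` whose column names mirror the penguin columns. It also confirms that the row with a missing bill length does not make it through the filter:

```r
library(dplyr)

# Hypothetical stand-in for the penguin columns used above
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Adelie", "Adelie"),
  bill_length_mm = c(34.1, 38.9, 40.2, NA, 41.8)
)

# Comma-separated conditions in filter() are combined with AND
with_commas <- toy |> filter(species == "Adelie", between(bill_length_mm, 35, 42))
with_and    <- toy |> filter(species == "Adelie" & between(bill_length_mm, 35, 42))

print(identical(with_commas, with_and))

# The NA bill length yields NA from between(), so that row is already gone
print(anyNA(with_commas$bill_length_mm))
```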
Step 2: Calculate summary statistics
Now let’s compute meaningful statistics for our filtered dataset.
bill_summary <- target_penguins |>
summarise(
count = n(),
mean_bill = mean(bill_length_mm),
median_bill = median(bill_length_mm),
min_bill = min(bill_length_mm),
max_bill = max(bill_length_mm)
)
print(bill_summary)
This gives the count, mean, median, minimum, and maximum bill length for the filtered subset.
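If you only need the number or proportion of in-range values, you can summarise the logical vector from between() directly instead of filtering first. sum() counts the TRUE values, and mean() gives their share. This is a sketch on hypothetical bill lengths in a tibble named `toy`:

```r
library(dplyr)

# Hypothetical bill lengths, including one missing value
toy <- tibble(bill_length_mm = c(34.1, 38.9, 40.2, NA, 41.8, 45.0))

range_summary <- toy |>
  summarise(
    # TRUE counts as 1 and FALSE as 0; na.rm drops the NA from the missing bill
    n_in_range = sum(between(bill_length_mm, 35, 42), na.rm = TRUE),
    # Share of non-missing bills that fall in the range
    share_in_range = mean(between(bill_length_mm, 35, 42), na.rm = TRUE)
  )

print(range_summary)
```

Here three of the five non-missing bills fall in the range, so the share is 0.6.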
Step 3: Group analysis with between()
Let’s extend our analysis by examining how these moderate bill lengths vary by island.
island_analysis <- target_penguins |>
group_by(island) |>
summarise(
penguin_count = n(),
avg_bill_length = round(mean(bill_length_mm), 2),
avg_body_mass = round(mean(body_mass_g, na.rm = TRUE), 0)
) |>
arrange(desc(penguin_count))
print(island_analysis)
This reveals how our target penguins are distributed across different islands and their average characteristics.
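Grouping a subset that was already filtered only reports islands that still have matching penguins. If you also want totals, or islands with zero matches, one option is to count in-range values inside summarise(). Here is a sketch on a hypothetical tibble `toy`:

```r
library(dplyr)

# Hypothetical island and bill data standing in for the full penguins table
toy <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream", "Dream", "Torgersen"),
  bill_length_mm = c(36.0, 44.5, 39.2, 40.8, NA, 41.1)
)

per_island <- toy |>
  group_by(island) |>
  summarise(
    # All rows on the island, including the one with a missing bill
    n_total = n(),
    # Rows whose bill length is within 35-42 mm
    n_moderate = sum(between(bill_length_mm, 35, 42), na.rm = TRUE)
  )

print(per_island)
```

An island with no in-range bills would still appear in the output, with n_moderate equal to 0.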
Step 4: Create a logical indicator
Sometimes we want to create a new variable indicating whether values fall within a range.
penguins_with_flag <- penguins |>
mutate(
moderate_bill = between(bill_length_mm, 35, 42),
size_category = case_when(
between(body_mass_g, 2700, 3500) ~ "Small",
between(body_mass_g, 3501, 4500) ~ "Medium",
between(body_mass_g, 4501, 6500) ~ "Large",
TRUE ~ "Unknown"
)
) |>
select(species, bill_length_mm, body_mass_g, moderate_bill, size_category)
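table(penguins_with_flag$size_category)
This creates both a logical flag and categorical size groups using multiple between() calls. The bounds 3501 and 4501 leave no gaps only because body_mass_g is recorded in whole grams. Penguins with a missing body mass match none of the ranges and fall through to "Unknown".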
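With continuous measurements, bounds like 3500 and 3501 leave a gap, and values inside it fall into the catch-all category. Because case_when() uses the first condition that matches, a safer pattern is to let neighbouring ranges share a boundary. The sketch below compares the two patterns on a hypothetical vector `mass` that contains fractional values:

```r
library(dplyr)

# Hypothetical masses: 3500.5 and 4500.5 sit in the gaps of the original bounds
mass <- c(3400, 3500, 3500.5, 4500.5, NA)

# Gapped bounds, as in the example above
gapped <- case_when(
  between(mass, 2700, 3500) ~ "Small",
  between(mass, 3501, 4500) ~ "Medium",
  between(mass, 4501, 6500) ~ "Large",
  TRUE ~ "Unknown"
)

# Shared boundaries: the first match wins, so 3500 stays "Small"
# and every value from 2700 to 6500 lands in exactly one category
shared <- case_when(
  between(mass, 2700, 3500) ~ "Small",
  between(mass, 3500, 4500) ~ "Medium",
  between(mass, 4500, 6500) ~ "Large",
  TRUE ~ "Unknown"
)

print(data.frame(mass, gapped, shared))
```

In both versions the NA mass ends up as "Unknown". Only the gapped version misclassifies 3500.5 and 4500.5.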
Summary
- between() provides a clean, readable way to check whether values fall within an inclusive numeric range.
- It's equivalent to using >= and <= operators but more concise and less error-prone.
- The function works seamlessly with filter() to subset data based on range conditions.
- You can combine between() with other dplyr functions like mutate() and case_when() for complex data transformations.
- between() handles the inclusive bounds automatically, making range-based filtering intuitive and straightforward.