dplyr between(): find if numerical values are within a range.

dplyr between()
Master dplyr between() to check whether numerical values fall within a range. A complete R tutorial with examples using real datasets.
Published

August 17, 2022

Introduction

The between() function in dplyr is a convenient way to check if numerical values fall within a specified range. It’s particularly useful for filtering data based on numeric conditions and is more readable than writing complex logical expressions with >= and <= operators.
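As a minimal sketch of the behaviour described above, the snippet below calls between() on a small made-up vector (the values in x are illustrative and not taken from the penguins data). It shows that both endpoints count as inside the range, and that the result matches the operator form.

```r
library(dplyr)

# Hypothetical values, chosen so that two of them sit exactly on the bounds
x <- c(1, 5, 7, 10, 11)

# TRUE where 5 <= x <= 10; the endpoints 5 and 10 are included
in_range <- between(x, 5, 10)
in_range

# The same check written with comparison operators
same_as_operators <- identical(in_range, x >= 5 & x <= 10)
same_as_operators
```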

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to identify penguins whose body mass falls within a specific range. Let’s find penguins that weigh between 3500 and 4500 grams.

Step 1: Examine the data structure

First, let’s look at the penguin data to understand what we’re working with.

head(penguins)
summary(penguins$body_mass_g)

This shows us the body mass variable and its distribution across all penguins in the dataset.

Step 2: Use between() to filter data

Now we’ll apply the between() function to filter penguins within our target weight range.

medium_penguins <- penguins |>
  filter(between(body_mass_g, 3500, 4500)) |>
  select(species, island, body_mass_g)

head(medium_penguins)

The between() function returns TRUE for values that fall within the inclusive range of 3500 to 4500 grams.
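One detail worth knowing at this step: body_mass_g contains missing values, and between() returns NA for them, which filter() then drops. The sketch below uses a small made-up tibble (toy_penguins and its values are assumptions for illustration) to show this.

```r
library(dplyr)

# Hypothetical data with one missing body mass
toy_penguins <- tibble(
  id = 1:4,
  body_mass_g = c(3400, 3500, NA, 4500)
)

# between() propagates the missing value as NA
flags <- between(toy_penguins$body_mass_g, 3500, 4500)
flags

# filter() keeps only rows where the condition is TRUE,
# so the NA row is dropped along with the FALSE row
kept <- toy_penguins |>
  filter(between(body_mass_g, 3500, 4500))
kept
```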

Step 3: Compare with traditional approach

Let’s see how this compares to the traditional method using comparison operators.

# Traditional approach
traditional <- penguins |>
  filter(body_mass_g >= 3500 & body_mass_g <= 4500) |>
  select(species, island, body_mass_g)

# Verify they're identical
identical(medium_penguins, traditional)

Both methods produce identical results, but between() is more concise and readable.
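In the three-argument form used in this tutorial, between() includes both endpoints, so the operator form remains useful when one bound should be exclusive. A sketch, assuming a half-open range of at least 3500 g and strictly less than 4500 g, on made-up values:

```r
library(dplyr)

# Hypothetical masses, including both boundary values
toy <- tibble(body_mass_g = c(3499, 3500, 4000, 4500))

# Inclusive on both sides: keeps 3500, 4000 and 4500
inclusive <- toy |>
  filter(between(body_mass_g, 3500, 4500))

# Half-open range [3500, 4500): written with operators, drops 4500
half_open <- toy |>
  filter(body_mass_g >= 3500, body_mass_g < 4500)

inclusive$body_mass_g
half_open$body_mass_g
```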

Example 2: Practical Application

The Problem

A marine biologist wants to analyze penguins with moderate bill lengths for a specific study. They need to identify Adelie penguins whose bill length falls between 35mm and 42mm, then calculate summary statistics for this subset.

Step 1: Filter using multiple conditions

We’ll combine between() with other filtering conditions to get our target subset.

target_penguins <- penguins |>
  filter(species == "Adelie",
         between(bill_length_mm, 35, 42)) |>
  drop_na(bill_length_mm)

nrow(target_penguins)

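This filters for Adelie penguins with bill lengths in our target range. Because filter() already drops rows where the condition evaluates to NA, the drop_na() call is a redundant safeguard here rather than a required step.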

Step 2: Calculate summary statistics

Now let’s compute meaningful statistics for our filtered dataset.

bill_summary <- target_penguins |>
  summarise(
    count = n(),
    mean_bill = mean(bill_length_mm),
    median_bill = median(bill_length_mm),
    min_bill = min(bill_length_mm),
    max_bill = max(bill_length_mm)
  )

print(bill_summary)

This gives the count, mean, median, minimum, and maximum bill length for our filtered subset.

Step 3: Group analysis with between()

Let’s extend our analysis by examining how these moderate bill lengths vary by island.

island_analysis <- target_penguins |>
  group_by(island) |>
  summarise(
    penguin_count = n(),
    avg_bill_length = round(mean(bill_length_mm), 2),
    avg_body_mass = round(mean(body_mass_g, na.rm = TRUE), 0)
  ) |>
  arrange(desc(penguin_count))

print(island_analysis)

This reveals how our target penguins are distributed across different islands and their average characteristics.

Step 4: Create a logical indicator

Sometimes we want to create a new variable indicating whether values fall within a range.

penguins_with_flag <- penguins |>
  mutate(
    moderate_bill = between(bill_length_mm, 35, 42),
    size_category = case_when(
      between(body_mass_g, 2700, 3500) ~ "Small",
      between(body_mass_g, 3501, 4500) ~ "Medium",
      between(body_mass_g, 4501, 6500) ~ "Large",
      TRUE ~ "Unknown"
    )
  ) |>
  select(species, bill_length_mm, body_mass_g, moderate_bill, size_category)

table(penguins_with_flag$size_category)

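This creates both a logical flag (moderate_bill) and a categorical size group (size_category) using several between() calls. Penguins with a missing body mass match none of the ranges and are labelled "Unknown".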
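The cut points 3500/3501 and 4500/4501 leave no gaps here because body_mass_g is recorded in whole grams. With fractional measurements, a value such as 3500.5 would fall between two ranges and be labelled "Unknown". One possible alternative, sketched below on made-up values (the vector mass is hypothetical), is to share the cut points and rely on case_when() using the first condition that matches.

```r
library(dplyr)

# Hypothetical masses, including a fractional value and a missing one
mass <- c(3000, 3500, 3500.5, 4600, NA)

# Ranges as in the tutorial: 3500.5 matches none of them
with_gaps <- case_when(
  between(mass, 2700, 3500) ~ "Small",
  between(mass, 3501, 4500) ~ "Medium",
  between(mass, 4501, 6500) ~ "Large",
  TRUE ~ "Unknown"
)

# Shared cut points: the first matching condition wins,
# so 3500 stays "Small" and 3500.5 becomes "Medium"
without_gaps <- case_when(
  between(mass, 2700, 3500) ~ "Small",
  between(mass, 3500, 4500) ~ "Medium",
  between(mass, 4500, 6500) ~ "Large",
  TRUE ~ "Unknown"
)

with_gaps
without_gaps
```

In both versions the missing value ends up as "Unknown", because an NA condition is never treated as a match.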

Summary

  • between() provides a clean, readable way to check if values fall within an inclusive numeric range
  • It’s equivalent to using >= and <= operators but more concise and less error-prone
  • The function works seamlessly with filter() to subset data based on range conditions
  • You can combine between() with other dplyr functions like mutate() and case_when() for complex data transformations
  • between() returns NA for missing values, and filter() drops those rows, so range-based filtering needs no separate NA handling