dplyr arrange: Sort rows by one or more variables
Introduction
The arrange() function from dplyr allows you to sort your data by one or more columns in ascending or descending order. This is essential for organizing your data for analysis, creating ordered reports, or preparing data for visualization where order matters.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
The penguins data is not sorted by body mass, but we want to identify the lightest and heaviest penguins. Let’s start with simple sorting operations.
Step 1: Sort by single variable (ascending)
We’ll arrange penguins from lightest to heaviest by body mass.
penguins |>
  arrange(body_mass_g) |>
  head()
This shows the lightest penguins first, with body mass arranged in ascending order by default.
Step 2: Sort by single variable (descending)
To see the heaviest penguins first, we use desc() to reverse the order.
penguins |>
  arrange(desc(body_mass_g)) |>
  head()
Now the data shows heaviest penguins first, making it easy to identify the largest specimens.
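One detail worth knowing when reversing the order: arrange() places missing values at the end in both directions. The sketch below uses a small made-up tibble (the names `toy`, `id`, and `mass` and all values are illustrative, not taken from the penguins data) to show this, along with one common idiom for listing the missing rows first.

```r
library(dplyr)

# Hypothetical toy data: three known masses and one missing value
toy <- tibble(id   = c("a", "b", "c", "d"),
              mass = c(3500, NA, 2900, 4100))

# Ascending: the NA row is placed last
asc <- toy |> arrange(mass)

# Descending: the NA row is still placed last, not first
dsc <- toy |> arrange(desc(mass))

# To list missing rows first, sort on a logical key before the column itself
# (FALSE sorts before TRUE, so !is.na(mass) puts the NA rows on top)
na_first <- toy |> arrange(!is.na(mass), mass)

asc$id       # "c" "a" "d" "b"
dsc$id       # "d" "a" "c" "b"
na_first$id  # "b" "c" "a" "d"
```

This is why desc(body_mass_g) still shows real measurements at the top of the penguins output rather than rows with a missing mass.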
Step 3: Sort by multiple variables
We can sort by species first, then by body mass within each species.
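penguins |>
  arrange(species, body_mass_g) |>
  head(10)
This orders the rows by species, in factor-level order (which is alphabetical here), then sorts by body mass within each species. Because Adelie comes first, the first 10 rows are the lightest Adelie penguins.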
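In palmerpenguins, species is stored as a factor, and arrange() orders a factor by its levels rather than by the alphabet. The two happen to coincide for species, but not in general. A minimal sketch with a made-up factor (`sizes`, `label`, `size`, and the level order are illustrative assumptions):

```r
library(dplyr)

# Hypothetical data: the same labels stored as plain text and as a factor
sizes <- tibble(
  label = c("medium", "large", "small"),
  size  = factor(label, levels = c("small", "medium", "large"))
)

# Sorting the factor follows the level order given above
by_level <- sizes |> arrange(size)

# Sorting the character column follows alphabetical order
by_text <- sizes |> arrange(label)

by_level$label  # "small" "medium" "large"
by_text$label   # "large" "medium" "small"
```

If a report needs a particular category order, setting the factor levels before calling arrange() is one way to get it.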
Example 2: Practical Application
The Problem
A researcher needs to create a report showing penguin measurements organized by island and sex, with the largest penguins listed first within each group. This requires multi-level sorting for a comprehensive overview.
Step 1: Clean and prepare the data
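First, we’ll remove rows with missing values in the columns used for the report. arrange() does not fail on missing values (it places them last), but rows with a missing sex would form an extra group that adds nothing to the report.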
clean_penguins <- penguins |>
  filter(!is.na(body_mass_g),
         !is.na(sex),
         !is.na(flipper_length_mm))
clean_penguins |> nrow()
This gives us a clean dataset with complete measurements for our analysis.
Step 2: Multi-level sorting with mixed order
We’ll sort by island, then sex, then by body mass (heaviest first) within each group.
sorted_penguins <- clean_penguins |>
  arrange(island, sex, desc(body_mass_g))
sorted_penguins |>
  head(12)
This produces a hierarchical ordering suited to reporting: rows are ordered by island, then by sex, with the heaviest penguins first within each island and sex combination.
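A related point when working with grouped data: by default arrange() ignores any grouping set by group_by() and sorts the whole table. Passing `.by_group = TRUE` makes it sort by the grouping variables first. The sketch below uses a made-up tibble (`toy`, `island`, and `mass` are stand-ins, not the real penguins columns) to contrast the two behaviours.

```r
library(dplyr)

# Hypothetical toy data standing in for island / body mass
toy <- tibble(island = c("B", "A", "B", "A"),
              mass   = c(10, 40, 30, 20))

grouped <- toy |> group_by(island)

# Default: grouping is ignored, rows are sorted by mass across the whole table
ignored <- grouped |> arrange(desc(mass))

# .by_group = TRUE: sort by the grouping variable first, then by mass
respected <- grouped |> arrange(desc(mass), .by_group = TRUE)

ignored$mass    # 40 30 20 10
respected$mass  # 40 20 30 10
```

On ungrouped data, listing the columns explicitly, as in arrange(island, sex, desc(body_mass_g)), gives the same hierarchical result without needing group_by().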
Step 3: Create summary of top penguins per group
Now we can easily identify the largest penguin of each sex on each island.
top_penguins <- sorted_penguins |>
  group_by(island, sex) |>
  slice_head(n = 1) |>
  select(island, sex, species, body_mass_g, flipper_length_mm)
top_penguins
This gives us the heaviest male and female penguin from each island. It works because group_by() keeps the existing row order, so the first row of each group is the heaviest one from our earlier sort.
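If the sorted table is not needed for anything else, the same kind of per-group maximum can be taken without sorting first. The following is a sketch of that alternative using dplyr's slice_max() on a made-up tibble (`toy` and its values are illustrative); it is not the approach used above, only a comparison point.

```r
library(dplyr)

# Hypothetical toy data: two island/sex groups with two penguins each
toy <- tibble(island = c("A", "A", "B", "B"),
              sex    = c("female", "female", "male", "male"),
              mass   = c(3000, 3400, 5000, 4800))

top <- toy |>
  group_by(island, sex) |>
  # Keep the single heaviest row per group; with_ties = FALSE returns
  # exactly one row even when several penguins share the maximum
  slice_max(mass, n = 1, with_ties = FALSE) |>
  ungroup()

top$mass  # 3400 5000
```

The arrange() plus slice_head() version is convenient when the sorted table is also shown in the report, while slice_max() does not depend on the row order at all.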
Step 4: Sort by two numeric variables
Sorting is not limited to categorical columns. Here we sort the original penguins data by flipper length, using bill length to break ties.
penguins |>
  arrange(flipper_length_mm, bill_length_mm) |>
  select(species, flipper_length_mm, bill_length_mm, body_mass_g) |>
  head(8)
Rows are ordered by flipper length, and penguins with the same flipper length are ordered by bill length, which shows how arrange() handles multiple numeric variables.
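The tie-breaking role of the second column is easier to see on a few rows. The sketch below uses a made-up tibble (`toy`, `flipper`, and `bill` are hypothetical stand-ins for the real columns) with two pairs of equal flipper lengths.

```r
library(dplyr)

# Hypothetical toy data: two pairs of penguins with equal flipper length
toy <- tibble(flipper = c(180, 172, 180, 172),
              bill    = c(40.1, 37.9, 36.5, 39.2))

# The second column only decides the order among rows tied on the first
sorted <- toy |> arrange(flipper, bill)

# Directions can be mixed: ties are broken with the longest bill first
mixed <- toy |> arrange(flipper, desc(bill))

sorted$flipper  # 172 172 180 180
sorted$bill     # 37.9 39.2 36.5 40.1
mixed$bill      # 39.2 37.9 40.1 36.5
```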
Summary
- Use arrange() to sort data by one or more columns, with ascending order as default
- Apply desc() around column names to sort in descending order
- Multiple variables in arrange() create hierarchical sorting (first variable, then second, etc.)
- Sorting works with numeric, character, and factor variables (factors sort by level order), making it versatile for different data types
- arrange() always places missing values last, even with desc(); remove them with filter() before sorting if they should not appear in the result