dplyr arrange: Sort rows by one or more variables

dplyr arrange()

Learn dplyr arrange to sort rows by one or more variables. Practical R tutorial with clear examples.

Published

January 5, 2023

Introduction

The arrange() function from dplyr allows you to sort your data by one or more columns in ascending or descending order. This is essential for organizing your data for analysis, creating ordered reports, or preparing data for visualization where order matters.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

The penguins data is not sorted by body mass, and we want to order it by that column to identify the lightest and heaviest penguins. Let’s start with simple sorting operations.

Step 1: Sort by single variable (ascending)

We’ll arrange penguins from lightest to heaviest by body mass.

penguins |>
  arrange(body_mass_g) |>
  head()

This shows the lightest penguins first, because arrange() sorts in ascending order by default. Rows with a missing body_mass_g are placed at the end.
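To see the default behaviour in isolation, here is a minimal sketch on a small made-up tibble. The `toy` data and its `id` column are hypothetical stand-ins for penguins, chosen so the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical toy data (not from palmerpenguins); one mass is missing
toy <- tibble(
  id = c("a", "b", "c", "d"),
  body_mass_g = c(3750, NA, 2900, 4200)
)

# Ascending is the default; the row with NA ends up last
sorted <- toy |>
  arrange(body_mass_g)

sorted$id  # "c" "a" "d" "b"
```

The lightest row comes first and the row with the missing mass is last, which is the same pattern you should see at the tail of the sorted penguins data.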

Step 2: Sort by single variable (descending)

To see the heaviest penguins first, we use desc() to reverse the order.

penguins |>
  arrange(desc(body_mass_g)) |>
  head()

Now the heaviest penguins come first, making it easy to identify the largest specimens. Rows with a missing body mass still sort to the end, even with desc().
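As a small illustration, desc() can wrap character columns as well as numeric ones. The sketch below uses a made-up tibble (the `toy` values are hypothetical, not taken from the penguins data).

```r
library(dplyr)

# Hypothetical toy data; one mass is missing
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  body_mass_g = c(3750, NA, 2900, 5200)
)

# Heaviest first; the NA row is still placed last
by_mass <- toy |>
  arrange(desc(body_mass_g))
by_mass$body_mass_g  # 5200 3750 2900 NA

# desc() also reverses the order of a character column
by_name <- toy |>
  arrange(desc(species))
by_name$species  # "Gentoo" "Gentoo" "Chinstrap" "Adelie"
```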

Step 3: Sort by multiple variables

We can sort by species first, then by body mass within each species.

penguins |>
  arrange(species, body_mass_g) |>
  head(10)

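This orders the rows by species, then by body mass within each species. Because species is a factor, its values sort by factor level order (Adelie, Chinstrap, Gentoo), which here matches alphabetical order. Note that arrange() only reorders rows; it does not create groups the way group_by() does.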
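The difference between level order and alphabetical order only shows up when the two disagree. The following sketch uses a hypothetical `size` factor (not a column in penguins) whose levels are deliberately not alphabetical.

```r
library(dplyr)

# Hypothetical toy data: factor levels are small < medium < large
toy <- tibble(
  size = factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large")),
  mass = c(3, 9, 6, 1)
)

# Sorting the factor follows level order: small, small, medium, large
by_level <- toy |>
  arrange(size, mass)
by_level$mass  # 1 3 6 9

# Converting to character sorts by label instead: large, medium, small, small
by_label <- toy |>
  arrange(as.character(size), mass)
by_label$mass  # 9 6 1 3
```

If a factor column sorts in an order you did not expect, checking `levels()` on it is usually the quickest way to see why.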

Example 2: Practical Application

The Problem

A researcher needs to create a report showing penguin measurements organized by island and sex, with the largest penguins listed first within each group. This requires multi-level sorting for a comprehensive overview.

Step 1: Clean and prepare the data

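First, we’ll remove rows with missing values in the columns we use. arrange() does not fail on NA values (it places them last), but a missing sex would otherwise show up as an extra group in the report.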

clean_penguins <- penguins |>
  filter(!is.na(body_mass_g), 
         !is.na(sex), 
         !is.na(flipper_length_mm))

clean_penguins |> nrow()

This leaves 333 of the original 344 rows, each with a non-missing body mass, sex, and flipper length.
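Before filtering, it can help to see how many values are missing in each column. The sketch below does that on a made-up tibble (`toy` is hypothetical) and then applies the same kind of filter in a more compact form using if_any(), which is one possible alternative to listing each !is.na() condition.

```r
library(dplyr)

# Hypothetical toy data with missing values in two columns
toy <- tibble(
  sex = c("male", NA, "female", "male"),
  body_mass_g = c(3750, 3400, NA, 4200),
  flipper_length_mm = c(181, 186, 195, 190)
)

# Count missing values per column
na_counts <- toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# Keep rows where none of the listed columns is NA
complete <- toy |>
  filter(!if_any(c(sex, body_mass_g, flipper_length_mm), is.na))

nrow(complete)  # 2
```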

Step 2: Multi-level sorting with mixed order

We’ll sort by island, then sex, then by body mass (heaviest first) within each group.

sorted_penguins <- clean_penguins |>
  arrange(island, sex, desc(body_mass_g))

sorted_penguins |>
  head(12)

The rows are now ordered by island, then by sex within each island, then from heaviest to lightest within each island and sex combination, which is the layout the report needs.
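The same mixed ascending and descending sort is easier to follow on a handful of rows. This is a sketch on hypothetical data (the `toy` values are invented for illustration).

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  island = c("Dream", "Biscoe", "Dream", "Biscoe", "Biscoe"),
  sex = c("male", "female", "female", "female", "male"),
  body_mass_g = c(4000, 3500, 3300, 4800, 5500)
)

# island and sex ascending, body mass descending within each pair
ordered <- toy |>
  arrange(island, sex, desc(body_mass_g))

ordered$body_mass_g  # 4800 3500 5500 3300 4000
```

Within Biscoe/female the heavier penguin (4800) precedes the lighter one (3500), while the island and sex columns themselves run in ascending order.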

Step 3: Create summary of top penguins per group

Now we can easily identify the largest penguin of each sex on each island.

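top_penguins <- sorted_penguins |>
  group_by(island, sex) |>
  slice_head(n = 1) |>   # first row per group = heaviest, given the sort above
  ungroup() |>           # drop the grouping so later steps are not grouped
  select(island, sex, species, body_mass_g, flipper_length_mm)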

top_penguins

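This gives us the heaviest female and male penguin on each island. It works only because the data was already sorted by descending body mass within each island and sex; if several penguins tie for heaviest, slice_head() keeps just the first one.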
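If you would rather not depend on an earlier sort, slice_max() picks the largest value per group directly. This is an alternative sketch rather than the approach used above, shown on hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data, deliberately unsorted
toy <- tibble(
  island = c("Dream", "Biscoe", "Dream", "Biscoe", "Biscoe"),
  sex = c("male", "female", "female", "female", "male"),
  body_mass_g = c(4000, 3500, 3300, 4800, 5500)
)

# One heaviest row per island and sex; with_ties = FALSE keeps a single row
top <- toy |>
  group_by(island, sex) |>
  slice_max(body_mass_g, n = 1, with_ties = FALSE) |>
  ungroup()

top$body_mass_g  # 4800 5500 3300 4000
```

Setting with_ties = TRUE (the default) would instead return every row that shares the maximum, which may be preferable when ties matter for the report.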

Step 4: Sort by other numeric variables

arrange() is not limited to one numeric column. Here we sort the original penguins data by flipper length, using bill length to break ties.

penguins |>
  arrange(flipper_length_mm, bill_length_mm) |>
  select(species, flipper_length_mm, bill_length_mm, body_mass_g) |>
  head(8)

Rows are ordered by flipper length, and penguins with the same flipper length are ordered by bill length. Since this uses the unfiltered penguins data, rows with missing measurements appear at the end.
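A quick way to confirm that a column really is in sorted order is base R's is.unsorted(). The sketch below shows the tie-breaking and the check on a hypothetical tibble.

```r
library(dplyr)

# Hypothetical toy data with a tie in flipper length (181 appears twice)
toy <- tibble(
  flipper_length_mm = c(181, 172, 181, 178),
  bill_length_mm = c(39.1, 37.9, 36.7, 40.2)
)

sorted <- toy |>
  arrange(flipper_length_mm, bill_length_mm)

# The two 181 mm rows are ordered by bill length: 36.7 before 39.1
sorted$bill_length_mm  # 37.9 40.2 36.7 39.1

# TRUE when the first sort key is in non-decreasing order
!is.unsorted(sorted$flipper_length_mm)
```

Only the first sort key is globally ordered; the second key is ordered only within runs of equal flipper length, which is why bill_length_mm as a whole is not monotonic.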

Summary

Use arrange() to sort data by one or more columns, with ascending order as default
Apply desc() around column names to sort in descending order
Multiple variables in arrange() create hierarchical sorting (first variable, then second, etc.)
Sorting works with numeric, character, and factor variables; factors sort by level order rather than alphabetically
arrange() places missing values last, even with desc(); remove them with filter() first if they should not appear in the result

