slice_max: get rows with highest values of a column
Introduction
The slice_max() function from dplyr allows you to extract rows with the highest values from a specific column. This is incredibly useful when you need to identify top performers, find maximum values by group, or filter your dataset to show only the most extreme observations.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We want to find the penguin with the longest bill in our dataset. This tells us the maximum bill length observed across all penguins.
Step 1: Examine the data structure
First, let’s look at our penguin dataset to understand what we’re working with.
penguins |>
  select(species, bill_length_mm, bill_depth_mm) |>
  head()
This shows us the first few rows with species and bill measurements.
Step 2: Find the single penguin with longest bill
Now we’ll use slice_max() to get the row with the highest bill length value.
penguins |>
  slice_max(bill_length_mm, n = 1)
This returns one row containing the penguin with the maximum bill length of 59.6 mm.
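One detail worth knowing here: slice_max() keeps ties by default (with_ties = TRUE), so n = 1 can return more than one row when several rows share the maximum. The sketch below uses a small made-up tibble (toy, id, and bill are illustrative names, not part of palmerpenguins) to show the difference.

```r
library(dplyr)

# Hypothetical toy data: two rows share the maximum value of `bill`
toy <- tibble(
  id   = c("a", "b", "c", "d"),
  bill = c(50.1, 59.6, 59.6, 47.3)
)

# Default behaviour: ties are kept, so both rows with 59.6 come back
with_ties_kept <- toy |>
  slice_max(bill, n = 1)

# with_ties = FALSE returns exactly n rows, even when the maximum is tied
exactly_one <- toy |>
  slice_max(bill, n = 1, with_ties = FALSE)
```

In the penguins data the maximum bill length happens to be unique, which is why the call above returns a single row.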
Step 3: Get multiple top records
We can also retrieve the top 3 penguins with the longest bills.
penguins |>
  slice_max(bill_length_mm, n = 3) |>
  select(species, bill_length_mm, bill_depth_mm)
This gives us the three penguins with bill lengths of 59.6 mm, 58.0 mm, and 55.9 mm respectively.
Example 2: Practical Application
The Problem
A marine biologist wants to identify the heaviest penguin from each species to study potential differences in maximum body size across species. This analysis will help understand species-specific size variations and potential ecological adaptations.
Step 1: Group data by species
We need to group our data by species before finding the maximum values.
penguins |>
  group_by(species) |>
  slice_max(body_mass_g, n = 1) |>
  select(species, body_mass_g, island, sex)
This shows us the heaviest penguin from each species: Adelie (4775 g), Chinstrap (4800 g), and Gentoo (6300 g).
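As an alternative to group_by(), recent versions of dplyr (1.1.0 and later, which this sketch assumes) accept a by argument in slice_max() for per-operation grouping. The example below uses a made-up tibble (toy, species, and mass are illustrative names) rather than the penguins data.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins dataset
toy <- tibble(
  species = c("A", "A", "B", "B", "B"),
  mass    = c(3500, 4775, 4800, 4100, 3900)
)

# Per-operation grouping: no group_by() needed,
# and the result comes back ungrouped
heaviest <- toy |>
  slice_max(mass, n = 1, by = species)
```

The main practical difference is that the result is not a grouped data frame, so there is no need for a later ungroup().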
Step 2: Select a proportion of rows
Sometimes we want a percentage of top values rather than a fixed number.
penguins |>
  group_by(species) |>
  slice_max(body_mass_g, prop = 0.1) |>
  count(species, name = "top_10_percent_count")
This gives us the top 10% heaviest penguins from each species and counts how many that represents.
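When prop times the group size is not a whole number, dplyr rounds it towards zero, so small groups can contribute fewer rows than expected, or none at all. A minimal sketch with made-up data (toy, species, and mass are illustrative names; the values are distinct so ties play no role):

```r
library(dplyr)

# Hypothetical toy data: group A has 25 rows, group B has 8 rows
toy <- tibble(
  species = rep(c("A", "B"), times = c(25, 8)),
  mass    = c(1:25, 101:108)
)

# 10% of 25 is 2.5, rounded down to 2 rows for A;
# 10% of 8 is 0.8, rounded down to 0 rows for B
top_counts <- toy |>
  group_by(species) |>
  slice_max(mass, prop = 0.1) |>
  count(species, name = "top_10_percent_count")
```

Group B drops out of the result entirely, which is worth checking for when groups are small.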
Step 3: Find top performers across multiple variables
Let’s identify penguins that are both heavy and have long flippers.
penguins |>
  mutate(size_score = body_mass_g + flipper_length_mm * 10) |>
  slice_max(size_score, n = 5) |>
  select(species, body_mass_g, flipper_length_mm, size_score)
This creates a combined size score and shows the 5 penguins with the highest combined measurements.
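If the score is not needed as a column in the output, the ordering expression can also be passed to slice_max() directly. The sketch below checks on a made-up tibble (toy, id, mass, and flipper are illustrative names) that both forms pick the same rows. The weight of 10 is simply carried over from the example above.

```r
library(dplyr)

# Hypothetical toy data; combined scores are 6000, 7100, 6800, 7150
toy <- tibble(
  id      = c("a", "b", "c", "d"),
  mass    = c(4000, 5000, 4500, 5200),
  flipper = c(200, 210, 230, 195)
)

# Version 1: create the score with mutate(), then slice on it
via_mutate <- toy |>
  mutate(size_score = mass + flipper * 10) |>
  slice_max(size_score, n = 2) |>
  pull(id)

# Version 2: pass the expression straight to slice_max()
via_expr <- toy |>
  slice_max(mass + flipper * 10, n = 2) |>
  pull(id)
```

The mutate() version is still the better choice when the score should appear in the result, as it does in the penguins example.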
Step 4: Compare with traditional filtering
Here’s how slice_max() differs from traditional filtering approaches.
# Traditional approach - more verbose
max_bill <- max(penguins$bill_length_mm, na.rm = TRUE)
penguins |>
  filter(bill_length_mm == max_bill)
# slice_max approach - cleaner
penguins |>
  slice_max(bill_length_mm, n = 1)
The slice_max() approach is more concise and needs no explicit na.rm = TRUE: it ranks missing values last, so they are returned only when there are too few non-missing rows to fill n.
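To make the missing-value behaviour concrete, here is a small sketch on made-up data (toy, id, and bill are illustrative names). It assumes dplyr 1.1.0 or later, where slice_max() has an na_rm argument.

```r
library(dplyr)

# Hypothetical toy data with one missing bill length
toy <- tibble(
  id   = c("a", "b", "c"),
  bill = c(46.0, NA, 59.6)
)

# Missing values rank last: asking for 3 rows returns the NA row at the end
with_na <- toy |>
  slice_max(bill, n = 3)

# na_rm = TRUE drops rows with a missing `bill` from the result
without_na <- toy |>
  slice_max(bill, n = 3, na_rm = TRUE)

# The filter() approach needs na.rm = TRUE inside max();
# without it, max() returns NA and no rows match
via_filter <- toy |>
  filter(bill == max(bill, na.rm = TRUE))
```

So missing values only surface when n (or prop) asks for more rows than there are non-missing values, and na_rm = TRUE rules them out entirely.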
Summary
- slice_max() extracts the rows with the highest values of a specified column
- Use n = to request an exact number of rows, or prop = for a proportion of rows
- Combine with group_by() to find maximum values within each group
- The function ranks missing values last without extra arguments and is more concise than traditional filtering methods
- Well suited to identifying top performers, outliers, or maximum values in your analysis