slice_max(): get rows with the highest values of a column
In this tutorial, we will learn how to get the rows with the maximum values of a column (variable) in a dataframe. For example, from a dataframe with multiple rows and columns, we will find the row (or rows) with the largest value in a given column.
We will use dplyr’s slice_max() function to select the row with the maximum value of a column, and then use the same function to get the top n rows with the highest values of a variable.
dplyr’s slice_max(): Rows with highest values for a column
Let us load the tidyverse packages and the palmerpenguins dataset package.
library(tidyverse)
library(palmerpenguins)
We will select a few columns of the Palmer penguins dataset so that it is easy to see how slice_max() works. We will also add a row number (row_id) to help us see which of the original rows slice_max() returns.
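penguins <- penguins %>%
drop_na() %>%                          # remove rows with missing values
select(species, sex, body_mass_g) %>%  # keep three columns
mutate(row_id = row_number())          # record each row's position
Our dataframe looks like this, with four columns and over 300 rows.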
penguins %>% head()
# A tibble: 6 × 4
species sex body_mass_g row_id
1 Adelie male 3750 1
2 Adelie female 3800 2
3 Adelie female 3250 3
4 Adelie female 3450 4
5 Adelie male 3650 5
6 Adelie female 3625 6
dplyr’s slice_max(): To get the row with max value for a column
To find the row with the highest value of the column body_mass_g, we use slice_max() with the column name and n = 1 as arguments. We get the row with the highest body mass in our data: a male Gentoo with row_id 164.
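Note that n = 1 does not guarantee a single row: because slice_max() keeps ties by default, it returns every row that shares the maximum, much like filtering for max(). The sketch below checks this on a small made-up tibble; the names toy, id and x are hypothetical and not part of the penguins data.

```r
library(dplyr)

# Hypothetical toy data: ids 2 and 3 share the largest value of x
toy <- tibble(id = 1:4, x = c(2, 7, 7, 3))

# n = 1, but both tied rows are returned
top1 <- toy %>% slice_max(x, n = 1)
top1$id
#> [1] 2 3

# Keeping every row equal to the maximum selects the same rows
by_filter <- toy %>% filter(x == max(x))
setequal(top1$id, by_filter$id)
#> [1] TRUE
```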
penguins %>%
slice_max(body_mass_g, n = 1)
# A tibble: 1 × 4
species sex body_mass_g row_id
1 Gentoo male 6300 164
dplyr’s slice_max(): To get the top 2 rows with max values for a column
By changing the value of n, we can get the top n rows with the highest values of the specified column. For example, with n = 2 on the body mass column, we get the two rows containing the heaviest penguins. Both are male Gentoos.
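One way to picture slice_max(body_mass_g, n = 2) is "sort by body mass in descending order and keep the first two rows". The sketch below rebuilds the tutorial's trimmed data under the hypothetical name penguins_small (the palmerpenguins:: prefix just makes sure the palmerpenguins version of the data is used) and compares the two approaches. They agree here because the two heaviest penguins have different body masses; slice_max() would also keep any rows tied at the cut-off, which arrange() plus head() would not.

```r
library(tidyverse)
library(palmerpenguins)

# Rebuild the trimmed data used in this tutorial (hypothetical name)
penguins_small <- palmerpenguins::penguins %>%
  drop_na() %>%
  select(species, sex, body_mass_g) %>%
  mutate(row_id = row_number())

# Top 2 rows with slice_max()
top2 <- penguins_small %>% slice_max(body_mass_g, n = 2)

# Same idea by hand: sort heaviest first, then take the first 2 rows
sorted2 <- penguins_small %>% arrange(desc(body_mass_g)) %>% head(2)

identical(top2$row_id, sorted2$row_id)
#> [1] TRUE
```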
penguins %>%
slice_max(body_mass_g, n = 2)
# A tibble: 2 × 4
species sex body_mass_g row_id
1 Gentoo male 6300 164
2 Gentoo male 6050 179
dplyr’s slice_max(): To get the top n rows with max values for a column
Similarly, we can get the top 3 rows with the highest values of a column, here body mass, with n = 3. Note that by default slice_max() keeps ties (with_ties = TRUE), so we get four rows: the third and fourth rows have the same body mass of 6000 g.
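The tie rule can be reproduced on a small made-up tibble (toy, id and x are hypothetical names). With the default with_ties = TRUE, asking for n = 3 keeps every row whose value is at least the third-highest value, so a tie at the cut-off yields more than three rows. Comparing with a min_rank() filter is just one way to picture this, not a description of dplyr's internals.

```r
library(dplyr)

# Hypothetical toy data: ids 2 and 4 tie for the third-highest x
toy <- tibble(id = 1:5, x = c(5, 3, 4, 3, 1))

# n = 3, but the tie at the cut-off gives 4 rows
top3 <- toy %>% slice_max(x, n = 3)
nrow(top3)
#> [1] 4

# Rows ranked 3rd or better (tied values share the same rank)
by_rank <- toy %>% filter(min_rank(desc(x)) <= 3)
setequal(top3$id, by_rank$id)
#> [1] TRUE
```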
penguins %>%
slice_max(body_mass_g, n = 3)
# A tibble: 4 × 4
species sex body_mass_g row_id
1 Gentoo male 6300 164
2 Gentoo male 6050 179
3 Gentoo male 6000 222
4 Gentoo male 6000 260
dplyr’s slice_max(): To get the top n rows with no ties
To break ties and get exactly n rows, we can call slice_max() with the argument with_ties = FALSE.
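A small made-up tibble (hypothetical names toy, id and x) shows the effect: with with_ties = FALSE the result is capped at exactly n rows, and in this example the tied row that comes first in the data is the one kept.

```r
library(dplyr)

# Hypothetical toy data: ids 2 and 4 tie for the third-highest x
toy <- tibble(id = 1:5, x = c(5, 3, 4, 3, 1))

# with_ties = FALSE returns exactly n rows
top3_no_ties <- toy %>% slice_max(x, n = 3, with_ties = FALSE)
top3_no_ties$id
#> [1] 1 3 2
```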
penguins %>%
slice_max(body_mass_g, n = 3,
with_ties = FALSE)
# A tibble: 3 × 4
species sex body_mass_g row_id
1 Gentoo male 6300 164
2 Gentoo male 6050 179
3 Gentoo male 6000 222
Here only one of the two penguins tied at 6000 g (row_id 222) is kept.
Check out how to use dplyr’s slice_min() function to get the bottom n rows for a specific column in a dataframe.
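slice_min() takes the same n and with_ties arguments as slice_max() but returns the rows with the lowest values. A minimal sketch on the same trimmed data, rebuilt here under the hypothetical name penguins_small so the block runs on its own:

```r
library(tidyverse)
library(palmerpenguins)

# Rebuild the trimmed data used in this tutorial (hypothetical name)
penguins_small <- palmerpenguins::penguins %>%
  drop_na() %>%
  select(species, sex, body_mass_g) %>%
  mutate(row_id = row_number())

# Lightest penguin(s): ties are kept by default, as with slice_max()
lightest <- penguins_small %>% slice_min(body_mass_g, n = 1)
lightest
```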