slice_max: get rows with highest values of a column

dplyr

dplyr slice_max()

Published

October 2, 2023

In this tutorial, we will learn how to get rows with maximum values of a column or variable from a dataframe. For example, from a dataframe with multiple rows and columns we will find a row (or multiple rows) with maximum values for a column.

We will use dplyr’s slice_max() function to select the row with the maximum value of a column. We will also use slice_max() to find the top n rows with the highest values of a variable.

Figure: dplyr’s slice_max(): Rows with highest values for a column

Let us load the tidyverse packages and the palmerpenguins dataset package.

library(tidyverse)
library(palmerpenguins)

We will select a few columns of the palmer penguins dataset to easily see how slice_max() works. We will also add a row number column, row_id, to make it easier to see which rows slice_max() returns.

penguins <- penguins %>%
  drop_na() %>%
  select(species, sex, body_mass_g) %>%
  mutate(row_id = row_number())

Our dataframe looks like this with four columns and over 300 rows.

penguins %>% head()

# A tibble: 6 × 4
  species sex    body_mass_g row_id
  <fct>   <fct>        <int>  <int>
1 Adelie  male          3750      1
2 Adelie  female        3800      2
3 Adelie  female        3250      3
4 Adelie  female        3450      4
5 Adelie  male          3650      5
6 Adelie  female        3625      6

dplyr’s slice_max(): To get the row with max value for a column

To find the row with the highest value of the column body_mass_g, we use slice_max() with the column name and n = 1 as arguments. This returns the row with the highest body mass in our data. We can see that it is a male Gentoo at row number 164.

penguins %>%
  slice_max(body_mass_g, n = 1)

# A tibble: 1 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
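As a quick sanity check of what n = 1 does, here is a minimal sketch on a small made-up data frame. The `toy` data and its `id` and `mass` columns are hypothetical and not part of the penguins data. The sketch also compares the result with filtering on max(), which should pick the same row when the maximum is unique.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  id   = 1:5,
  mass = c(3750, 6300, 3250, 6050, 6000)
)

# Row with the highest value of mass
top1 <- toy %>%
  slice_max(mass, n = 1)

# Alternative for comparison: keep rows equal to the column maximum
top1_filter <- toy %>%
  filter(mass == max(mass))

print(top1)
```

Both approaches should agree here; slice_max() is simply the more direct way to ask for the top row.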

dplyr’s slice_max(): To get the top 2 rows with max values for a column

By changing the value of n, we can get the top n rows with the highest values of the specified column. For example, when we use n = 2 with the body mass column, we get the two rows containing the heaviest penguins. We can see that both are male Gentoos.

penguins %>%
  slice_max(body_mass_g, n = 2)

# A tibble: 2 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179

dplyr’s slice_max(): To get the top n rows with max values for a column

Similarly, we can get the top 3 rows with the highest values of a column, here body mass, with n = 3. Note that by default slice_max() keeps ties (with_ties = TRUE), so we get four rows here because the third and fourth rows have the same body mass.

penguins %>%
  slice_max(body_mass_g, n = 3)

# A tibble: 4 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
4 Gentoo  male         6000    260
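To see the tie behavior in isolation, here is a minimal sketch on a small made-up data frame (the `toy` data and its `id` and `mass` columns are hypothetical). Two rows share the third-highest value, so asking for n = 3 with the default settings should return four rows.

```r
library(dplyr)

# Hypothetical toy data with a tie at mass = 6000
toy <- data.frame(
  id   = 1:5,
  mass = c(6000, 6300, 5000, 6000, 6050)
)

# Default behavior (with_ties = TRUE): all rows tied at the cutoff are kept
top3 <- toy %>%
  slice_max(mass, n = 3)

print(top3)
nrow(top3)  # more than 3 because of the tie at 6000
```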

dplyr’s slice_max(): To get the top n rows with no ties

We can break ties when using slice_max() by passing with_ties = FALSE as an argument. This returns exactly n rows.

penguins %>%
  slice_max(body_mass_g, n = 3,
            with_ties = FALSE)

# A tibble: 3 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
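The same idea on a small made-up data frame, as a minimal sketch (the `toy` data and its `id` and `mass` columns are hypothetical). With with_ties = FALSE, only one of the two rows tied at 6000 is returned, so the result has exactly three rows.

```r
library(dplyr)

# Hypothetical toy data with a tie at mass = 6000
toy <- data.frame(
  id   = 1:5,
  mass = c(6000, 6300, 5000, 6000, 6050)
)

# with_ties = FALSE: return exactly n rows, dropping extra tied rows
top3_no_ties <- toy %>%
  slice_max(mass, n = 3, with_ties = FALSE)

print(top3_no_ties)
nrow(top3_no_ties)
```

Which of the tied rows survives is decided by slice_max() itself. If a specific row must win, one option is to add a second ordering column beforehand.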

Check out how to use dplyr’s slice_min() function to get the bottom n rows for a specific column in a dataframe.
