How to count unique values with n_distinct() in R

dplyr

dplyr n_distinct()

Learn how to count unique values with n_distinct() in R, with clear examples and explanations.

Published

March 26, 2026

Introduction

The n_distinct() function in dplyr counts the number of unique values in a vector, or the number of unique rows in a data frame. It is handy for data exploration and quality checks because it quickly shows how varied your data is. You’ll find it particularly useful when examining categorical variables, checking for duplicates, or summarizing data by groups.

Setup

Let’s start by loading the tidyverse package (which attaches dplyr) and creating some sample data to work with:

library(tidyverse)

# Create sample data with some duplicate IDs and a missing value
df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

df

Our dataset contains 7 rows. Notice that two ID values (2 and 4) are repeated, and one ID is missing (NA).

Basic Usage of n_distinct()

To count the distinct values in a single column, extract it with pull() and pass the resulting vector to n_distinct():

df |>
  pull(id) |> 
  n_distinct()

By default, n_distinct() counts NA as a value, so we get 5 distinct values (1, 2, 3, 4, and NA).
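Since n_distinct() counts every distinct value, including NA, comparing its result with the row count gives a quick duplicate check. A minimal sketch, assuming only dplyr is attached (tidyverse loads it); the variable names n_rows, n_ids, and n_repeats are illustrative:

```r
library(dplyr)

df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

n_rows <- nrow(df)          # 7 rows in total
n_ids  <- n_distinct(df$id) # 5 distinct ids, NA included

# Each row beyond the first occurrence of an id is a repeat
n_repeats <- n_rows - n_ids
n_repeats
#> [1] 2
```

A result above zero means at least one ID occurs more than once; here ids 2 and 4 each appear twice.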

Handling Missing Values

To exclude missing values from the count, set the na.rm argument to TRUE:

df |>
  pull(id) |> 
  n_distinct(na.rm = TRUE)

Now we get 4 distinct values, excluding the NA.
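The NA handling is easier to see on a small vector: repeated NAs are treated as a single distinct value, and na.rm = TRUE drops them entirely. A short sketch on a made-up character vector x:

```r
library(dplyr)

# Hypothetical vector with a repeated value and two missing entries
x <- c("a", "b", NA, "a", NA)

n_distinct(x)                # "a", "b", and NA (both NAs count once)
#> [1] 3
n_distinct(x, na.rm = TRUE)  # "a" and "b" only
#> [1] 2
```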

Alternative Approaches

You can reproduce the default count (5, with NA included) using base R’s unique() and length():

df |>
  pull(id) |> 
  unique() |>
  length()

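This approach first extracts the unique values, then counts them. n_distinct() is more concise, is documented as a faster equivalent of length(unique(x)), and lets you drop missing values with na.rm = TRUE instead of filtering them out yourself.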
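In base R, excluding missing values takes an extra step. One option, sketched below on the same id values as df, is to drop them with is.na() before calling unique():

```r
library(dplyr)

ids <- c(2, 4, 1, 2, 3, 4, NA)

# Base R: unique() keeps NA, so it is counted
length(unique(ids))
#> [1] 5

# Base R: remove NA first to match na.rm = TRUE
length(unique(ids[!is.na(ids)]))
#> [1] 4

# dplyr equivalent in a single call
n_distinct(ids, na.rm = TRUE)
#> [1] 4
```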

Direct Column Access

You can also pass a column to n_distinct() directly, using $ instead of a pipe:

n_distinct(df$id)

Extracting the column with base R’s $ operator is shorter for quick one-off checks, but it doesn’t chain as naturally into a dplyr pipeline.
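Inside a pipeline, a common pattern is to call n_distinct() within summarize() rather than extracting the column first. This sketch computes the count with and without NA in one step; the output column names are arbitrary:

```r
library(dplyr)

df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

# Returns a one-row tibble with both counts side by side
id_counts <- df |>
  summarize(
    ids_with_na = n_distinct(id),
    ids_without_na = n_distinct(id, na.rm = TRUE)
  )

id_counts
```

Because the result is a tibble, it can be piped onward or bound to other summaries like any other dplyr output.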

Counting Distinct Rows

When applied to an entire data frame, n_distinct() counts unique combinations of all columns, that is, unique rows:

df |>
  n_distinct()

This tells us how many completely unique rows exist in our dataset. Here the result is 6 rather than 7, because the row with id 2 and amount 250 appears twice.
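n_distinct() also accepts several vectors at once and counts their unique combinations, which lets you restrict the check to selected columns. A sketch that passes id and amount as separate vectors and cross-checks the result with distinct() and nrow():

```r
library(dplyr)

df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

# Unique (id, amount) combinations, passed as separate vectors
combo_count <- n_distinct(df$id, df$amount)

# Cross-check: keep distinct rows, then count them
distinct_rows <- df |>
  distinct() |>
  nrow()

combo_count
#> [1] 6
distinct_rows
#> [1] 6
```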

Using n_distinct() with Group Operations

One of the most common applications is counting distinct values within each group:

# Example with grouped data
df_grouped <- tibble(
  category = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 10, 30, 20)
)

df_grouped |>
  group_by(category) |>
  summarize(distinct_values = n_distinct(value))

This shows how many distinct values exist within each category: A has 2 (10 and 20) and B has 3 (10, 30, and 20). Comparing these counts across groups is a quick way to see how the data is distributed.
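Because n_distinct() returns one number per group, it can also be used inside filter() to keep only the groups that meet a condition. A minimal sketch that keeps categories with more than two distinct values (the threshold of 2 is arbitrary); with this data only category B passes:

```r
library(dplyr)

df_grouped <- tibble(
  category = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 10, 30, 20)
)

# Keep every row of each category that has more than two distinct values
varied <- df_grouped |>
  group_by(category) |>
  filter(n_distinct(value) > 2) |>
  ungroup()

varied
```

The final ungroup() returns a plain tibble so later steps are not silently evaluated per group.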

Summary

The n_distinct() function is an essential tool for exploratory data analysis in R. Use it to quickly count unique values in columns, check data quality, and summarize categorical variables. Remember to use na.rm = TRUE when you want to exclude missing values from your counts. Combined with group_by(), it becomes even more powerful for understanding patterns within different subsets of your data.
