How to Find Levels of a Factor in R
Introduction
Factors are a fundamental data type in R used to represent categorical data with predefined categories called “levels.” Understanding how to find and examine factor levels is essential for data analysis, especially when working with categorical variables like survey responses, treatment groups, or classification categories.
You’ll need to find factor levels when exploring new datasets, preparing data for statistical modeling, creating visualizations with specific ordering, or validating data quality. Factor levels determine how categorical data is stored, displayed, and analyzed, making it crucial to understand what levels exist in your factors and how they’re ordered.
This tutorial will show you various methods to identify factor levels using base R functions and tidyverse approaches.
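To make the point about storage and ordering concrete, here is a minimal base R sketch. The `responses` vector and the "very high" level are made-up illustration data, not part of the penguins examples used later.

```r
# Hypothetical survey responses (illustrative data only)
responses <- c("low", "high", "medium", "low", "high")

# By default, levels are the sorted unique values of the input
default_factor <- factor(responses)
levels(default_factor)

# Levels can be given explicitly, including one that never occurs in the data
ordered_factor <- factor(
  responses,
  levels = c("low", "medium", "high", "very high")
)
levels(ordered_factor)

# Internally, each value is stored as an integer index into levels()
as.integer(ordered_factor)

# Unused levels are still part of the factor and appear with a count of 0
table(ordered_factor)
```

The default ordering is alphabetical, which is rarely the order you want for plots or model output, so checking the levels early tends to save surprises later.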
Getting Started
library(tidyverse)
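library(palmerpenguins)
Example 1: Basic Usage
The most straightforward way to find factor levels is to call the levels() function. Let’s start with a simple example using the penguins dataset. The species and island columns are already stored as factors, so the factor() calls below simply make the type explicit: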
data(penguins)
species_factor <- factor(penguins$species)
levels(species_factor)
island_factor <- factor(penguins$island)
levels(island_factor)
You can also use str() to see the structure of a factor, which includes its levels:
str(species_factor)
str(island_factor)
For a quick overview of all levels and their frequencies, use summary():
summary(species_factor)
summary(island_factor)
To check the number of levels in a factor, use nlevels():
nlevels(species_factor)
nlevels(island_factor)
Example 2: Practical Application
In real-world scenarios, you often need to examine multiple factors simultaneously or work with factors as part of a larger data pipeline. Here’s how to find levels using tidyverse functions:
penguins_clean <- penguins |>
  drop_na() |>
  mutate(
    species = factor(species),
    island = factor(island),
    sex = factor(sex),
    size_category = factor(case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      body_mass_g >= 4500 ~ "Large"
    ), levels = c("Small", "Medium", "Large"))
  )
penguins_clean |>
  select(where(is.factor)) |>
  map(levels)
You can also examine factor levels within grouped operations:
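penguins_clean |>
  group_by(species) |>
  summarise(
    # Values actually observed in each group
    islands_present = list(unique(as.character(island))),
    n_islands = n_distinct(island),
    # levels() reports every defined level, not only those observed in the group
    sex_levels = list(levels(sex)),
    .groups = "drop"
  )
To check if specific levels exist in your factors: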
target_species <- c("Adelie", "Chinstrap", "Gentoo", "Emperor")
penguins_clean |>
  summarise(
    species_levels = list(levels(species)),
    has_emperor = "Emperor" %in% levels(species),
    missing_levels = list(setdiff(target_species, levels(species)))
  )
For data validation, you might want to compare expected levels with actual levels:
expected_islands <- c("Biscoe", "Dream", "Torgersen")
penguins_clean |>
  pull(island) |>
  levels() |>
  setequal(expected_islands)
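Because setequal() returns only TRUE or FALSE, it can help to see which levels differ when a check fails. The sketch below is one possible way to report that in base R. The helper name compare_levels() is hypothetical, and "Anvers" is included in the expected set purely to illustrate a mismatch.

```r
# Hypothetical helper: compare a factor's levels against an expected set
compare_levels <- function(x, expected) {
  stopifnot(is.factor(x))
  actual <- levels(x)
  observed <- unique(as.character(x[!is.na(x)]))
  list(
    missing    = setdiff(expected, actual),  # expected, but not a defined level
    unexpected = setdiff(actual, expected),  # defined, but not expected
    unused     = setdiff(actual, observed)   # defined, but never observed
  )
}

# Small illustrative factor (not the penguins data)
island <- factor(
  c("Biscoe", "Dream", "Dream"),
  levels = c("Biscoe", "Dream", "Torgersen")
)

result <- compare_levels(island, c("Biscoe", "Dream", "Torgersen", "Anvers"))
result
```

With the tutorial's data, the same helper could be called as compare_levels(penguins_clean$island, expected_islands). If unused levels turn up, base R's droplevels() removes them.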