How to Find Levels of a Factor in R

Learn how to find the levels of a factor in R. A step-by-step tutorial with examples.

Published

May 31, 2022

Introduction

Factors are a fundamental data type in R used to represent categorical data with predefined categories called “levels.” Understanding how to find and examine factor levels is essential for data analysis, especially when working with categorical variables like survey responses, treatment groups, or classification categories.

You’ll need to find factor levels when exploring new datasets, preparing data for statistical modeling, creating visualizations with specific ordering, or validating data quality. Factor levels determine how categorical data is stored, displayed, and analyzed, making it crucial to understand what levels exist in your factors and how they’re ordered.

This tutorial will show you various methods to identify factor levels using base R functions and tidyverse approaches.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The most straightforward way to find factor levels is using the levels() function. Let’s start with a simple example using the penguins dataset:

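# Load the penguins dataset shipped with palmerpenguins
data(penguins)

# species is already a factor; wrapping it in factor() also drops any unused levels
species_factor <- factor(penguins$species)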
levels(species_factor)

island_factor <- factor(penguins$island)
levels(island_factor)
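As a small self-contained sketch (using a made-up `sizes` vector rather than the penguins data), the following illustrates two behaviors of `levels()` worth keeping in mind: `factor()` sorts levels alphabetically unless you pass `levels =`, and `levels()` reports every defined level, including ones that no longer appear in the data.

```r
# Hypothetical toy data, not taken from the penguins dataset
sizes <- c("small", "large", "medium", "small")

# Default: levels are the sorted unique values
f_default <- factor(sizes)
levels(f_default)

# Explicit levels keep the order you specify
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_ordered)

# Subsetting does not drop levels: "large" is still listed
f_subset <- f_ordered[f_ordered != "large"]
levels(f_subset)
```

This is why the level order you see for `species` and `island` is alphabetical: no explicit order was supplied.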

You can also use str() to see the structure of a factor, which includes its levels:

str(species_factor)
str(island_factor)
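The integers that `str()` prints after the level names are the codes R stores internally. A minimal sketch with a toy factor `f` (an illustrative name, not part of the tutorial's data) shows how those codes map back to the level labels:

```r
# Toy factor for illustration only
f <- factor(c("b", "a", "c", "a"))

str(f)                 # compact view: number of levels, level names, integer codes

codes <- as.integer(f) # the underlying integer codes
codes

# Indexing the levels by the codes recovers the original labels
labels_back <- levels(f)[codes]
labels_back
```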

For a quick overview of all levels and their frequencies, use summary():

summary(species_factor)
summary(island_factor)
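For comparison, `table()` gives similar per-level counts. The sketch below uses a hypothetical `answers` factor with one unused level and one missing value to show how each function reports them; the data are invented for illustration.

```r
# Toy factor: "maybe" is a defined level that never occurs, plus one NA
answers <- factor(c("yes", "no", "yes", NA), levels = c("no", "yes", "maybe"))

summary(answers)                 # counts per level, with NAs counted separately

counts <- table(answers)         # counts per level; NAs are dropped by default
counts

table(answers, useNA = "ifany")  # include an NA column when missing values exist
```

Both functions list the unused level with a count of zero, which makes them a handy way to spot levels that are defined but never observed.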

To check the number of levels in a factor, use nlevels():

nlevels(species_factor)
nlevels(island_factor)
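Note that `nlevels()` counts defined levels, not the distinct values actually present. A short sketch with a made-up factor `g` shows the difference, and how `droplevels()` reconciles the two:

```r
# Toy factor: level "c" is defined but never used
g <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

n_defined <- nlevels(g)              # number of defined levels
n_present <- length(unique(g))       # number of distinct values in the data
n_dropped <- nlevels(droplevels(g))  # levels left after removing unused ones

c(defined = n_defined, present = n_present, after_drop = n_dropped)
```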

Example 2: Practical Application

In real-world scenarios, you often need to examine multiple factors simultaneously or work with factors as part of a larger data pipeline. Here’s how to find levels using tidyverse functions:

penguins_clean <- penguins |>
  drop_na() |>
  mutate(
    species = factor(species),
    island = factor(island),
    sex = factor(sex),
    size_category = factor(case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      body_mass_g >= 4500 ~ "Large"
    ), levels = c("Small", "Medium", "Large"))
  )

penguins_clean |>
  select(where(is.factor)) |>
  map(levels)
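If you prefer not to depend on the tidyverse, roughly the same result can be obtained in base R with `Filter()` and `lapply()`. This is a sketch on a small invented data frame `df`, not on `penguins_clean`:

```r
# Toy data frame with two factor columns and two non-factor columns
df <- data.frame(
  id    = 1:4,
  group = factor(c("ctrl", "trt", "ctrl", "trt")),
  dose  = factor(c("low", "high", "low", "high"), levels = c("low", "high")),
  note  = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Keep only the factor columns, then extract the levels of each
factor_levels <- lapply(Filter(is.factor, df), levels)
factor_levels

# Number of levels per factor column
level_counts <- sapply(Filter(is.factor, df), nlevels)
level_counts
```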

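You can also examine factor levels within grouped operations. Note that levels(sex) returns the full set of defined levels for every group, whereas unique(as.character(island)) lists only the values actually present in each group: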

penguins_clean |>
  group_by(species) |>
  summarise(
    islands_present = list(unique(as.character(island))),
    n_islands = n_distinct(island),
    sex_levels = list(levels(sex)),
    .groups = "drop"
  )
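The distinction between defined levels and levels present in a group can be sketched in base R as well. The data frame `obs` below is invented for illustration; `split()` and `droplevels()` stand in for the grouped pipeline.

```r
# Toy data: group "B" only ever occurs at site "y"
obs <- data.frame(
  grp  = factor(c("A", "A", "B", "B", "B")),
  site = factor(c("x", "y", "y", "y", "y"))
)

by_group <- split(obs$site, obs$grp)

# Defined levels: identical in every group
defined <- lapply(by_group, levels)
defined

# Levels actually present: drop unused levels within each group first
present <- lapply(by_group, function(x) levels(droplevels(x)))
present
```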

To check whether specific levels exist in a factor:

target_species <- c("Adelie", "Chinstrap", "Gentoo", "Emperor")

penguins_clean |>
  summarise(
    species_levels = list(levels(species)),
    has_emperor = "Emperor" %in% levels(species),
    missing_levels = list(setdiff(target_species, levels(species)))
  )
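The same check can be wrapped in a small reusable function. `check_levels()` below is a hypothetical helper written for this example, not a function from base R or the tidyverse; it reports both expected levels that are missing and levels that were not expected.

```r
# Hypothetical helper: compare a factor's levels against an expected set
check_levels <- function(x, expected) {
  stopifnot(is.factor(x))
  list(
    missing    = setdiff(expected, levels(x)),  # expected but not defined
    unexpected = setdiff(levels(x), expected)   # defined but not expected
  )
}

# Toy factor for illustration
birds <- factor(c("Adelie", "Gentoo", "Adelie", "Rockhopper"))

result <- check_levels(birds, expected = c("Adelie", "Chinstrap", "Gentoo"))
result
```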

For data validation, you might want to compare expected levels with actual levels:

expected_islands <- c("Biscoe", "Dream", "Torgersen")

penguins_clean |>
  pull(island) |>
  levels() |>
  setequal(expected_islands)
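Keep in mind that `setequal()` ignores order. When level order matters (for example, for plot ordering or the reference level in a model), `identical()` is the stricter check. A minimal sketch with an invented `dose` factor:

```r
expected <- c("low", "medium", "high")

# Default factor() sorts levels alphabetically: high, low, medium
dose <- factor(c("low", "high", "medium"))

same_set   <- setequal(levels(dose), expected)   # same members, any order
same_order <- identical(levels(dose), expected)  # same members, same order

# Re-level explicitly to get the expected order
dose_fixed  <- factor(dose, levels = expected)
fixed_order <- identical(levels(dose_fixed), expected)

c(same_set = same_set, same_order = same_order, fixed_order = fixed_order)
```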

Summary

Finding factor levels in R is straightforward with several key functions: `levels()` for direct level extraction, `nlevels()` for counting levels, `str()` for structure examination, and `summary()` for level frequencies. When working with tidyverse, combine these functions with `map()` and `where(is.factor)` to examine multiple factors efficiently. Always verify that your factor levels match expectations, especially when preparing data for analysis or modeling, as unexpected or missing levels can lead to analytical errors.
