How to create a nested dataframe with lists

tidyr
tidyr unnest()
Published

November 27, 2024

In this tutorial, we will learn how to create a nested dataframe with list columns in tidyverse, using group_by() and summarize() together with list(). A nested dataframe is a dataframe where one or more columns are list columns. In a simple dataframe, every column is an atomic vector. However, a column can also hold other data structures, such as lists or dataframes. Such columns are called list columns.
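To make the idea of a list column concrete, here is a minimal sketch that builds one directly with list(). The names `df`, `id` and `scores` and the values are made up for illustration and are not part of the example used in the rest of this post.

```r
library(tibble)

# Hypothetical example data: `scores` is a list column,
# so each row can hold a vector of a different length.
df <- tibble(
  id = c("x", "y"),
  scores = list(c(1, 2, 3), c(4, 5))
)

is.list(df$scores)   # the column itself is a list
lengths(df$scores)   # number of values stored in each row
```

An atomic column such as `id` must hold exactly one value per row, while each element of `scores` can be a vector of any length.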

library(tidyverse)
packageVersion("dplyr")
[1] '1.1.4'

Let us create a dataframe with group id and group members as two columns.

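data <- tibble(group_id = c("A", "A", "A", "B", "B", "C", "C"),
               member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary"))
# Collapse each group's members into one list element per group
nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))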

Our nested dataframe looks like this.

nested

# A tibble: 3 × 2
  group_id members  
  <chr>    <list>   
1 A        <chr [3]>
2 B        <chr [2]>
3 C        <chr [2]>

Here is a way to access the values in the list column. Double square brackets extract a single element of the list, which here is a character vector.

nested$members[[1]]

[1] "John"   "Paul"   "Stella"

nested$members[[2]]

[1] "Paul" "Jake"
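Indexing one element at a time works, but it is often handy to operate on every element of the list column at once. The following is a minimal sketch of that, assuming the `nested` tibble has the values shown in the output above (it is rebuilt by hand here so the block runs on its own). The names `summary_tbl`, `n_members` and `first_member` are made up for illustration.

```r
library(dplyr)
library(tibble)
library(purrr)

# Assumption: `nested` rebuilt by hand from the output shown in this post
nested <- tibble(
  group_id = c("A", "B", "C"),
  members = list(c("John", "Paul", "Stella"), c("Paul", "Jake"), c("John", "Mary"))
)

summary_tbl <- nested |>
  mutate(
    n_members = lengths(members),                    # size of each list element
    first_member = map_chr(members, \(m) m[[1]])     # first name in each element
  )

summary_tbl
```

Here lengths() returns one integer per list element, and map_chr() applies the function to each element and collects the results in a character vector.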

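We can unnest the nested dataframe and get back one row per member using the unnest() function. Calling unnest() without naming the column still works, but it gives a warning because the cols argument is now required.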

nested |> unnest()

Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(members)`.
# A tibble: 7 × 2
  group_id members
  <chr>    <chr>  
1 A        John   
2 A        Paul   
3 A        Stella 
4 B        Paul   
5 B        Jake   
6 C        John   
7 C        Mary

Here we explicitly specify which column to unnest, which avoids the warning.

nested |> unnest(members)

# A tibble: 7 × 2
  group_id members
  <chr>    <chr>  
1 A        John   
2 A        Paul   
3 A        Stella 
4 B        Paul   
5 B        Jake   
6 C        John   
7 C        Mary
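As a quick sanity check, the sketch below nests and then unnests the data and compares the result with the starting point. It assumes the original data has the values shown in the output above and is rebuilt by hand so the block is self-contained. The name `roundtrip` is made up for illustration. The only difference from the original is the column name (`members` instead of `member`), so the sketch renames it back.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Assumption: original data rebuilt by hand from the output shown in this post
data <- tibble(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

roundtrip <- data |>
  group_by(group_id) |>
  summarize(members = list(member)) |>   # nest: one row per group
  unnest(cols = c(members)) |>           # unnest: one row per member again
  rename(member = members)               # restore the original column name

identical(roundtrip$group_id, data$group_id)
identical(roundtrip$member, data$member)
```

The round trip preserves the rows here because the input is already sorted by group_id. summarize() returns groups in sorted order, so unsorted input would come back reordered.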

Note that we have not used the nest() function to create the nested dataframe. Here each element of the list column is a character vector. With tidyr's nest() function, we can just as easily create list columns whose elements are tibbles.
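For comparison, here is a minimal sketch of the nest() approach. It assumes the same starting data as above, rebuilt by hand so the block runs on its own. The name `nested_tbl` is made up for illustration.

```r
library(tidyr)
library(tibble)

# Assumption: original data rebuilt by hand from the output shown in this post
data <- tibble(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

# nest() packs the `member` column into a list column named `members`.
# The remaining column (group_id) defines the groups, one row per group.
nested_tbl <- data |>
  nest(members = member)

nested_tbl

# Each element is now a tibble rather than a character vector
nested_tbl$members[[1]]
```

Unlike the summarize() version, each element of `members` is a one-column tibble, so `nested_tbl$members[[1]]$member` is needed to get at the character vector itself.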