How to convert a matrix to a tidy table

pivot_longer()
Published

December 1, 2022

In this tutorial, we will learn how to convert a data matrix to tidy long table. We will use tidyr’s pivot_longer() function to reshape the matrix into a tidy table.

First, let us start with loading the tidyverse suit of packages.

library(tidyverse)

Create a simulated data matrix

Let us create a data matrix using rnorm() function generating random numbers from normal distribution.

set.seed(42)
# Simulate data matrix with 10 rows and 6 columns
mat             
 1  1.37    1.30  -0.307  0.455   0.206  0.322 
 2 -0.565   2.29  -1.78   0.705  -0.361 -0.784 
 3  0.363  -1.39  -0.172  1.04    0.758  1.58  
 4  0.633  -0.279  1.21  -0.609  -0.727  0.643 
 5  0.404  -0.133  1.90   0.505  -1.37   0.0898
 6 -0.106   0.636 -0.430 -1.72    0.433  0.277 
 7  1.51   -0.284 -0.257 -0.784  -0.811  0.679 
 8 -0.0947 -2.66  -1.76  -0.851   1.44   0.0898
 9  2.02   -2.44   0.460 -2.41   -0.431 -2.99  
10 -0.0627  1.32  -0.640  0.0361  0.656  0.285

And then add a column containing row number as a factor variable. We use dplyr’s row_number() function.

as_tibble(mat) %>%
   mutate(row_id=factor(row_number())) 

# A tibble: 10 × 7
        C1     C2     C3      C4     C5      C6 row_id
                   
 1  1.37    1.30  -0.307  0.455   0.206  0.322  1     
 2 -0.565   2.29  -1.78   0.705  -0.361 -0.784  2     
 3  0.363  -1.39  -0.172  1.04    0.758  1.58   3     
 4  0.633  -0.279  1.21  -0.609  -0.727  0.643  4     
 5  0.404  -0.133  1.90   0.505  -1.37   0.0898 5     
 6 -0.106   0.636 -0.430 -1.72    0.433  0.277  6     
 7  1.51   -0.284 -0.257 -0.784  -0.811  0.679  7     
 8 -0.0947 -2.66  -1.76  -0.851   1.44   0.0898 8     
 9  2.02   -2.44   0.460 -2.41   -0.431 -2.99   9     
10 -0.0627  1.32  -0.640  0.0361  0.656  0.285  10

mutate() function creates a new variable at the end of the dataframe, as last column. Here we use relocate() function to move the row ID column to the front.

mat_df %
  mutate(row_id=factor(row_number())) %>%
  relocate(row_id)

mat_df 

# A tibble: 10 × 7
   row_id      C1     C2     C3      C4     C5      C6
                   
 1 1       1.37    1.30  -0.307  0.455   0.206  0.322 
 2 2      -0.565   2.29  -1.78   0.705  -0.361 -0.784 
 3 3       0.363  -1.39  -0.172  1.04    0.758  1.58  
 4 4       0.633  -0.279  1.21  -0.609  -0.727  0.643 
 5 5       0.404  -0.133  1.90   0.505  -1.37   0.0898
 6 6      -0.106   0.636 -0.430 -1.72    0.433  0.277 
 7 7       1.51   -0.284 -0.257 -0.784  -0.811  0.679 
 8 8      -0.0947 -2.66  -1.76  -0.851   1.44   0.0898
 9 9       2.02   -2.44   0.460 -2.41   -0.431 -2.99  
10 10     -0.0627  1.32  -0.640  0.0361  0.656  0.285

pivot_longer(): Reshape the data matrix to tidy dataframe

Now we have the data as a dataframe and ready to reshape using pivot_longer() function to a tidy long format. Here we specify where names should go to and values of the matrix should go to.

mat_df %>% 
  pivot_longer(-row_id,
               names_to="sample_id",
               values_to="vals") 

# A tibble: 60 × 3
   row_id sample_id   vals
           
 1 1      C1         1.37 
 2 1      C2         1.30 
 3 1      C3        -0.307
 4 1      C4         0.455
 5 1      C5         0.206
 6 1      C6         0.322
 7 2      C1        -0.565
 8 2      C2         2.29 
 9 2      C3        -1.78 
10 2      C4         0.705
# … with 50 more rows

We have now converted to a data matrix into a long tidy dataframe.