How to convert a matrix to a tidy table

pivot_longer()
Learn how to convert a matrix to a tidy table with this comprehensive R tutorial. Includes practical examples and code snippets.
Published: December 1, 2022

Introduction

Converting matrices to tidy tables is a common first step for data analysis in R. Tidyverse functions operate on data frames, and tools like ggplot2 and dplyr work best when each row holds one observation (long format) rather than one cell of a wide grid. Once a matrix is in that shape, you can use those tools directly for visualization and manipulation.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

You have a correlation matrix showing relationships between variables, but you need it in tidy format to create visualizations or perform further analysis. Matrix format stores data in a grid, but tidy format requires each observation as a row.

Step 1: Create a sample matrix

We’ll start with a simple correlation matrix from the penguins dataset.

# Create correlation matrix
penguin_numeric <- penguins |> 
  select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |> 
  na.omit()

cor_matrix <- cor(penguin_numeric)
print(cor_matrix)

This creates a 4x4 correlation matrix showing relationships between penguin physical measurements.
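
As a side note, base R's cor() can also drop incomplete rows itself through its use argument, instead of calling na.omit() first. The sketch below uses a small made-up data frame (toy_measurements and its values are illustrative, not penguin data) to show that the two routes give the same matrix:

```r
# Made-up data with a few missing values (illustrative only)
toy_measurements <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 1, 4, 3, 5),
  c = c(5, 3, NA, 1, 2)
)

# Route 1: drop incomplete rows first, then correlate
cor_dropped <- cor(na.omit(toy_measurements))

# Route 2: let cor() use only the rows that are complete in every column
cor_complete <- cor(toy_measurements, use = "complete.obs")

# Both routes use the same rows, so the matrices should match
all.equal(cor_dropped, cor_complete)
```

Either route works for the tutorial; the na.omit() version keeps the filtered data frame around in case you need it later.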

Step 2: Convert matrix to data frame

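A matrix keeps its row labels as row names, but pivot_longer() only works with columns, so the row names would be lost in the reshape. We therefore move them into a regular column before converting to tidy format.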

# Convert to data frame with row names as column
cor_df <- cor_matrix |> 
  as.data.frame() |> 
  rownames_to_column(var = "variable1")

head(cor_df)

Now we have a data frame in which the former row names are stored in the first column, variable1, so they can be used like any other variable.
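
An alternative worth knowing is tibble's as_tibble(), which accepts a rownames argument and performs both steps in one call. This is a minimal sketch on a made-up 2x2 matrix (toy_matrix is a stand-in for cor_matrix, and its values are invented):

```r
library(tibble)

# Toy 2x2 matrix standing in for cor_matrix (values are made up)
toy_matrix <- matrix(
  c(1.0, 0.5,
    0.5, 1.0),
  nrow = 2,
  dimnames = list(c("a", "b"), c("a", "b"))
)

# Move the row names into a column called "variable1" while converting
toy_df <- as_tibble(toy_matrix, rownames = "variable1")

print(toy_df)
```

The result is a tibble rather than a plain data frame, which makes no difference for the pivot_longer() step that follows.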

Step 3: Transform to tidy format

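The pivot_longer() function converts wide format to long format, creating one row per cell of the matrix. For a 4x4 correlation matrix that means 16 rows: every pair appears in both orders, and each variable is also paired with itself on the diagonal.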

# Convert to tidy format
tidy_correlations <- cor_df |> 
  pivot_longer(cols = -variable1, 
               names_to = "variable2", 
               values_to = "correlation")

print(tidy_correlations)

Each correlation coefficient now occupies its own row with clearly labeled variable pairs and correlation values.
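
Because a correlation matrix is symmetric, the long table contains each pair twice plus the diagonal of 1s. If you only want each pair once, one possible approach is to keep rows where the first name sorts before the second. The sketch below assumes a made-up 3x3 matrix (toy_cor, with invented values) rather than the penguin correlations:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Made-up symmetric matrix standing in for cor_matrix
toy_cor <- matrix(
  c( 1.0,  0.2,  0.7,
     0.2,  1.0, -0.4,
     0.7, -0.4,  1.0),
  nrow = 3,
  dimnames = list(c("x", "y", "z"), c("x", "y", "z"))
)

unique_pairs <- toy_cor |>
  as.data.frame() |>
  rownames_to_column(var = "variable1") |>
  pivot_longer(cols = -variable1,
               names_to = "variable2",
               values_to = "correlation") |>
  # Keep one ordering of each pair and drop the diagonal
  filter(variable1 < variable2)

print(unique_pairs)
```

A 3x3 matrix has 9 cells but only 3 distinct pairs, so the filtered table has 3 rows. The same filter applied to tidy_correlations would leave 6 of its 16 rows.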

Example 2: Practical Application

The Problem

You have a matrix of survey responses where rows represent respondents and columns represent questions, but you need to analyze response patterns across different demographics. The wide matrix format makes it difficult to group and summarize responses effectively.

Step 1: Create a realistic survey matrix

Let’s simulate survey data with respondents rating different aspects of penguin research.

# Create sample survey matrix
set.seed(123)
survey_matrix <- matrix(
  sample(1:5, 40, replace = TRUE),
  nrow = 8,
  ncol = 5,
  dimnames = list(paste0("Respondent_", 1:8),
                  c("Research_Quality", "Data_Collection", "Analysis", "Presentation", "Impact"))
)

This creates a matrix with 8 respondents rating 5 research aspects on a 1-5 scale.

Step 2: Add respondent information

Before tidying, we’ll add demographic information to make the analysis more meaningful.

# Convert to data frame and add demographics
survey_df <- survey_matrix |> 
  as.data.frame() |> 
  rownames_to_column(var = "respondent_id") |> 
  mutate(department = rep(c("Biology", "Statistics"), each = 4),
         experience = rep(c("Junior", "Senior"), times = 4))

Now each respondent has associated demographic information that will be preserved during the transformation.

Step 3: Transform to tidy format for analysis

Converting to long format enables grouping and statistical analysis across different dimensions.

# Convert to tidy format
tidy_survey <- survey_df |> 
  pivot_longer(cols = -c(respondent_id, department, experience),
               names_to = "research_aspect",
               values_to = "rating")

head(tidy_survey, 10)

Each rating is now a separate row (8 respondents x 5 aspects = 40 rows), and the respondent's ID, department, and experience level are repeated on every row, so ratings can be grouped by any of them.
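
Instead of listing the metadata columns to exclude, you could select the columns to pivot by reusing the matrix's own column names, and then confirm that the row count is what you expect. This is a sketch on a small made-up survey (toy_survey, the group column, and the question names are all hypothetical):

```r
library(tibble)
library(tidyr)
library(dplyr)

# Small made-up survey: 3 respondents, 2 questions
set.seed(1)
toy_survey <- matrix(
  sample(1:5, 6, replace = TRUE),
  nrow = 3,
  dimnames = list(paste0("R", 1:3), c("Q1", "Q2"))
)

toy_long <- toy_survey |>
  as.data.frame() |>
  rownames_to_column(var = "respondent_id") |>
  mutate(group = c("A", "B", "A")) |>
  # Pivot exactly the columns that came from the matrix
  pivot_longer(cols = all_of(colnames(toy_survey)),
               names_to = "question",
               values_to = "rating")

# One row per matrix cell: rows x columns
stopifnot(nrow(toy_long) == nrow(toy_survey) * ncol(toy_survey))
```

Selecting with all_of() means that adding more metadata columns later does not require updating the pivot_longer() call.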

Step 4: Analyze the tidy data

With tidy data, we can easily calculate summary statistics and create visualizations.

# Calculate average ratings by department and aspect
summary_stats <- tidy_survey |> 
  group_by(department, research_aspect) |> 
  summarise(avg_rating = mean(rating),
            .groups = "drop")

print(summary_stats)

The tidy format allows for straightforward grouping operations and statistical summaries that would be cumbersome with matrix format.
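
A summary table in this shape can be passed straight to ggplot2. The sketch below builds a grouped bar chart from a hypothetical table shaped like summary_stats (toy_summary and its numbers are invented for illustration, not results from the simulated survey):

```r
library(ggplot2)

# Hypothetical table with the same columns as summary_stats (values made up)
toy_summary <- data.frame(
  department = rep(c("Biology", "Statistics"), each = 2),
  research_aspect = rep(c("Analysis", "Impact"), times = 2),
  avg_rating = c(3.5, 2.75, 4.0, 3.25)
)

# One bar per department within each research aspect
rating_plot <- ggplot(toy_summary,
                      aes(x = research_aspect, y = avg_rating, fill = department)) +
  geom_col(position = "dodge") +
  labs(x = "Research aspect", y = "Average rating", fill = "Department")
```

Call print(rating_plot) to draw the chart. Replacing toy_summary with summary_stats should work unchanged, because the column names match.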

Summary

  • Use as.data.frame() and rownames_to_column() to preserve row information when converting matrices
  • Apply pivot_longer() to transform wide matrix format into tidy long format with one observation per row
  • Tidy format enables powerful data manipulation with dplyr functions like group_by() and summarise()
  • Long format is the shape ggplot2 expects when mapping variables to aesthetics, and it simplifies many statistical analyses
  • Always preserve important metadata (like demographics) during the transformation process to maintain analytical capability
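
The steps above can be wrapped in a small helper. The function below is a hypothetical convenience wrapper (matrix_to_long and its argument names are not part of any package), sketched under the assumption that the matrix has both row and column names:

```r
library(tibble)
library(tidyr)

# Hypothetical helper: matrix -> long table with one row per cell
matrix_to_long <- function(m, row_name = "row", col_name = "column",
                           value_name = "value") {
  # The approach relies on dimnames, so fail early if they are missing
  stopifnot(is.matrix(m), !is.null(rownames(m)), !is.null(colnames(m)))

  m |>
    as.data.frame() |>
    rownames_to_column(var = row_name) |>
    pivot_longer(cols = -all_of(row_name),
                 names_to = col_name,
                 values_to = value_name)
}

# Usage on a made-up 2x3 matrix
toy <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
toy_long <- matrix_to_long(toy)
print(toy_long)
```

The helper does not handle a matrix whose column names clash with row_name; in that case, pass a different row_name.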