How to convert a matrix to a tidy table
Introduction
Converting matrices to tidy tables is a common first step for data analysis in R, because tidyverse functions operate on data frames and tools such as ggplot2 generally expect long format, with one observation per row, rather than wide matrix format. This transformation lets you use ggplot2 and dplyr for visualization and manipulation.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
You have a correlation matrix showing relationships between variables, but you need it in tidy format to create visualizations or perform further analysis. Matrix format stores data in a grid, but tidy format requires each observation as a row.
Step 1: Create a sample matrix
We’ll start with a simple correlation matrix from the penguins dataset.
# Create correlation matrix
penguin_numeric <- penguins |>
  select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |>
  na.omit()
cor_matrix <- cor(penguin_numeric)
print(cor_matrix)
This creates a 4x4 correlation matrix showing relationships between penguin physical measurements.
Step 2: Convert matrix to data frame
We need to preserve row names as a column before converting to tidy format.
# Convert to data frame with row names as column
cor_df <- cor_matrix |>
  as.data.frame() |>
  rownames_to_column(var = "variable1")
head(cor_df)
Now we have a data frame where row names become the first column, making the data easier to manipulate.
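As a possible shortcut for the step above, tibble's as_tibble() accepts a rownames argument that moves the row names into a column during the conversion. The sketch below uses a small hypothetical matrix (m, with made-up row and column names) rather than the penguin data, so it runs without palmerpenguins.

```r
library(tibble)

# Small hypothetical matrix with named rows and columns (illustration only)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("row_a", "row_b"), c("x", "y", "z")))

# Without `rownames`, the row names are dropped
dropped <- as_tibble(m)

# With `rownames`, they become a regular column in one step
kept <- as_tibble(m, rownames = "variable1")

names(dropped)
names(kept)
```

Either route should lead to the same starting point for pivot_longer(); the two-step version in the text just makes the row-name handling more explicit.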
Step 3: Transform to tidy format
The pivot_longer() function converts wide format to long format, creating one row per matrix cell: 16 rows for this 4x4 matrix, including the diagonal and both orderings of each variable pair.
# Convert to tidy format
tidy_correlations <- cor_df |>
  pivot_longer(cols = -variable1,
               names_to = "variable2",
               values_to = "correlation")
print(tidy_correlations)
Each correlation coefficient now occupies its own row with clearly labeled variable pairs and correlation values.
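Because a correlation matrix is symmetric, the long table holds every pair twice, plus the diagonal. If only the unique pairs are wanted, one option is to keep the cells above the diagonal. The following is a minimal base R sketch of that idea. It assumes the built-in mtcars data as a stand-in for the penguin measurements, and the names long_all and unique_pairs are illustrative.

```r
# Base R only; mtcars stands in for the penguin measurements (assumption)
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# as.table() followed by as.data.frame() yields one row per matrix cell,
# in column-major order
long_all <- as.data.frame(as.table(cor_matrix), stringsAsFactors = FALSE)
names(long_all) <- c("variable1", "variable2", "correlation")

# upper.tri() flags cells above the diagonal, also in column-major order,
# so it lines up with the rows of long_all
keep <- as.vector(upper.tri(cor_matrix))
unique_pairs <- long_all[keep, ]
rownames(unique_pairs) <- NULL

nrow(long_all)      # 16 cells in a 4x4 matrix
nrow(unique_pairs)  # 6 unique pairs
```

The same filtering could be applied to the tidyverse result instead; the base R route is shown here only because it needs no extra packages.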
Example 2: Practical Application
The Problem
You have a matrix of survey responses where rows represent respondents and columns represent questions, but you need to analyze response patterns across different demographics. The wide matrix format makes it difficult to group and summarize responses effectively.
Step 1: Create a realistic survey matrix
Let’s simulate survey data with respondents rating different aspects of penguin research.
# Create sample survey matrix
set.seed(123)
survey_matrix <- matrix(
  sample(1:5, 40, replace = TRUE),
  nrow = 8,
  ncol = 5,
  dimnames = list(paste0("Respondent_", 1:8),
                  c("Research_Quality", "Data_Collection", "Analysis", "Presentation", "Impact"))
)
This creates a matrix with 8 respondents rating 5 research aspects on a 1-5 scale.
Step 2: Add respondent information
Before tidying, we’ll add demographic information to make the analysis more meaningful.
# Convert to data frame and add demographics
survey_df <- survey_matrix |>
  as.data.frame() |>
  rownames_to_column(var = "respondent_id") |>
  mutate(department = rep(c("Biology", "Statistics"), each = 4),
         experience = rep(c("Junior", "Senior"), times = 4))
Now each respondent has associated demographic information that will be preserved during the transformation.
Step 3: Transform to tidy format for analysis
Converting to long format enables grouping and statistical analysis across different dimensions.
# Convert to tidy format
tidy_survey <- survey_df |>
  pivot_longer(cols = c(-respondent_id, -department, -experience),
               names_to = "research_aspect",
               values_to = "rating")
head(tidy_survey, 10)
Each rating is now a separate observation with complete respondent information, enabling complex analyses.
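A quick way to gain confidence in a reshape like this is to check that the long table has one row per matrix cell and that pivoting back recovers the original values. The sketch below illustrates that check on a small hypothetical ratings matrix (the values and the names ratings, tidy_ratings and wide_again are invented for the example), not on the simulated survey above.

```r
library(tidyr)
library(tibble)

# Hypothetical 3 x 2 ratings matrix (illustration only)
ratings <- matrix(c(4, 2, 5, 3, 1, 4), nrow = 3,
                  dimnames = list(paste0("Respondent_", 1:3),
                                  c("Analysis", "Impact")))

tidy_ratings <- ratings |>
  as.data.frame() |>
  rownames_to_column(var = "respondent_id") |>
  pivot_longer(cols = -respondent_id,
               names_to = "research_aspect",
               values_to = "rating")

# One row per matrix cell
stopifnot(nrow(tidy_ratings) == nrow(ratings) * ncol(ratings))

# Pivoting back to wide and rebuilding the matrix should recover the original
wide_again <- tidy_ratings |>
  pivot_wider(names_from = research_aspect, values_from = rating) |>
  column_to_rownames(var = "respondent_id") |>
  as.matrix()

stopifnot(isTRUE(all.equal(wide_again, ratings)))
```

If either stopifnot() call fails, the reshape dropped or duplicated cells, which is usually a sign that the cols selection in pivot_longer() included or excluded the wrong columns.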
Step 4: Analyze the tidy data
With tidy data, we can easily calculate summary statistics and create visualizations.
# Calculate average ratings by department and aspect
summary_stats <- tidy_survey |>
  group_by(department, research_aspect) |>
  summarise(avg_rating = mean(rating),
            .groups = "drop")
print(summary_stats)
The tidy format allows for straightforward grouping operations and statistical summaries that would be cumbersome with matrix format.
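The summary table can be passed straight to ggplot2. The sketch below rebuilds the simulated survey in compact form so that it runs on its own, then draws one possible chart: a dodged bar chart of average rating by aspect and department. The chart type, the axis labels and the object name p are choices made for this illustration, not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Rebuild the simulated survey from the steps above
set.seed(123)
survey_matrix <- matrix(
  sample(1:5, 40, replace = TRUE), nrow = 8, ncol = 5,
  dimnames = list(paste0("Respondent_", 1:8),
                  c("Research_Quality", "Data_Collection", "Analysis",
                    "Presentation", "Impact"))
)

summary_stats <- survey_matrix |>
  as.data.frame() |>
  rownames_to_column(var = "respondent_id") |>
  mutate(department = rep(c("Biology", "Statistics"), each = 4)) |>
  pivot_longer(cols = -c(respondent_id, department),
               names_to = "research_aspect",
               values_to = "rating") |>
  group_by(department, research_aspect) |>
  summarise(avg_rating = mean(rating), .groups = "drop")

# One bar per department within each research aspect
p <- ggplot(summary_stats,
            aes(x = research_aspect, y = avg_rating, fill = department)) +
  geom_col(position = "dodge") +
  labs(x = "Research aspect", y = "Average rating", fill = "Department")

print(p)
```

Because each column of the tidy table maps directly to an aesthetic, switching to a different view (for example, faceting by department) only requires changing the aes() mapping or adding a facet layer.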
Summary
- Use as.data.frame() and rownames_to_column() to preserve row information when converting matrices
- Apply pivot_longer() to transform wide matrix format into tidy long format with one observation per row
- Tidy format enables powerful data manipulation with dplyr functions like group_by() and summarise()
- Long format is essential for creating effective visualizations with ggplot2 and performing statistical analyses
- Always preserve important metadata (like demographics) during the transformation process to maintain analytical capability