How to remove columns with all NAs

apply()
dplyr select()
Published

October 14, 2022

In this tutorial, we will learn how to drop columns whose values are all NAs, using two approaches. First, we will use the tidyverse approach, where we perform a column-wise operation to check whether all values in a column are NAs and select the columns that are not all NAs. Next, we will use the base R approach, counting the number of NAs per column with the apply() function and selecting the columns that are not all NAs.

First, let us load the tidyverse meta-package.

library(tidyverse)

Create a dataframe with a column of all NAs

To create a dataframe with missing values, we use a vector that contains more missing values than non-missing values.

     C1    C2    C3
1     4    NA    NA
2     3    NA    NA
3    NA    NA    NA
4    NA    NA     2
5    NA    NA    NA
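The code that generated this data did not survive in this copy of the post, so the following is only a minimal sketch that hard-codes the values printed above instead of drawing them at random. The column names C1, C2 and C3 are taken from the outputs later in the post; the integer column type is an assumption.

```r
library(tibble)

# Hard-coded copy of the example data (values taken from the printout above).
# NA_integer_ keeps the all-NA column integer-typed; the original types are an assumption.
df <- tibble(
  C1 = c(4L, 3L, NA, NA, NA),
  C2 = rep(NA_integer_, 5),   # the column made up entirely of NAs
  C3 = c(NA, NA, NA, 2L, NA)
)
df
```

With df defined this way, the examples that follow can be run as written.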

Removing columns with all NAs with tidyverse

Using the tidyverse approach, we remove one or more columns with all NAs using the select() function. Here, instead of selecting columns by name, we select the columns that are not all NAs. We use an anonymous function inside where() to check whether a column contains at least one non-missing value.

df %>%
  select(where(function(x) any(!is.na(x))))

# A tibble: 5 × 2
     C1    C3
   
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA
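To see why C2 is dropped, it can help to evaluate the predicate on each column directly. The sketch below rebuilds the example data by hand (an assumption, since the original data-generating code is not shown) and uses base R's vapply() to apply the same test that where() receives. The names has_data and df_clean are illustrative only.

```r
library(dplyr)

# Hand-built copy of the example data (assumed values from the printout).
df <- tibble(
  C1 = c(4L, 3L, NA, NA, NA),
  C2 = rep(NA_integer_, 5),
  C3 = c(NA, NA, NA, 2L, NA)
)

# TRUE when a column holds at least one non-missing value.
has_data <- function(x) any(!is.na(x))

# Evaluate the predicate column by column: C2 is the only FALSE.
vapply(df, has_data, logical(1))

# where() keeps exactly the columns for which the predicate is TRUE.
df_clean <- df %>% select(where(has_data))
names(df_clean)
```

Naming the predicate is optional; it just makes the condition easier to reuse and to inspect on its own.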

In the above example, we had one column with all NAs. Here is a second example, where we remove multiple columns with all NAs.

set.seed(2202)
     C1    C2    C3    C4    C5
1     3    NA    NA    NA    NA
2    NA    NA    NA    NA     3
3    NA    NA    NA    NA    NA
4    NA    NA    NA    NA     1
5    NA    NA     1    NA    NA

Our dataframe has two columns with all NAs.

df2 %>%
  select(where(function(x) any(!is.na(x))))

# A tibble: 5 × 3
     C1    C3    C5
    
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA
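The same filter can also be written with a formula-style lambda and `!all(is.na(.x))`, which is logically equivalent to `any(!is.na(x))`. This is a minimal sketch that assumes df2 is rebuilt by hand from the values printed above, not from the original random draw; df2_clean is an illustrative name.

```r
library(dplyr)

# Hand-built copy of the second example (assumed values from the printout).
df2 <- tibble(
  C1 = c(3L, NA, NA, NA, NA),
  C2 = rep(NA_integer_, 5),
  C3 = c(NA, NA, NA, NA, 1L),
  C4 = rep(NA_integer_, 5),
  C5 = c(NA, 3L, NA, 1L, NA)
)

# Keep a column unless every one of its values is NA.
df2_clean <- df2 %>% select(where(~ !all(is.na(.x))))
names(df2_clean)
```

Which form to use is a matter of taste: the anonymous function states the positive condition, while the lambda reads closer to "drop if all NA".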

Removing columns with all NAs using base R

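To remove columns with all NAs using the base R approach, we first compute the number of missing values per column using the apply() function. We then keep only the columns whose count of missing values is smaller than the number of rows.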

     C1    C3
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA
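The code for this step was truncated in this copy of the post, so what follows is a sketch of the approach the text describes and not necessarily the author's exact code. It counts NAs per column with apply() and then keeps the columns whose count is smaller than the number of rows. The data frame is rebuilt by hand as a plain data.frame (an assumption), and df_clean is an illustrative name.

```r
# Hand-built copy of the example data (assumed values), as a base R data frame.
df <- data.frame(
  C1 = c(4L, 3L, NA, NA, NA),
  C2 = rep(NA_integer_, 5),
  C3 = c(NA, NA, NA, 2L, NA)
)

# MARGIN = 2 applies the function to each column,
# giving a named vector with the number of NAs per column.
n_NAs <- apply(df, 2, function(x) sum(is.na(x)))
n_NAs

# A column is all NAs when its NA count equals the number of rows.
# drop = FALSE keeps the result a data frame even if only one column remains.
df_clean <- df[, n_NAs < nrow(df), drop = FALSE]
df_clean
```

Comparing against nrow(df) instead of a hard-coded 5 lets the same two lines work for a data frame of any length.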

As before, we now look at an example of using the base R approach to remove multiple columns with all NAs. We use the dataframe with two all-NA columns and remove both of them.

     C1    C3    C5
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA
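As an aside, base R's colSums() can produce the same per-column NA counts without apply(). This is a swapped-in alternative, not the method used in the post, and the sketch assumes df2 is rebuilt by hand as a plain data.frame; df2_clean is an illustrative name.

```r
# Hand-built copy of the second example (assumed values), as a base R data frame.
df2 <- data.frame(
  C1 = c(3L, NA, NA, NA, NA),
  C2 = rep(NA_integer_, 5),
  C3 = c(NA, NA, NA, NA, 1L),
  C4 = rep(NA_integer_, 5),
  C5 = c(NA, 3L, NA, 1L, NA)
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE values in each column.
n_NAs <- colSums(is.na(df2))
n_NAs

# Keep the columns that have at least one non-missing value.
df2_clean <- df2[, n_NAs < nrow(df2), drop = FALSE]
names(df2_clean)
```

Both C2 and C4 have an NA count equal to the number of rows, so both are removed in a single subsetting step.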