How to remove columns with all NAs
In this tutorial, we will learn how to drop columns whose values are all NAs. We will use two approaches. First, we will use a tidyverse approach, where we perform a column-wise operation to check whether all values in a column are NAs and select the columns that are not all NAs. Next, we will use a base R approach, where we count the number of NAs per column using the apply() function and select the columns that are not all NAs.
Remove columns with all NAs
First, let us load the tidyverse meta-package.
library(tidyverse)
Create a dataframe with a column of all NAs
To create a dataframe with missing values, we use a vector with more missing values than non-missing values. The resulting dataframe, df, has three columns, and the second column, C2, is all NAs.
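The code that generates df from that vector is not reproduced here. If you want to follow along, here is a minimal stand-in, an assumption rather than the original code, that writes out the same values as the printed output directly with base R's data.frame(). tibble::tibble() accepts the same arguments if you prefer a tibble like the one printed.

```r
# Hypothetical stand-in for df: the same values as the printed output,
# written out directly instead of being generated from a vector of NAs
df <- data.frame(
  C1 = c(4, 3, NA, NA, NA),
  C2 = NA,                   # a length-1 NA is recycled, giving an all-NA column
  C3 = c(NA, NA, NA, 2, NA)
)
df
```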
df
# A tibble: 5 × 3
     C1 C2       C3
1     4 NA       NA
2     3 NA       NA
3    NA NA       NA
4    NA NA        2
5    NA NA       NA
Removing columns with all NAs using tidyverse
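Using the tidyverse approach, we remove one or more columns with all NAs with the select() function. Here, instead of selecting columns by name, we use the where() helper to select only the columns that are not all NAs. The anonymous function passed to where() returns TRUE when a column has at least one non-missing value, so any column that is all NAs is dropped.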
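To see what the anonymous function does on its own, here is a small base R sketch. The name not_all_na is hypothetical, and df is a stand-in with the same values as the dataframe printed earlier. Applying the predicate to every column with vapply() is conceptually what where() does before select() keeps the columns that return TRUE.

```r
# Hypothetical name for the anonymous function passed to where():
# TRUE when a column has at least one non-missing value
not_all_na <- function(x) any(!is.na(x))

not_all_na(c(4, 3, NA, NA, NA))    # TRUE:  such a column is kept
not_all_na(c(NA, NA, NA, NA, NA))  # FALSE: such a column is dropped

# Stand-in data with the same values as df
df <- data.frame(C1 = c(4, 3, NA, NA, NA), C2 = NA, C3 = c(NA, NA, NA, 2, NA))

# Evaluate the predicate on every column
keep <- vapply(df, not_all_na, logical(1))
keep        # C1 TRUE, C2 FALSE, C3 TRUE
df[keep]    # only C1 and C3 remain
```

Note that any(!is.na(x)) is the same test as !all(is.na(x)): a column fails it only when every value is NA.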
df %>%
  select(where(function(x) any(!is.na(x))))
# A tibble: 5 × 2
     C1    C3
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA
In the above example, we have only one column with all NAs. Here is a second example, where we remove multiple columns with all NAs.
set.seed(2202)
df2
# A tibble: 5 × 5
     C1 C2       C3 C4       C5
1     3 NA       NA NA       NA
2    NA NA       NA NA        3
3    NA NA       NA NA       NA
4    NA NA       NA NA        1
5    NA NA        1 NA       NA
Our dataframe, df2, has two columns with all NAs: C2 and C4.
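To check this, or to follow along without the generating code (which is not shown), here is a hypothetical stand-in for df2 with the same values as the printed output, plus a quick base R check of which columns are entirely NA. These are exactly the columns for which the where() predicate returns FALSE.

```r
# Hypothetical stand-in for df2 with the same values as the printed output
df2 <- data.frame(
  C1 = c(3, NA, NA, NA, NA),
  C2 = NA,
  C3 = c(NA, NA, NA, NA, 1),
  C4 = NA,
  C5 = c(NA, 3, NA, 1, NA)
)

# TRUE for each column in which every value is NA
all_na <- vapply(df2, function(x) all(is.na(x)), logical(1))
names(df2)[all_na]   # "C2" "C4"
```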
df2 %>%
  select(where(function(x) any(!is.na(x))))
# A tibble: 5 × 3
     C1    C3    C5
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA
Removing columns with all NAs using base R
To remove columns with all NAs using the base R approach, we first compute the number of missing values per column with the apply() function. A column is all NAs exactly when its count equals the number of rows.
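Here is a minimal sketch of this step. Since the original generating code is not shown, df is rebuilt with a hypothetical data.frame() stand-in holding the values printed earlier, and df_clean is an illustrative name. MARGIN = 2 tells apply() to work column by column.

```r
# Hypothetical stand-in for df with the same values as the printed tibble
df <- data.frame(
  C1 = c(4, 3, NA, NA, NA),
  C2 = NA,
  C3 = c(NA, NA, NA, 2, NA)
)

# Number of NAs in each column; MARGIN = 2 applies the function column-wise
n_NAs <- apply(df, 2, function(x) sum(is.na(x)))
n_NAs     # C1 = 3, C2 = 5, C3 = 4

# Keep only the columns with fewer NAs than rows, i.e. not all NAs
df_clean <- df[, n_NAs < nrow(df), drop = FALSE]
df_clean  # columns C1 and C3
```

Using drop = FALSE keeps the result a data frame even if only one column survives.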
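We store these counts in n_NAs. Keeping only the columns whose count is smaller than the number of rows gives the same result as the tidyverse approach:
     C1    C3
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA
As before, we now look at an example of using the base R approach to remove multiple columns with all NAs. This time we use df2, the dataframe with two columns of all NAs, and remove both of them with the base R approach.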
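The same base R steps can be wrapped in a small reusable function and applied to df2. The name remove_all_na_cols() is hypothetical, and df2 is again a stand-in holding the values printed for df2 earlier, not the original generating code.

```r
# Hypothetical helper wrapping the base R approach
remove_all_na_cols <- function(data) {
  # Number of NAs in each column (MARGIN = 2 works column-wise)
  n_NAs <- apply(data, 2, function(x) sum(is.na(x)))
  # Keep the columns that have fewer NAs than rows, i.e. at least one value
  data[, n_NAs < nrow(data), drop = FALSE]
}

# Hypothetical stand-in for df2 with the same values as the printed output
df2 <- data.frame(
  C1 = c(3, NA, NA, NA, NA),
  C2 = NA,
  C3 = c(NA, NA, NA, NA, 1),
  C4 = NA,
  C5 = c(NA, 3, NA, 1, NA)
)

df2_clean <- remove_all_na_cols(df2)
names(df2_clean)   # "C1" "C3" "C5"
```

Because apply() first converts the data frame to a matrix, all columns are coerced to a common type; the NA counts should be unaffected. colSums(is.na(data)) is a common base R alternative that computes the same per-column counts without going through apply().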
     C1    C3    C5
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA