Filter R Dataframe with at least N non-NAs

In this tutorial, we will learn how to filter the rows of a dataframe, keeping only those that have at least N non-NA column values.

To filter rows of a dataframe that have at least N non-NAs, use dataframe subsetting as shown below:

resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]

where

  • mydataframe is the dataframe containing rows with one or more NAs
  • N is the minimum number of non-NA values a row must have to be kept
  • resultDF is the resulting dataframe, containing only the rows with at least N non-NA values
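
The condition above counts NAs per row and compares that count with ncol(mydataframe) - N. Counting non-NAs directly gives an equivalent test, since a row has at least N non-NAs exactly when rowSums(!is.na(...)) is N or more. The sketch below wraps that form in a helper; the function name filter_min_non_na and the data frame demo are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper (not part of base R): keep the rows of `df`
# that have at least `n` non-NA values.
filter_min_non_na <- function(df, n) {
  non_na_counts <- rowSums(!is.na(df))    # non-NA values in each row
  df[non_na_counts >= n, , drop = FALSE]  # keep rows meeting the threshold
}

# Small illustrative data frame (assumed data, not from the tutorial).
demo <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# Rows 1 and 3 have two or more non-NA values; row 2 has only one.
kept <- filter_min_non_na(demo, 2)
print(kept)
```

Using drop = FALSE keeps the result a dataframe even when the input has a single column.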

Example 1 – Filter R Dataframe with minimum N non-NAs

In this example, we will create a dataframe whose rows contain different numbers of NAs.

> mydataframe = data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21), z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))
> mydataframe
   x  y  z  p
1  9  4  9 NA
2 NA NA  8 63
3  7 NA NA NA
4  4 21 74  2
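
Before filtering, it can help to see how many non-NA values each row holds, since that count is what gets compared with N. A minimal sketch, where non_na_counts is a name introduced here for illustration:

```r
# Same data as in the tutorial.
mydataframe <- data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21),
                          z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))

# !is.na() marks non-NA cells as TRUE; rowSums() counts the TRUEs per row.
non_na_counts <- rowSums(!is.na(mydataframe))

# Rows 1 to 4 hold 3, 2, 1 and 4 non-NA values respectively.
print(non_na_counts)
```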

Now, we will filter this dataframe so that the output contains only the rows with at least 2 non-NAs.

> N = 2
> resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]
> resultDF
   x  y  z  p
1  9  4  9 NA
2 NA NA  8 63
4  4 21 74  2
>

Let us try with N = 3.

> N = 3
> resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]
> resultDF
  x  y  z  p
1 9  4  9 NA
4 4 21 74  2
>

Conclusion

In this R Tutorial, we have learned to filter a dataframe based on the number of non-NAs (or, equivalently, the number of NAs) in a row.
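
The NA-based variant mentioned above follows the same pattern: compare the per-row NA count with a limit. A minimal sketch, assuming a hypothetical threshold K for the maximum number of NAs allowed per row:

```r
# Same data as in the tutorial.
mydataframe <- data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21),
                          z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))

K <- 1  # hypothetical threshold: at most one NA per row

# Keep rows whose NA count does not exceed K (rows 1 and 4 here).
resultDF <- mydataframe[rowSums(is.na(mydataframe)) <= K, ]
print(resultDF)
```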