Sum NA values in r

Question

3.3k views Asked by A.J At 15 June 2015 at 15:36

I am using a data frame that has multiple NA values, so I was thinking about sorting the attributes by their NA counts. I was trying to use a for loop, and this is what I have so far:

> data <- read.csv("C:/Users/Nikita/Desktop/first1k.csv")
> for (i in 1:length(data) ) {
+ temp <- c(sum(is.na(data[i])))}
> temp
[1] 0

This is the first time I am using a for loop in R, so I am sure it is just a silly syntax problem, but I can't work out what exactly is wrong.

Ultimately, I need a list that shows the name of the attribute and its NA count. This way I could sort the list and get the desired information. Here is some mock data to make it easier.

data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

Edit: Thank you all for the help. All three solutions worked for me. I do wonder, though, whether there is a one-line way to sort those attributes before I export them to a file. Like I mentioned before, I am quite new to R, so I am not sure if it is possible.

Edit 2: When I run the sort, it gives me the following error:

temp <- sort(temp)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic

Any idea why?

There are 3 answers

jeremycg On 15 June 2015 at 15:42

Here is a quick answer using is.na and colSums:

colSums(is.na(data))

returning:

 A B C D E 
 0 1 2 2 4

for your above data.

Thanks to @akrun for pointing out that my original call to apply was superfluous.
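To make the one-liner above less opaque, here is a small sketch of the two steps it combines, run on the mock data from the question. The variable names na_flags and na_counts are only illustrative.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Step 1: is.na() on a data frame returns a logical matrix,
# TRUE wherever a value is missing
na_flags <- is.na(data)

# Step 2: colSums() treats TRUE as 1 and FALSE as 0,
# so each column sum is that column's NA count
na_counts <- colSums(na_flags)
na_counts
```

The result is a named numeric vector, so it can be passed straight to sort() if an ordering is needed.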

blakeoft On 15 June 2015 at 15:53

This answer shows how to make the for loop work.

temp <- integer(ncol(data))  # preallocate one count per column

for (i in seq_along(data)) {
   temp[i] <- sum(is.na(data[, i]))  # NA count of column i
}

names(temp) <- colnames(data)

temp
# A B C D E 
# 0 1 2 2 4

**Eli Korvigo** · Accepted Answer · 2015-06-15T15:54:24+00:00

The idiomatic way to write iterative code in R is usually to avoid explicit for loops and use apply and its relatives (sapply, lapply and so on) instead. @jeremycg gave you the R-ish answer. As for your own code, it needs a couple of edits to work:

temp <- c()
for (i in 1:length(data)){
    temp[names(data)[i]] <- sum(is.na(data[i]))
}

Your loop overwrote temp on every iteration, and it never stored the names of your variables in temp. Hence the output you saw is just the number of NAs in the last column of your dataset.
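For comparison, a loop-free version in the apply family might look like the sketch below. It uses sapply on the mock data from the question; the name na_counts is only illustrative.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# sapply() visits each column of the data frame in turn and
# simplifies the per-column results into a named vector
na_counts <- sapply(data, function(column) sum(is.na(column)))
na_counts
```

Because sapply keeps the column names, there is no need to attach the labels by hand as in the loop.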

Regarding OP's edit

temp <- sort(temp) # pass decreasing = TRUE in case
                   # you want reversed order
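About the "'x' must be atomic" error from Edit 2: sort() expects an atomic vector, so the error usually means temp is a list, for example because it was built with lapply. That is an assumption, since the question does not show how temp was created at that point. If that is the case, a sketch along these lines should work. The output file name and the column names attribute and na_count are hypothetical.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Assumed cause of the error: lapply() returns a list, which sort() rejects
temp <- lapply(data, function(column) sum(is.na(column)))

# unlist() flattens the list into a named atomic vector that sort() accepts
sorted <- sort(unlist(temp), decreasing = TRUE)

# Export the sorted counts; the file name here is hypothetical
out_file <- file.path(tempdir(), "na_counts.csv")
write.csv(data.frame(attribute = names(sorted), na_count = unname(sorted)),
          out_file, row.names = FALSE)
```

If temp is already a plain vector, as with colSums, the sort and the export fit in a single statement by passing sort(colSums(is.na(data)), decreasing = TRUE) directly to write.csv.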
