Using is.na with the sapply function in R

Question

Can anyone tell me what the following line of code does?

sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

My understanding is that it drops the NAs when it applies the sum function, but keeps them in the matrix.

Any help is appreciated.

Thanks

Answer 1

Enough comments, time for an answer:

sapply(X,      # apply to each item of X (each column, if X is a data frame)
  function(x)  # this function:
    sum(is.na(x))  # count the NAs
) / nrow(airports) * 100  # then divide the result by the number of rows in the airports object
  # and multiply by 100

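In other words, it counts the number of missing values in each column of X, then divides the result by the number of rows of airports and multiplies by 100. That gives the percentage of missing values in each column, assuming X has the same number of rows as airports.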
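To make the two steps concrete, here is a minimal sketch on made-up data. The toy `airports` data frame and the assumption that `X` is the same object are mine, not taken from the question.

```r
# Hypothetical toy data standing in for the asker's `airports` data frame
airports <- data.frame(
  name = c("A", "B", NA, "D"),
  lat  = c(1.5, NA, NA, 4.0),
  lon  = c(10, 20, 30, 40)
)
X <- airports  # assumption: X is the same data frame

# Step 1: count the NAs in each column
na_counts <- sapply(X, function(x) sum(is.na(x)))
print(na_counts)   # name 1, lat 2, lon 0

# Step 2: turn the counts into a percentage of rows
na_percent <- na_counts / nrow(airports) * 100
print(na_percent)  # name 25, lat 50, lon 0
```

The result is a named vector with one entry per column, and the data frame itself is left untouched.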

Mixing and matching the columns of X with nrow(airports) is strange. I would expect them to be the same object, i.e. sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).
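A small sketch of why the mismatch matters, assuming (hypothetically) that `X` is a 4-row subset of an 8-row `airports`:

```r
# Hypothetical data: X is a 4-row subset of an 8-row `airports` data frame
airports <- data.frame(lat = c(1, NA, 3, NA, 5, 6, 7, 8))
X <- airports[1:4, , drop = FALSE]

# Mixed objects: the 2 NAs in X are divided by 8 rows, not 4
mixed <- sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

# Consistent object: 2 NAs out of 4 rows
consistent <- sapply(X, function(x) sum(is.na(x))) / nrow(X) * 100

print(mixed)       # lat 25
print(consistent)  # lat 50
```

If the row counts differ, the mixed version silently reports the wrong percentage, which is why using one object throughout is safer.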

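As I mentioned in the comments, nothing is being "dropped". If you wanted to take a sum that ignores NA values, you would use sum(foo, na.rm = TRUE). Here, instead, what is being summed is is.na(x), which means we are adding up whether each value is missing: counting the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.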
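The difference between ignoring NAs and counting them is easy to see on a short vector. `foo` here is just a made-up example:

```r
foo <- c(3, NA, 5, NA, 2)  # hypothetical vector

sum(foo)                 # NA: a single NA makes the total NA
sum(foo, na.rm = TRUE)   # 10: NAs are ignored when adding the values
is.na(foo)               # FALSE TRUE FALSE TRUE FALSE
sum(is.na(foo))          # 2: TRUE counts as 1, so this counts the NAs
length(foo)              # 5: foo itself is unchanged, nothing was dropped
```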

In this case, where the goal is a percentage rather than a count, we can simplify by using mean() in place of sum() / n:

# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100

We can also use is.na() on the entire data, so that we don't need the "anonymous function":

# rearrange for more simplicity: is.na() on a data frame returns a logical matrix,
# so take the column means directly
colMeans(is.na(airports)) * 100
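As a quick sanity check, the sketch below compares the sum-based, mean-based, and whole-matrix forms on made-up data (the toy `airports` is an assumption). Note that sapply() over a matrix visits each cell rather than each column, so the whole-matrix form here uses colMeans().

```r
# Hypothetical toy data standing in for `airports`
airports <- data.frame(
  name = c("A", "B", NA, "D"),
  lat  = c(1.5, NA, NA, 4.0)
)

via_sum    <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100
via_mean   <- sapply(airports, function(x) mean(is.na(x))) * 100
via_matrix <- colMeans(is.na(airports)) * 100

# is.na() on a data frame returns a logical matrix, one column per variable
print(dim(is.na(airports)))

# All three approaches agree
print(all.equal(via_sum, via_mean))
print(all.equal(via_mean, via_matrix))
```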
