Ignoring NA in R across multiple columns of a data frame using na.omit or na.rm and mapply
I have a data frame that looks like this:
SampleNo Lab1 Lab2 Lab3 lab4 lab5 lab6 lab7 lab8 lab9 lab10
1 59.84 60.59 60.39 60.29 60.19 60.32 60.24 60.3 60.43 NA
2 59.78 60.19 60.16 60.23 60.32 60.46 60.53 60.2 60.40 59.6
3 59.86 60.17 60.22 60.28 60.18 60.42 60.21 60.0 60.44 NA
4 59.85 60.42 60.28 60.31 60.19 60.41 60.54 60.2 60.48 59.7
5 59.97 60.79 60.30 60.26 60.40 60.47 60.52 60.0 60.46 59.7
6 60.03 60.26 60.36 60.21 60.32 60.46 60.50 60.1 60.29 60.0
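For each column in the data frame, I want to compute the sum of squared deviations from the column mean, ignoring NA values, and assign the results to a new vector. I have code that works for one column, but I would like to use mapply or a similar function to get the values for all columns at once and assign them to a new vector.
I have the following code for a single column:
myvector <- sum(na.omit(df[,2] - mean(df[,2]))^2)
This works for one column.
I have tried the following for the whole data frame: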
myvector <- (mapply(na.omit(sum(df[,2:11] - mean(df[,2:11]))^2)))
I get the error "Error in match.fun(FUN): c("na.omit(sum(df[2:11] - mean(df[ is not a function, character or symbol", 2:11]))^2 is not a function, character or symbol"
and
myvector <- (mapply(sum(na.omit(df[,2:11] - mean(df[,2:11]))^2)))
but I get this error:
Error in sum(na.omit, df[, 2:11] - mean(df[, :
invalid 'type' (closure) of argument
In addition: Warning message:
In mean.default(df[, 2:11]) :
argument is not numeric or logical: returning NA
My guess is that na.omit is in the wrong place, but I don't know where it should go.
If you want to perform the operation column-wise, you can use sapply:
sapply(df[-1], function(x) sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE))
Or use colSums and colMeans with sweep:
colSums(sweep(df[-1], 2, colMeans(df[-1], na.rm = TRUE)) ^ 2, na.rm = TRUE)
# Lab1 Lab2 Lab3 lab4 lab5 lab6 lab7 lab8 lab9 lab10
# 0.04 0.31 0.04 0.01 0.04 0.02 0.12 0.07 0.02 0.09
Note that you can use na.rm = TRUE to ignore NA values.
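On why the attempts in the question fail: mapply expects a function as its first argument (FUN), followed by the objects to loop over, whereas the question passes it the result of an expression. Below is a minimal sketch of how the same per-column calculation could be written with mapply or vapply. The data frame `toy` and the helper `ss` are made up for illustration and are not part of the question.

```r
# Hypothetical toy data; column b contains a missing value.
toy <- data.frame(id = 1:4,
                  a  = c(1, 2, 3, 4),
                  b  = c(2, NA, 4, 6))

# Sum of squared deviations from the mean, ignoring NAs in both steps.
ss <- function(x) sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE)

# mapply takes the function first, then the list (here: the columns) to loop over.
res_mapply <- mapply(ss, toy[-1])

# vapply does the same but also checks that each result is a single number.
res_vapply <- vapply(toy[-1], ss, numeric(1))

res_mapply
```

With a single argument to iterate over, mapply offers nothing beyond sapply; it becomes useful mainly when looping over several vectors in parallel.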
Data
df <- structure(list(SampleNo = 1:6, Lab1 = c(59.84, 59.78, 59.86,
59.85, 59.97, 60.03), Lab2 = c(60.59, 60.19, 60.17, 60.42, 60.79,
60.26), Lab3 = c(60.39, 60.16, 60.22, 60.28, 60.3, 60.36), lab4 = c(60.29,
60.23, 60.28, 60.31, 60.26, 60.21), lab5 = c(60.19, 60.32, 60.18,
60.19, 60.4, 60.32), lab6 = c(60.32, 60.46, 60.42, 60.41, 60.47,
60.46), lab7 = c(60.24, 60.53, 60.21, 60.54, 60.52, 60.5), lab8 = c(60.3,
60.2, 60, 60.2, 60, 60.1), lab9 = c(60.43, 60.4, 60.44, 60.48,
60.46, 60.29), lab10 = c(NA, 59.6, NA, 59.7, 59.7, 60)),
class = "data.frame", row.names = c(NA, -6L))
You can transpose the data, subtract the column means, and then compute the sums of squares.
rowSums((t(df[-1]) - colMeans(df[-1], na.rm = TRUE))^2, na.rm = TRUE)
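The transpose matters because R recycles a vector down the columns of a matrix. Here is a small sketch on a made-up matrix (the names `m` and `mu` are illustrative only) showing that subtracting the column means from the transposed matrix centers each original column, and that the result agrees with the sweep approach:

```r
# Hypothetical 3 x 2 matrix with columns a and b.
m  <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2,
             dimnames = list(NULL, c("a", "b")))
mu <- colMeans(m)   # one mean per original column

# After transposing, each row corresponds to one original column. The vector
# mu (length 2) is recycled down each column of t(m), so row "a" has mu["a"]
# subtracted and row "b" has mu["b"] subtracted.
centered <- t(m) - mu

ss_t     <- rowSums(centered^2)          # one value per original column
ss_sweep <- colSums(sweep(m, 2, mu)^2)   # same calculation via sweep
```

Subtracting `mu` from `m` directly would recycle the means down the rows of each column instead, which pairs values with the wrong mean.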
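Alternatively, you can multiply each column's sample variance by the number of non-missing values minus 1 to get the sum of squared deviations.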
sapply(df[-1], var, na.rm = TRUE) * (colSums(!is.na(df[-1])) - 1)
# Lab1 Lab2 Lab3 lab4 lab5 lab6 lab7 lab8 lab9 lab10
# 0.04 0.31 0.04 0.01 0.04 0.02 0.12 0.07 0.02 0.09
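Formula: the sample variance is s^2 = sum((x_i - mean(x))^2) / (n - 1), where n is the number of non-missing values, so the sum of squared deviations equals s^2 * (n - 1).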
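As a quick check of the variance identity, here is a minimal sketch on a made-up vector `x` with missing values (not taken from the question's data):

```r
# Hypothetical vector with two missing values.
x <- c(NA, 2, NA, 4, 6, 8)

n_obs     <- sum(!is.na(x))                                     # non-missing count
ss_direct <- sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE)   # direct sum of squares
ss_var    <- var(x, na.rm = TRUE) * (n_obs - 1)                 # via the sample variance

all.equal(ss_direct, ss_var)
```

One difference to keep in mind: for a column with only one non-missing value, var() returns NA, whereas the direct sum gives 0.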