Imputing outliers in R with sd+mean
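I want to find outliers that lie more than 3 SD from the mean, and I can do that with the function below. I would like to add a replacement step to the function: each outlier should be replaced with mean + 3*sd + (participant's value - mean)/mean. Should I use a for loop for this? An example of the loop I tried to write is given below. How can the function and the for loop be combined? Or is there another way to iterate over each row of the data (each participant's value) when replacing the outliers?
Finally, I would like the result of the function to be stored in a new column. I'm open to any solution, including dplyr's mutate or other functions.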
findingoutlier <- function(data, cutoff = 3, na.rm = TRUE) {
  sd <- sd(data, na.rm = na.rm)
  mean <- mean(data, na.rm = na.rm)
  # which() drops NA comparisons, so missing values are not returned as outliers
  outliers <- data[which(data < mean - cutoff * sd | data > mean + cutoff * sd)]
  return(outliers)
}
# Attempted loop: replace each outlier with mean + 3*sd + (value - mean)/mean
for (i in data) {
  x <- mean + 3*sd + (i - mean)/mean
  replace(data, outliers, x)
}
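A loop can work, as long as it runs over positions rather than values and assigns into the vector. The sketch below assumes the mean and SD are computed once from the whole column (ignoring NAs) and applies the replacement formula from the question; the name `impute_outliers_loop` and the small test vector are illustrative, not from the original post.

```r
# Hypothetical loop-based version: mean and SD are computed once from the
# full vector, then each position is checked and replaced if it is an outlier.
impute_outliers_loop <- function(data, cutoff = 3) {
  m <- mean(data, na.rm = TRUE)
  s <- sd(data, na.rm = TRUE)
  out <- data
  # Loop over positions (not values) so each replacement lands in the right row
  for (i in seq_along(data)) {
    if (!is.na(data[i]) && abs(data[i] - m) > cutoff * s) {
      out[i] <- m + cutoff * s + (data[i] - m) / m
    }
  }
  out
}

# Illustrative vector with one extreme value and one NA
x <- c(rep(c(24, 25, 26), 6), NA, 100)
x_imputed <- impute_outliers_loop(x)
```

In the attempt above, `replace()` returns a modified copy and leaves `data` unchanged, and `i` iterates over values rather than positions; this version avoids both.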
# example data
bmi <- c(32.8999, 31.7826, 28.5573, 20.6350, 21.6311, NA, 29.6174, 52.7027, 58.5968, 30.1867, 28.7927, 26.4697, 42.0294, 27.1309, 56.3672, 62.6474, 34.1692, 31.5120, 29.8553, 34.4443, 25.4049, 25.7287, 71.3209, 23.5615, 19.9359,21.7438, 51.9286, 22.1875, NA, 24.4389, 28.1571, 23.7093, 47.5551, 27.7767, 30.3237, NA, 20.7838, 34.1878, 25.1559, 25.8645, 24.9673, 27.5374, 28.5467, 25.0402, 22.1056, 28.0026, 26.7901, 21.5110,NA, 50.7599, NA, 32.6979, 26.5295, 25.5246, 23.9657, 20.1323, 28.0452)
eid <- 1:57
# data.frame() rather than cbind(), which returns a matrix; a data frame can take a new column
df <- data.frame(eid, bmi)
df
The trick is that an indexed subset can be used not only as an rvalue (something to read) but also as an lvalue (something to write to), like this:
m <- mean(data, na.rm=TRUE)
s <- sd(data, na.rm=TRUE)
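# get the *indices* of the outliers; NAs are included because a logical index
# containing NA is not allowed in this assignment (their replacement is NA anyway)
indices <- (abs(m - data) > 3*s) | is.na(data)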
# compute the replacement for *every* value: mean + 3*sd + (value - mean)/mean
replacement <- m + 3*s + (data - m) / m
# replace *only* the outliers
data[indices] <- replacement[indices]
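The same indexing idea can be wrapped in a function that returns the whole vector, so the result goes straight into a new column and no loop is needed. A minimal sketch, assuming the question's formula with `cutoff` standing in for the 3; the helper name `impute_outliers` and the small data frame are illustrative:

```r
# Hypothetical helper: values more than `cutoff` SDs from the mean are replaced
# with mean + cutoff*sd + (value - mean)/mean; NAs stay NA.
impute_outliers <- function(x, cutoff = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  # which() returns only TRUE positions, so NA comparisons are skipped
  idx <- which(abs(x - m) > cutoff * s)
  x[idx] <- m + cutoff * s + (x[idx] - m) / m
  x
}

# Illustrative data frame with one extreme value and one NA
df <- data.frame(eid = 1:20, bmi = c(rep(c(24, 25, 26), 6), NA, 100))
df$bmi_imputed <- impute_outliers(df$bmi)
```

With dplyr loaded, the same helper can be used inside `mutate()`, e.g. `df %>% mutate(bmi_imputed = impute_outliers(bmi))`; because the function works on the whole column at once, the mean and SD are computed from all participants before any value is replaced.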