Function to count how many observations in a data frame exceed a particular value in R

Question

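I have a data frame in R with numeric columns. I want to see how many values in each column of the data frame exceed a certain threshold (for example, standardized values beyond ±2.5). This is the output I would like to display.

Assuming all columns in my data frame are numeric, what function or combination of functions could I use to produce a similar result?

Thanks in advance :)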

Answer 1

This is easy to do with lapply:

# Generate sample data (10 columns x 100 rows) normally distributed around 0
my.df <- as.data.frame(matrix(rnorm(n=1000), ncol=10))

# Get the row indices of values beyond +/-2.5, for each column in the df
lapply(my.df, function(x) which(abs(x) > 2.5))

# $V1
# integer(0)
# 
# $V2
# [1] 29 69
# 
# $V3
# [1] 85
# 
# $V4
# [1] 100
# 
# $V5
# [1] 11 40
# 
# $V6
# [1] 89
# 
# $V7
# [1] 67
# 
# $V8
# [1] 49 68
# 
# $V9
# integer(0)
# 
# $V10
# [1]  7 27

To get a format closer to the one you gave in your question, ExperimenteR kindly suggested:

library(data.table)
setDT(my.df)[, list(lapply(.SD, function(x) which(abs(x) > 2.5))), ]


 #        V1
 #  1:      
 #  2: 29,69
 #  3:    85
 #  4:   100
 #  5: 11,40
 #  6:    89
 #  7:    67
 #  8: 49,68
 #  9:      
 # 10:  7,27

To get the total count for each column in the df, use:

lapply(my.df, function(x) sum(abs(x) > 2.5))

# $V1
# [1] 0
# 
# $V2
# [1] 2
# 
# $V3
# [1] 1
# 
# $V4
# [1] 1
# 
# $V5
# [1] 2
# 
# $V6
# [1] 1
# 
# $V7
# [1] 1
# 
# $V8
# [1] 2
# 
# $V9
# [1] 0
# 
# $V10
# [1] 2
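
If a single named vector is more convenient than a list, the same counts can probably be obtained more compactly. The sketch below is not part of the original answer; it assumes every column is numeric, and the seed value is an arbitrary choice made only so the example is reproducible.

```r
# Reproducible sample data; the seed is an arbitrary choice for illustration
set.seed(42)
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))

# abs(my.df) > 2.5 gives a logical table; colSums counts the TRUEs per column
counts <- colSums(abs(my.df) > 2.5)

# Equivalent with vapply, which checks that each column yields one integer
counts2 <- vapply(my.df, function(x) sum(abs(x) > 2.5), integer(1))
```

Both results are named by column (V1 to V10), so they can be indexed by name, e.g. counts["V3"].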

Answer 2

You can also do it this way:

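library(reshape2)
library(plyr)

# Using data from @Dominic Comtois
my.df <- as.data.frame(matrix(rnorm(n=1000), ncol=10))

# Reshape to long format: one row per (variable, value) pair
data <- melt(my.df)
# Count, per original column, the values whose absolute value exceeds 2.5
data2 <- ddply(data, .(variable), summarise, n_exceeding = sum(abs(value) > 2.5))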
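
For repeated use, the counting step could be wrapped in a small function. The following is only a sketch in base R: count_beyond is a hypothetical helper name (not from any package), and the standardize option is an assumption based on the question's mention of standardized values. Missing values are ignored when counting.

```r
# count_beyond() is a hypothetical helper, not part of any package
count_beyond <- function(df, threshold = 2.5, standardize = FALSE) {
  stopifnot(is.data.frame(df), all(vapply(df, is.numeric, logical(1))))
  m <- as.matrix(df)
  # Optionally convert each column to z-scores (mean 0, sd 1) first
  if (standardize) m <- scale(m)
  # Count values whose absolute value exceeds the threshold, ignoring NAs
  n <- colSums(abs(m) > threshold, na.rm = TRUE)
  data.frame(variable = names(df), n_exceeding = as.integer(n),
             stringsAsFactors = FALSE)
}

# Small deterministic example: column a has two values beyond 2.5, b has none
toy <- data.frame(a = c(-3, 0, 1, 2.6), b = c(0.1, 0.2, NA, 0.3))
res <- count_beyond(toy)
```

Calling count_beyond(my.df) on the sample data above would return one row per column, similar in shape to data2.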


Tags: r, summarization