R boxplot: using apply() to save outliers from boxplot() by group
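I am looking for a data-reshaping approach or an apply() call that finds and saves the outliers reported by boxplot() while grouping the data by a group identifier.
My first attempt was to write a function that calls boxplot() internally to capture the outliers, e.g. boxplot(...)$out, then return $out (the outliers) and assign the result to the table column df.events$outliers.
The end goal is a table of outliers by group: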
e.g., OutliersByGroupTableName
group_id_name
outliers_from_boxplot
Outliers from a boxplot() run on a select() of a date/event range could then be added as a new column, giving the following table.
e.g., OutliersByGroupTableName
group_id_name
outliers_from_boxplot
time_range_outliers_from_boxplot
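With the code below, my attempt was to wrap boxplot() in a function and use apply() in R to walk over "group" and "rank", calling FUN=test_func(df.events) with the data frame. My problem is with using apply() to forward the data to boxplot(), and with returning the result alongside the other table fields (not shown in this code view).
Alternatively, is apply() even the best approach for this?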
test_func <- function(df) {
boxplot(df$rank ~ df$group, data=df, plot=FALSE, )$out
}
apply(df.events, c("group","rank"), FUN=test_func(df.events))
Data (output)
> dput(head(df.events, 50))
structure(list(rank = c(0.5, 0.5, 0.5, 0.5, 0, 1, 1, 1, 1, 0,
0, 0, 0.25, 0.25, 0, 2, 2, 2, 0, 0, 2, 2, 0, 1, 1, 0, 0, 0, 0,
0.25, 0.25, 0.6, 0.6, 0, 0, 3, 3, 0.5, 0.5, 0.5, 3, 3, 3, 1.5,
1, 1, 0, 1, 1, 0), group = c(751, 728, 753, 808, 909, 909, 920,
728, 686, 727, 1025, 727, 728, 808, 750, 752, 752, 782, 752,
686, 752, 808, 691, 920, 920, 727, 727, 782, 991, 727, 808,
686, 728, 1025, 686, 920, 986, 782, 736, 909, 686, 782, 751,
728, 782, 782, 909, 909, 686, 686), outliers = c("NA", "NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA")), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
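If we need to pass the names of the rank column and the group column dynamically, we can make them function arguments along with the dataset, then use paste0 to build the formula and call boxplot: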
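test_func <- function(df, colnm, grpcol) {
  # Build a formula such as rank ~ group from the column names;
  # plot = FALSE returns the boxplot statistics without drawing anything
  boxplot(as.formula(paste0(colnm, ' ~ ', grpcol)), data = df, plot = FALSE)
}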
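As a side note, the formula does not have to be pasted together as a string: base R's reformulate() builds the same kind of formula from column names. The following is only a sketch of that variant. The function name test_func2 and the toy data frame are made up for illustration and are not part of the original answer.

```r
# Hypothetical variant of test_func: build the formula with reformulate()
# instead of as.formula(paste0(...))
test_func2 <- function(df, colnm, grpcol) {
  boxplot(reformulate(grpcol, response = colnm), data = df, plot = FALSE)
}

# Toy data, invented for this example: group "a" contains one extreme value
toy <- data.frame(
  score = c(1, 2, 3, 4, 100, 5, 6, 7, 8, 9),
  grp   = rep(c("a", "b"), each = 5)
)

res <- test_func2(toy, "score", "grp")
res$out    # values flagged as outliers
res$group  # index of the box (group) each outlier belongs to
res$names  # group labels, in box order
```

Either way of building the formula returns the same list structure from boxplot(), so the rest of the workflow is unchanged.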
We can then call it:
out <- test_func(df.events, 'rank', 'group')
str(out)
#List of 6
# $ stats: num [1:5, 1:16] 0 0 0.6 1 1 0 0 0 0 0 ...
# $ n : num [1:16] 7 1 5 5 1 1 2 4 1 6 ...
# $ conf : num [1:2, 1:16] 0.00282 1.19718 0 0 0 ...
# $ out : num [1:2] 3 0.25
# $ group: num [1:2] 1 3
# $ names: chr [1:16] "686" "691" "727" "728" ..
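To get from this list to the outliers-by-group table described in the question, note that out$group gives, for each value in out$out, the index of the box it came from, and out$names maps that index back to the group label. The sketch below assembles the table from those pieces. It assumes the 50 rows posted in the question (the placeholder outliers column is left out) and borrows the column names from the question's desired layout. The second part, which uses split() with boxplot.stats(), is a swapped-in alternative for comparison and not something the original answer proposes. It also suggests why apply() is not a natural fit here: apply() iterates over the rows or columns of a matrix, whereas this task needs the data split by group.

```r
# First 50 rows of df.events, copied from the dput() output in the question
df.events <- data.frame(
  rank = c(0.5, 0.5, 0.5, 0.5, 0, 1, 1, 1, 1, 0,
           0, 0, 0.25, 0.25, 0, 2, 2, 2, 0, 0, 2, 2, 0, 1, 1, 0, 0, 0, 0,
           0.25, 0.25, 0.6, 0.6, 0, 0, 3, 3, 0.5, 0.5, 0.5, 3, 3, 3, 1.5,
           1, 1, 0, 1, 1, 0),
  group = c(751, 728, 753, 808, 909, 909, 920,
            728, 686, 727, 1025, 727, 728, 808, 750, 752, 752, 782, 752,
            686, 752, 808, 691, 920, 920, 727, 727, 782, 991, 727, 808,
            686, 728, 1025, 686, 920, 986, 782, 736, 909, 686, 782, 751,
            728, 782, 782, 909, 909, 686, 686)
)

test_func <- function(df, colnm, grpcol) {
  boxplot(as.formula(paste0(colnm, " ~ ", grpcol)), data = df, plot = FALSE)
}

out <- test_func(df.events, "rank", "group")

# One row per outlier: look up the group label through the box index
outliers_by_group <- data.frame(
  group_id_name         = out$names[out$group],
  outliers_from_boxplot = out$out,
  stringsAsFactors      = FALSE
)
print(outliers_by_group)

# Alternative (an assumption, not from the answer): compute outliers per group
# directly with boxplot.stats(), then keep only groups that have any
per_group <- lapply(split(df.events$rank, df.events$group),
                    function(x) boxplot.stats(x)$out)
per_group <- per_group[lengths(per_group) > 0]
print(per_group)
```

With the rows shown in the question, this should give two rows: outlier 3 for group 686 and outlier 0.25 for group 727, consistent with the $out and $group values in the str() output above. A time-range column such as time_range_outliers_from_boxplot could presumably be produced the same way by filtering the data to the date range before calling test_func.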