Subset by two factor variables
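I want to aggregate my data set over the interaction between two factors (fac1 and fac2) and apply a function to each group. For example, consider the data set given by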
set.seed(1)
test <- data.frame(fac1 = sample(c("A", "B", "C"), 30, replace = TRUE),
                   fac2 = sample(c("a", "b"), 30, replace = TRUE),
                   value = runif(30))
For fac1 == "A" and fac2 == "a" we have five values, and I want to summarise them by their minimum. I tried this brute-force approach:
min(test[test$fac1 == "A" & test$fac2 == "a", ]$value)
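Before aggregating, it can help to confirm how many rows fall into each fac1/fac2 combination. The following is a minimal sketch using base R's table(); the names counts and n_Aa are only illustrative. Note that the default algorithm behind sample() changed in R 3.6.0, so the counts you see may differ from the five values mentioned above depending on your R version.

```r
# Recreate the example data from the question.
set.seed(1)
test <- data.frame(fac1 = sample(c("A", "B", "C"), 30, replace = TRUE),
                   fac2 = sample(c("a", "b"), 30, replace = TRUE),
                   value = runif(30))

# Cross-tabulate the two factors: one cell per fac1/fac2 combination.
counts <- table(test$fac1, test$fac2)
print(counts)

# Number of rows the brute-force subset for "A"/"a" would select.
n_Aa <- sum(test$fac1 == "A" & test$fac2 == "a")
print(n_Aa)
```

Any cell with a count of zero would make min() on that subset return Inf with a warning, so this check is worth doing before looping over combinations by hand.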
You mention aggregate, which will work here.
aggregate(test$value, test[,1:2], min)
  fac1 fac2          x
1    A    a 0.32535215
2    B    a 0.14330438
3    C    a 0.33239467
4    A    b 0.33907294
5    B    b 0.08424691
6    C    b 0.24548851
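As a sketch of an equivalent spelling, aggregate also has a formula interface, which keeps the original column name (value) instead of x. The snippet below additionally checks the result against the brute-force subset applied to every group; the names by_cols, by_formula and brute are just illustrative.

```r
# Recreate the example data from the question.
set.seed(1)
test <- data.frame(fac1 = sample(c("A", "B", "C"), 30, replace = TRUE),
                   fac2 = sample(c("a", "b"), 30, replace = TRUE),
                   value = runif(30))

# Grouping by the first two columns, as in the answer above.
by_cols <- aggregate(test$value, test[, 1:2], min)

# Formula interface: minimum of value for each fac1/fac2 combination.
by_formula <- aggregate(value ~ fac1 + fac2, data = test, FUN = min)
print(by_formula)

# Brute-force minimum for each group that aggregate reported.
brute <- mapply(
  function(f1, f2) min(test$value[test$fac1 == f1 & test$fac2 == f2]),
  as.character(by_formula$fac1),
  as.character(by_formula$fac2),
  USE.NAMES = FALSE
)
```

Both calls return one row per combination that actually occurs in the data, so empty combinations are simply absent from the result.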
Here is a tidyverse alternative:
library(dplyr)
test %>% group_by(fac1, fac2) %>% summarise(x = min(value))
# A tibble: 6 x 3
# Groups:   fac1 [?]
#   fac1  fac2       x
#   <fct> <fct>  <dbl>
# 1 A     a     0.325
# 2 A     b     0.339
# 3 B     a     0.143
# 4 B     b     0.0842
# 5 C     a     0.332
# 6 C     b     0.245
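Note that the result above is still grouped by fac1. If you want a plain, ungrouped tibble, one option (a sketch, assuming dplyr 1.0.0 or later, where summarise gained the .groups argument) is to drop the grouping explicitly. The name res is only illustrative.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 for the .groups argument

# Recreate the example data from the question.
set.seed(1)
test <- data.frame(fac1 = sample(c("A", "B", "C"), 30, replace = TRUE),
                   fac2 = sample(c("a", "b"), 30, replace = TRUE),
                   value = runif(30))

# Minimum of value per fac1/fac2 combination, returned without grouping.
res <- test %>%
  group_by(fac1, fac2) %>%
  summarise(x = min(value), .groups = "drop")
print(res)
```

On R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so the column types may print as <chr> rather than <fct>; the grouping works the same either way.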