How to count the rows (and their fraction) per group where two columns meet a condition, using "dplyr"?
Here are the sample data and the result of my attempt:
library(dplyr)
tta <- data.frame(v1=c(8, 6, 1, 3, 8, 3, 3, 4, 5, 5, 7, 3, 4, 2, 8, 2, 2, 2, 5, 8, 4, 5, 3, 5, 3),
v2=c(9, 5, 3, 5, 4, 4, 8, 3, 1, 3, 3, 7, 7, 7, 9, 3, 7, 3, 3, 8, 4, 6, 3, 7, 5),
group=c(rep(c(1:5), each=5)))
## not ideal: needs downstream analysis or a merge
resulta <- tta %>%
filter(v1<=6 & v2<=6) %>%
group_by(group) %>%
summarise(n=n(), frac=n/5)
## resulta
## group 3 is lost because it has no rows that meet the criterion "v1<=6 & v2<=6"
##
## # A tibble: 4 × 3
## group n frac
## <int> <int> <dbl>
## 1 1 3 0.6
## 2 2 4 0.8
## 3 4 3 0.6
## 4 5 4 0.8
## expected result
##
## # A tibble: 5 × 3
## group n frac
## <int> <int> <dbl>
## 1 1 3 0.6
## 2 2 4 0.8
## 3 3 0 0.0
## 4 4 3 0.6
## 5 5 4 0.8
##
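There are two problems:
- If you use filter first, group 3 is lost, because it has no rows that meet the criterion ("v1<=6 & v2<=6").
- frac=n/5: when the groups do not all have 5 rows (or have arbitrary lengths), the fraction is computed incorrectly.
Is there a solution? An approach other than dplyr is fine too. Thanks for your help.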
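You can try the following. It flags each row instead of filtering, so no group is dropped, and it divides by n(), the actual size of each group: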
tta %>%
mutate(key = as.numeric(v1<=6 & v2<=6)) %>%
group_by(group) %>%
summarize(n = sum(key), frac = n/n())
group n frac
<int> <dbl> <dbl>
1 1 3 0.6
2 2 4 0.8
3 3 0 0
4 4 3 0.6
5 5 4 0.8
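As a possible variation on the answer above, the helper column can be skipped: summing a logical vector counts its TRUE values, and taking its mean gives the fraction of TRUE values, so the group size never has to be written out. The sketch below shows that with dplyr and, since a non-dplyr approach was also welcome, with base R's tapply. The names result_dplyr, result_base and ok are made up for this illustration and are not part of the original post.

```r
library(dplyr)

# Sample data from the question
tta <- data.frame(
  v1 = c(8, 6, 1, 3, 8, 3, 3, 4, 5, 5, 7, 3, 4, 2, 8, 2, 2, 2, 5, 8, 4, 5, 3, 5, 3),
  v2 = c(9, 5, 3, 5, 4, 4, 8, 3, 1, 3, 3, 7, 7, 7, 9, 3, 7, 3, 3, 8, 4, 6, 3, 7, 5),
  group = rep(1:5, each = 5)
)

# dplyr: no filter(), so every group survives.
# sum() of a logical counts the TRUEs; mean() gives the fraction of TRUEs,
# which works for groups of any size.
result_dplyr <- tta %>%
  group_by(group) %>%
  summarise(n = sum(v1 <= 6 & v2 <= 6),
            frac = mean(v1 <= 6 & v2 <= 6))

# Base R alternative: compute the row-wise flag once, then aggregate per group.
# tapply() returns results in sorted order of the group values.
ok <- with(tta, v1 <= 6 & v2 <= 6)
result_base <- data.frame(
  group = sort(unique(tta$group)),
  n     = as.vector(tapply(ok, tta$group, sum)),
  frac  = as.vector(tapply(ok, tta$group, mean))
)

print(result_dplyr)
print(result_base)
```

Both results should keep group 3 with n = 0, and frac adapts automatically if the groups have different numbers of rows.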