Cannot get sum per group in a data tbl when using dplyr in R
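I'm using dplyr to get the mean of 6 variables across 3 groups, and I also want the count for each cell (i.e., I want to add a count column for each group-variable pair).
My code is this: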
bitul_reason_tbl <- bitul_reason_calc %>%
  group_by(segment_name) %>%
  summarize(Total_Count = n(),
            better_insurance = mean(better_insurance), count1 = sum(bitul_reason_calc$better_insurance),
            blank = mean(blank), count2 = sum(bitul_reason_calc$blank),
            kefel = mean(kefel), count3 = sum(bitul_reason_calc$kefel),
            no_need = mean(no_need), count4 = sum(bitul_reason_calc$no_need),
            other = mean(other), count5 = sum(bitul_reason_calc$other),
            price = mean(price), count6 = sum(bitul_reason_calc$price),
            sherut = mean(sherut), count7 = sum(bitul_reason_calc$sherut))
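The variables are all 0 or 1, so summing is the same as counting. But what I get is each variable's overall sum repeated 3 times, instead of the sum for each group. What's wrong?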
# A tibble: 3 x 14
segment_name Total_Count price count1 kefel count2 sherut count3 nothing count4 other count5 blank count6
<fctr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 briut_siud 277 0.11552347 69 0.02527076 22 0.04693141 27 0.1227437 101 0.05776173 81 0.6498195 465
2 vetek_up_half_year 225 0.09333333 69 0.02666667 22 0.03111111 27 0.1288889 101 0.14222222 81 0.5866667 465
3 teunot 247 0.06477733 69 0.03643725 22 0.02834008 27 0.1538462 101 0.13360324 81 0.6194332 465
bitul_reason_tbl <- bitul_reason_calc %>%
  group_by(segment_name) %>%
  # Use bare column names so each sum is computed within its group.
  # Each count comes before the mean that reuses the column name, because
  # later expressions in summarize() see the results of earlier ones.
  summarize(Total_Count = n(),
            count1 = sum(better_insurance), better_insurance = mean(better_insurance),
            count2 = sum(blank), blank = mean(blank),
            count3 = sum(kefel), kefel = mean(kefel),
            count4 = sum(no_need), no_need = mean(no_need),
            count5 = sum(other), other = mean(other),
            count6 = sum(price), price = mean(price),
            count7 = sum(sherut), sherut = mean(sherut))
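When you use dplyr with chained (piped) functions, you only need to refer to the column names. An expression like bitul_reason_calc$better_insurance pulls the whole column from the original, ungrouped data frame, so every group gets the same overall total.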
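To see the difference in isolation, here is a minimal sketch on made-up data. The `toy` table, its values, and the output column names are hypothetical and not taken from the question; the only point is to contrast `toy$price` with the bare name `price` inside `summarize()`.

```r
library(dplyr)

# Hypothetical toy data: two segments and one 0/1 indicator column.
toy <- tibble(
  segment_name = c("a", "a", "a", "b", "b"),
  price        = c(1, 0, 1, 0, 1)
)

per_group <- toy %>%
  group_by(segment_name) %>%
  summarize(
    whole_table  = sum(toy$price),  # ignores the grouping: total over all rows
    within_group = sum(price)       # bare name: evaluated once per group
  )

print(per_group)
```

With this data, `whole_table` should be 3 in both rows, while `within_group` should be 2 for segment "a" and 1 for segment "b".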
OK, so the solution that worked for me was (oddly) to switch the order in which I call sum() and mean() inside summarize(). It's weird, but it worked.
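The order probably matters because summarize() evaluates its arguments one after another, and a summary that reuses an existing column name replaces that column for the expressions that follow. A small sketch of that behaviour, again on hypothetical data (`toy`, `mean_first`, and `sum_first` are illustration names, not from the question):

```r
library(dplyr)

# Hypothetical toy data: two segments and one 0/1 indicator column.
toy <- tibble(
  segment_name = c("a", "a", "a", "b", "b"),
  price        = c(1, 0, 1, 0, 1)
)

# Mean first: `price` is overwritten by the group mean, so sum(price)
# adds up that single mean value instead of the original 0/1 rows.
mean_first <- toy %>%
  group_by(segment_name) %>%
  summarize(price = mean(price), count = sum(price))

# Sum first: the count is taken from the original rows before `price`
# is replaced by its mean.
sum_first <- toy %>%
  group_by(segment_name) %>%
  summarize(count = sum(price), price = mean(price))

print(mean_first)
print(sum_first)
```

Giving the means their own names (for example `price_mean`, a made-up name) would avoid the overwrite altogether, so the order would no longer matter.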