Cannot get sum per group in data tbl when using dplyr in R

I'm using dplyr to get the means of 6 variables across 3 groups. I also want a count for each cell, i.e. one count column for each group/variable pair.

My code looks like this:

bitul_reason_tbl <- bitul_reason_calc %>%
  group_by(segment_name) %>%
  summarize(Total_Count = n(),
            better_insurance = mean(better_insurance), count1 = sum(bitul_reason_calc$better_insurance),
            blank            = mean(blank),            count2 = sum(bitul_reason_calc$blank),
            kefel            = mean(kefel),            count3 = sum(bitul_reason_calc$kefel),
            no_need          = mean(no_need),          count4 = sum(bitul_reason_calc$no_need),
            other            = mean(other),            count5 = sum(bitul_reason_calc$other),
            price            = mean(price),            count6 = sum(bitul_reason_calc$price),
            sherut           = mean(sherut),           count7 = sum(bitul_reason_calc$sherut))

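The variables are all 0 or 1, so summing them is the same as counting. But instead of the sum for each group, I get the overall sum for each variable, repeated in all 3 rows. What's wrong?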
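A minimal sketch that reproduces the symptom on small, made-up data (the data frame toy and its values are hypothetical stand-ins for bitul_reason_calc). Writing toy$price inside summarize() hands sum() the whole column of the original, ungrouped data frame, so every group receives the same grand total:

```r
library(dplyr)

# Hypothetical toy data (not from the question): 3 segments, one 0/1 column
toy <- data.frame(
  segment_name = c("a", "a", "b", "b", "b", "c"),
  price        = c(1, 0, 1, 1, 0, 1)
)

buggy <- toy %>%
  group_by(segment_name) %>%
  summarize(
    Total_Count = n(),
    # toy$price is the full column of the ungrouped data frame, so every
    # group receives the grand total (4) instead of its own sum
    count_wrong = sum(toy$price)
  )

print(buggy)
```

Total_Count differs per group (n() respects the grouping), while count_wrong is 4 in every row, which matches the pattern in the output below.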

# A tibble: 3 x 14
        segment_name Total_Count      price count1      kefel count2     sherut count3   nothing count4      other count5     blank count6
              <fctr>       <int>      <dbl>  <dbl>      <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>
1         briut_siud         277 0.11552347     69 0.02527076     22 0.04693141     27 0.1227437    101 0.05776173     81 0.6498195    465
2 vetek_up_half_year         225 0.09333333     69 0.02666667     22 0.03111111     27 0.1288889    101 0.14222222     81 0.5866667    465
3             teunot         247 0.06477733     69 0.03643725     22 0.02834008     27 0.1538462    101 0.13360324     81 0.6194332    465

library(dplyr)

bitul_reason_tbl <- bitul_reason_calc %>%
  group_by(segment_name) %>%
  summarize(Total_Count = n(),
            # Use bare column names, and compute each count before its mean:
            # summarize() evaluates arguments in order, so after
            # better_insurance = mean(better_insurance) the name better_insurance
            # refers to the per-group mean, not the original 0/1 column.
            count1 = sum(better_insurance), better_insurance = mean(better_insurance),
            count2 = sum(blank),            blank            = mean(blank),
            count3 = sum(kefel),            kefel            = mean(kefel),
            count4 = sum(no_need),          no_need          = mean(no_need),
            count5 = sum(other),            other            = mean(other),
            count6 = sum(price),            price            = mean(price),
            count7 = sum(sherut),           sherut           = mean(sherut))

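When using dplyr with chained (piped) functions, you only need to refer to the column names. Writing bitul_reason_calc$column pulls the whole column from the original data frame and bypasses the grouping.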
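A small sketch of the bare-column-name version, again on hypothetical toy data standing in for bitul_reason_calc. Inside a grouped summarize(), price is looked up in each group's rows, so the sums differ per group:

```r
library(dplyr)

# Hypothetical toy data as a stand-in for bitul_reason_calc
toy <- data.frame(
  segment_name = c("a", "a", "b", "b", "b", "c"),
  price        = c(1, 0, 1, 1, 0, 1)
)

fixed <- toy %>%
  group_by(segment_name) %>%
  summarize(
    Total_Count = n(),
    # a bare column name is evaluated on each group's slice of the data
    count = sum(price)
  )

print(fixed)
```

Here count is 1, 2 and 1 for groups a, b and c, i.e. the number of 1s in each group.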

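OK, the solution that worked for me (strangely) was to switch the order in which sum() and mean() are called inside summarize(). It's weird, but it worked. The likely reason: summarize() evaluates its arguments in order, and later arguments can see summaries created earlier, so once better_insurance = mean(better_insurance) has run, a following sum(better_insurance) sums the per-group mean instead of the original 0/1 column.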
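A sketch of why the order matters, on hypothetical toy data, plus one order-independent alternative using dplyr's across() (a swapped-in approach, not the one used above). across() with a named list of functions gives every result its own name (price_mean, price_count, ...), so no summary overwrites a column another function still needs:

```r
library(dplyr)

# Hypothetical toy data with two 0/1 columns
toy <- data.frame(
  segment_name = c("a", "a", "b", "b", "b", "c"),
  price        = c(1, 0, 1, 1, 0, 1),
  kefel        = c(0, 0, 1, 0, 0, 1)
)

# mean first: the later sum(price) sees the per-group mean, not the raw column
mean_first <- toy %>%
  group_by(segment_name) %>%
  summarize(price = mean(price), count = sum(price))

# sum first: count is computed before price is overwritten by its mean
sum_first <- toy %>%
  group_by(segment_name) %>%
  summarize(count = sum(price), price = mean(price))

# across(): each function is applied to the original column and named
# "{column}_{function}", so argument order no longer matters
tidy <- toy %>%
  group_by(segment_name) %>%
  summarize(Total_Count = n(),
            across(c(price, kefel), list(mean = mean, count = sum)))

print(mean_first)
print(sum_first)
print(tidy)
```

In mean_first, count just repeats the mean (0.5, 0.667, 1); in sum_first and tidy the counts are the real per-group sums. With across() the column list could name all seven indicator variables at once instead of writing fourteen separate arguments.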