Count occurrences based on another column using dplyr

Here is a table:

origin    fans
USA        67
UK         56
GERMANY    56
USA        55
UK         76
GERMANY    43
USA        51
UK         51
GERMANY    48

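The data frame is called music_fans. How can I add a column holding the total number of fans for each country, so that the third column looks like this: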

origin    fans  total_fans
USA        67   173
UK         56   183
GERMANY    56   147
USA        55   173
UK         76   183
GERMANY    43   147
USA        51   173
UK         51   183
GERMANY    48   147

You can get the per-group sum with dplyr:

library(dplyr)

music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE))

Output

  origin   fans total_fans
  <chr>   <int>      <int>
1 USA        67        173
2 UK         56        183
3 GERMANY    56        147
4 USA        55        173
5 UK         76        183
6 GERMANY    43        147
7 USA        51        173
8 UK         51        183
9 GERMANY    48        147
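
Note that group_by() leaves the result grouped by origin, which can surprise later steps. A minimal sketch of two ways to get an ungrouped result, re-creating music_fans from the Data section so it runs on its own; the names res_ungrouped and res_by are only illustrative, and the second option assumes dplyr 1.1.0 or later for the .by argument:

```r
library(dplyr)

# Same values as in the Data section, re-created so this runs standalone.
music_fans <- data.frame(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# Option 1: drop the grouping explicitly after the mutate.
res_ungrouped <- music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE)) %>%
  ungroup()

# Option 2 (assumes dplyr >= 1.1.0): per-operation grouping with .by,
# which does not leave the result grouped.
res_by <- music_fans %>%
  mutate(total_fans = sum(fans, na.rm = TRUE), .by = origin)
```

Both keep the original row order and all nine rows; only the grouping attribute differs from the answer above.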

Or in base R:

music_fans$total_fans <- ave(music_fans$fans, music_fans$origin, FUN = function(x) sum(x, na.rm = TRUE))
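
One detail worth knowing about ave(): its signature is ave(x, ..., FUN = mean), and everything in ... is taken as a grouping variable, so options such as na.rm are not forwarded to FUN and have to be set inside the function itself. A small sketch on hypothetical toy data (demo is not part of the question) with one missing value:

```r
# Hypothetical toy data with one missing value (not part of music_fans).
demo <- data.frame(
  origin = c("USA", "USA", "UK"),
  fans = c(67L, NA, 56L)
)

# na.rm is set inside the anonymous function, because ave() would
# otherwise treat an extra argument as another grouping variable.
demo$total_fans <- ave(
  demo$fans,
  demo$origin,
  FUN = function(x) sum(x, na.rm = TRUE)
)
```

Here the NA is ignored when summing the USA group, so neither USA row ends up with a missing total.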

Data

music_fans <- structure(list(origin = c("USA", "UK", "GERMANY", "USA", "UK", 
"GERMANY", "USA", "UK", "GERMANY"), fans = c(67L, 56L, 56L, 55L, 76L, 
43L, 51L, 51L, 48L)), class = "data.frame", row.names = c(NA, -9L)) 

Here is a data.table approach (the joined result comes back ordered by group, with total_fans ahead of fans):

library(data.table)
library(dplyr)
setDT(music_fans)[, .(total_fans = sum(fans)), by = 'origin'] %>%
  left_join(music_fans, by = 'origin')
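
If the join is not needed, data.table can also add the column in place with :=, which keeps the original row and column order. A minimal sketch, assuming the same values as in the Data section; the object name dt is only illustrative:

```r
library(data.table)

# Same values as in the Data section, built directly as a data.table.
dt <- data.table(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# := adds total_fans by reference; the group sum is recycled to every
# row of its group, so no aggregate-then-join step is required.
dt[, total_fans := sum(fans, na.rm = TRUE), by = origin]
```

Because := modifies dt by reference, there is no need to assign the result back.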