Count occurrences based on another column using dplyr
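Here is a table:
origin fans
USA 67
UK 56
GERMANY 56
USA 55
UK 76
GERMANY 43
USA 51
UK 51
GERMANY 48
This data frame is called music_fans. How can I add a column holding the total number of fans for each country, so that the third column looks like this: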
origin fans total_fans
USA 67 173
UK 56 183
GERMANY 56 147
USA 55 173
UK 76 183
GERMANY 43 147
USA 51 173
UK 51 183
GERMANY 48 147
You can get the per-group sum with dplyr:
library(dplyr)
music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE))
Output
origin fans total_fans
<chr> <int> <int>
1 USA 67 173
2 UK 56 183
3 GERMANY 56 147
4 USA 55 173
5 UK 76 183
6 GERMANY 43 147
7 USA 51 173
8 UK 51 183
9 GERMANY 48 147
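For reference, the same pipeline as a self-contained sketch. The names result and totals are only illustrative, and the trailing ungroup() is an addition that is not in the answer above; it drops the grouping so that later verbs work on the whole table again. The summarise() step is an optional cross-check that gives one row per country.

```r
library(dplyr)

# Same nine rows as in the question (three per country).
music_fans <- data.frame(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans   = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# mutate() after group_by() keeps every row and repeats the group sum on each one.
result <- music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE)) %>%
  ungroup()  # assumption: grouping is not needed downstream

# summarise() instead collapses to one row per country (cross-check only).
totals <- music_fans %>%
  group_by(origin) %>%
  summarise(total_fans = sum(fans, na.rm = TRUE))
```

mutate() is the right verb when the original rows must be kept; summarise() would be the choice if only the per-country totals were needed.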
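Or in base R (every argument that ave() receives through ... is treated as a grouping variable, so na.rm has to be passed inside FUN):
music_fans$total_fans <- ave(music_fans$fans, music_fans$origin, FUN = function(x) sum(x, na.rm = TRUE))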
Data
music_fans <- structure(list(origin = c("USA", "UK", "GERMANY", "USA", "UK",
"GERMANY", "USA", "UK", "GERMANY"), fans = c(67L, 56L, 56L, 55L, 76L,
43L, 51L, 51L, 48L)), class = "data.frame", row.names = c(NA, -9L))
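Here is a data.table approach (the per-country totals are joined back with dplyr's left_join, so both packages are needed):
library(data.table)
library(dplyr)
setDT(music_fans)[, .(total_fans = sum(fans)), by = 'origin'] %>%
  left_join(music_fans, by = 'origin')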
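A possible variant, offered as a sketch rather than as the answerer's method: data.table's := operator can add the grouped sum by reference, which avoids the join and keeps the original row and column order. The name music_dt is hypothetical.

```r
library(data.table)

# Same nine rows as in the question, built directly as a data.table.
music_dt <- data.table(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans   = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# := adds total_fans in place; by = origin computes the sum within each
# country and recycles it to every row of that country.
music_dt[, total_fans := sum(fans), by = origin]
```

Because := modifies music_dt in place, no reassignment is needed; print music_dt afterwards to see the new column.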