Stata command to R: sum with conditions

Does anyone know how to translate this Stata command into R?

bysort city: egen float total_population = total(id)

Example

id  city
1   a
1   a
1   a
2   r
2   r
3   r
6   h
7   h
8   h
9   h
10  h

Expected result

id  city    total_population
1   a   1
1   a   1
1   a   1
2   r   2
2   r   2
3   r   2
6   h   5
7   h   5
8   h   5
9   h   5
10  h   5

After grouping by 'city', we need n_distinct (the number of distinct elements in 'id'):
library(dplyr)
df1 <- df1 %>% 
   group_by(city) %>% 
   mutate(total_population = n_distinct(id)) %>%
   ungroup()

-output

df1
# A tibble: 11 × 3
      id city  total_population
   <int> <chr>            <int>
 1     1 a                    1
 2     1 a                    1
 3     1 a                    1
 4     2 r                    2
 5     2 r                    2
 6     3 r                    2
 7     6 h                    5
 8     7 h                    5
 9     8 h                    5
10     9 h                    5
11    10 h                    5
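
For comparison, here is a rough sketch of why n_distinct() is used here rather than sum(). As far as I understand, Stata's egen total() returns the sum within each group, so a literal translation would be sum(id), and that does not reproduce the expected result. The names cmp, id_sum and id_distinct are only illustrative and are not part of the question.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

cmp <- df1 %>%
  group_by(city) %>%
  mutate(
    id_sum      = sum(id),         # literal counterpart of total(id): sum per city
    id_distinct = n_distinct(id)   # number of distinct ids per city
  ) %>%
  ungroup()

# One row per city, to compare the two columns side by side
distinct(cmp, city, id_sum, id_distinct)
```

Only id_distinct gives 1, 2 and 5 for cities a, r and h, which is what the expected result shows.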

In base R, this can be done with ave:
df1$total_population <- with(df1, ave(id, city,
     FUN = function(x) length(unique(x))))
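
If you want to double-check the ave result, one possible sanity check (just a sketch, with per_city and check as made-up names) is to compute one count per city with tapply and look it up for every row:

```r
# Same example data as in the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# ave() returns a vector as long as 'id', with the group value repeated per row
df1$total_population <- with(df1, ave(id, city,
     FUN = function(x) length(unique(x))))

# Independent check: one distinct count per city, then index it by each row's city
per_city <- tapply(df1$id, df1$city, function(x) length(unique(x)))
check <- as.vector(per_city[df1$city])

# Stops with an error if the two approaches disagree
stopifnot(all(df1$total_population == check))
per_city
```

tapply gives one value per city, whereas ave already expands the value back to every row, which is why ave fits directly into a column assignment.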

data

df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 
10L), city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", 
"h")), class = "data.frame", row.names = c(NA, -11L))