Stata command to R: sum with conditions
Does anyone know how to translate this Stata command into an R command?
bysort city: egen float total_population = total(id)
Example
id city
1 a
1 a
1 a
2 r
2 r
3 r
6 h
7 h
8 h
9 h
10 h
Expected result
id city total_population
1 a 1
1 a 1
1 a 1
2 r 2
2 r 2
3 r 2
6 h 5
7 h 5
8 h 5
9 h 5
10 h 5
After grouping by 'city', we need n_distinct (the number of distinct elements in 'id')
library(dplyr)
df1 <- df1 %>%
    group_by(city) %>%
    mutate(total_population = n_distinct(id)) %>%
    ungroup()
-output
df1
# A tibble: 11 × 3
      id city  total_population
   <int> <chr>            <int>
 1     1 a                    1
 2     1 a                    1
 3     1 a                    1
 4     2 r                    2
 5     2 r                    2
 6     3 r                    2
 7     6 h                    5
 8     7 h                    5
 9     8 h                    5
10     9 h                    5
11    10 h                    5
In base R, this can be done with ave
df1$total_population <- with(df1, ave(id, city,
FUN = function(x) length(unique(x))))
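As a quick check, here is a self-contained sketch that rebuilds the example data and applies the ave approach above. The id_sum column is an assumption added for comparison and is not part of the original answer: it shows what a literal per-city sum of id would give, which is what Stata's egen total() computes, next to the distinct count shown in the expected result.

```r
# Rebuild the example data so the sketch runs on its own.
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# Number of distinct ids per city, repeated on every row of that city.
df1$total_population <- with(df1, ave(id, city,
    FUN = function(x) length(unique(x))))

# Hypothetical comparison column: the plain per-city sum of id.
df1$id_sum <- with(df1, ave(id, city, FUN = sum))

print(df1)
```

If the goal is the expected result from the question, total_population is the column to keep; id_sum is only there to show how the two readings differ.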
data
df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L,
10L), city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h",
"h")), class = "data.frame", row.names = c(NA, -11L))