基于 R 中的多个变量创建人口普查 table
Creating a census table based on multiple variables in R
我是 R 的新手,并且真的在努力解决一个看似简单的问题(我一直无法找到答案)。
我有一个比较大的数据table主要包括
-人们
-他们居住的地方
-他们做什么
-入住日期
-搬出日期。
我的目标是得出一个 运行 每周人口普查 table ,每周作为一行,每个职业和城市有一列,填充当时的人数。
#MRE
library(tidyverse)
library(lubridate)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
#what I've tried :
cities = unique(data$city)[!is.na(unique(data$city))]
occupations = unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(today()), by="1 week"))
census <- matrix(data=NA, nrows=44, ncols=12)
for (i in seq(cities)){
for (j in seq(occupations)){
count <- data %>%
filter(cities == i) %>%
filter(occupations == j) %>%
sapply(weeks, function(x)
sum(
((as.Date(data$move_in)) <= as.Date(x) &
(as.Date(data$move_out)) > as.Date(x))|
((as.Date(data$move_in)) <= as.Date(x) &
is.na(data$move_out))))
census[j,x] <- count
}}
非常感谢任何帮助!
这是一个可能的解决方案,使用一些 tidyverse 动词,因为你加载了那个包。我们会在您有兴趣使用 map_dfr
函数的几周内循环,并且每周我们都会收集一部分使用您上面的逻辑语句的人。然后,我们可以使用 group_by
跳过双外循环,直接 count
它们。最后,我们 mutate
一周的新专栏,让它们在绑定在一起后保持笔直。在循环之外,我们然后 pivot_wider
获得您正在寻找的每职业一列和每周一行的格式。
library(tidyverse)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
# Avoid needing to load lubridate by using Sys.Date() instead of today()
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(Sys.Date()), by="1 week"))
map_dfr(weeks, function(week_i){
data %>%
filter(move_in<week_i & move_out > week_i | move_in < week_i & is.na(move_out)) %>%
group_by(city, occupation) %>%
count() %>%
mutate(week=week_i)
}) %>%
pivot_wider(values_from = n, names_from = occupation, values_fill = 0)
哪个returns
# A tibble: 170 x 5
# Groups: city [4]
city week architect doctor teacher
<chr> <date> <int> <int> <int>
1 Austin 2020-12-27 1 0 0
2 Denver 2020-12-27 0 1 1
3 Seattle 2020-12-27 0 0 1
4 Austin 2021-01-03 1 0 0
5 Denver 2021-01-03 0 0 1
6 Seattle 2021-01-03 0 0 1
7 Austin 2021-01-10 1 0 0
8 Denver 2021-01-10 0 0 1
9 Phoenix 2021-01-10 0 1 0
10 Seattle 2021-01-10 0 0 1
# ... with 160 more rows
由于一些拼写错误,您似乎遇到了错误。您正在使用 filter
动词请求 cities
列,但数据在示例数据集中只有一个 city
列。 occupations
与 occupation
相同。很好地记住未来,但伟大的第一次努力和很好的提供的例子!
我用了data.table。 lubridate
不需要,我用了 Sys.Date().
我也将人口普查设为 data.table,而不是矩阵。
data.table::CJ 与 expand.grid.
几乎相同
然后使用 mapply 而不是 for 循环。
最后,从长到宽重新组织,我想这就是你想要的。
我保留了所有 city_occupation 组合 - 不确定这是否是我的意图。
library(data.table)
library(magrittr)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
cities <- unique(data$city)[!is.na(unique(data$city))]
occupations <- unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = Sys.Date(), by="1 week"))
data %>% setDT()
census <- CJ(week = weeks, city = cities, occupation = occupations) %>%
.[, count := mapply(function(wk, cty, occ) {
data[city == cty & occupation == occ,
sum(move_in <= wk & (move_out > wk | is.na(move_out)))]
}, week, city, occupation)]
census %<>% dcast(week ~ city + occupation, value.var = 'count')
给出:
census
week Austin_architect Austin_doctor Austin_teacher Denver_architect
1: 2020-12-27 1 0 0 0
2: 2021-01-03 1 0 0 0
3: 2021-01-10 1 0 0 0
4: 2021-01-17 1 0 0 0
5: 2021-01-24 1 0 0 0
6: 2021-01-31 1 0 0 0
7: 2021-02-07 1 0 0 0
8: 2021-02-14 1 0 0 0
9: 2021-02-21 1 0 0 0
10: 2021-02-28 1 0 0 0
11: 2021-03-07 1 0 0 0
12: 2021-03-14 1 0 0 0
13: 2021-03-21 1 0 0 0
14: 2021-03-28 1 0 0 0
15: 2021-04-04 1 0 0 0
16: 2021-04-11 1 0 0 0
17: 2021-04-18 1 0 0 0
18: 2021-04-25 1 0 0 0
19: 2021-05-02 1 0 0 0
20: 2021-05-09 1 0 0 0
21: 2021-05-16 1 0 0 0
22: 2021-05-23 1 0 0 0
23: 2021-05-30 1 0 0 0
24: 2021-06-06 1 0 0 0
25: 2021-06-13 1 0 0 0
26: 2021-06-20 1 0 0 0
27: 2021-06-27 1 0 0 0
28: 2021-07-04 1 0 0 0
29: 2021-07-11 1 0 0 0
30: 2021-07-18 1 0 0 0
31: 2021-07-25 1 0 0 0
32: 2021-08-01 1 0 0 0
33: 2021-08-08 1 0 0 0
34: 2021-08-15 1 0 0 0
35: 2021-08-22 1 0 0 0
36: 2021-08-29 1 0 0 0
37: 2021-09-05 1 0 0 0
38: 2021-09-12 1 0 0 0
39: 2021-09-19 1 0 0 0
40: 2021-09-26 1 0 0 0
41: 2021-10-03 0 0 0 0
42: 2021-10-10 0 0 0 0
43: 2021-10-17 0 0 0 0
44: 2021-10-24 0 0 0 0
week Austin_architect Austin_doctor Austin_teacher Denver_architect
Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
1: 1 1 0 0
2: 0 1 0 0
3: 0 1 0 1
4: 0 1 0 1
5: 0 1 0 1
6: 0 1 0 1
7: 0 1 0 1
8: 0 1 0 1
9: 0 1 0 1
10: 0 1 0 1
11: 0 1 0 1
12: 0 1 0 1
13: 0 1 0 1
14: 0 1 0 1
15: 0 1 0 1
16: 0 1 0 1
17: 0 1 0 1
18: 0 1 0 1
19: 0 1 0 1
20: 0 1 0 1
21: 0 1 0 1
22: 0 1 0 1
23: 0 1 0 1
24: 0 1 0 1
25: 0 1 0 1
26: 0 1 0 1
27: 0 1 0 1
28: 0 1 0 1
29: 0 1 0 1
30: 0 1 0 1
31: 0 1 0 1
32: 0 1 0 1
33: 0 1 0 1
34: 0 1 0 1
35: 0 1 0 1
36: 0 1 0 1
37: 0 1 0 1
38: 0 1 0 1
39: 0 1 0 1
40: 0 1 0 1
41: 0 1 0 1
42: 0 1 0 1
43: 0 1 0 1
44: 0 1 0 1
Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher
1: 0 0 0 1
2: 0 0 0 1
3: 0 0 0 1
4: 0 0 0 1
5: 0 0 0 1
6: 0 0 0 1
7: 0 0 0 1
8: 0 0 0 1
9: 0 0 0 1
10: 0 0 0 1
11: 0 0 0 1
12: 0 0 0 1
13: 0 0 0 1
14: 0 0 0 1
15: 0 0 0 1
16: 0 0 0 1
17: 0 0 0 1
18: 0 0 0 1
19: 0 0 0 1
20: 0 0 0 1
21: 0 0 0 1
22: 0 0 0 1
23: 0 0 0 1
24: 0 0 0 1
25: 0 0 0 1
26: 0 0 0 1
27: 0 0 0 1
28: 0 0 0 1
29: 0 0 0 1
30: 0 0 0 1
31: 0 0 0 1
32: 0 0 0 1
33: 0 0 0 1
34: 0 0 0 1
35: 0 0 0 1
36: 0 0 0 1
37: 0 0 0 1
38: 0 0 0 1
39: 0 0 0 1
40: 0 0 0 1
41: 0 0 0 1
42: 0 0 0 1
43: 0 0 0 1
44: 0 0 0 1
Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher
我是 R 的新手,并且真的在努力解决一个看似简单的问题(我一直无法找到答案)。
我有一个比较大的数据table主要包括 -人们 -他们居住的地方 -他们做什么 -入住日期 -搬出日期。 我的目标是得出一个 运行 每周人口普查 table ,每周作为一行,每个职业和城市有一列,填充当时的人数。
#MRE
library(tidyverse)
library(lubridate)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
#what I've tried :
cities = unique(data$city)[!is.na(unique(data$city))]
occupations = unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(today()), by="1 week"))
census <- matrix(data=NA, nrows=44, ncols=12)
for (i in seq(cities)){
for (j in seq(occupations)){
count <- data %>%
filter(cities == i) %>%
filter(occupations == j) %>%
sapply(weeks, function(x)
sum(
((as.Date(data$move_in)) <= as.Date(x) &
(as.Date(data$move_out)) > as.Date(x))|
((as.Date(data$move_in)) <= as.Date(x) &
is.na(data$move_out))))
census[j,x] <- count
}}
非常感谢任何帮助!
这是一个可能的解决方案,使用一些 tidyverse 动词,因为你加载了那个包。我们会在您有兴趣使用 map_dfr
函数的几周内循环,并且每周我们都会收集一部分使用您上面的逻辑语句的人。然后,我们可以使用 group_by
跳过双外循环,直接 count
它们。最后,我们 mutate
一周的新专栏,让它们在绑定在一起后保持笔直。在循环之外,我们然后 pivot_wider
获得您正在寻找的每职业一列和每周一行的格式。
library(tidyverse)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
# Avoid needing to load lubridate by using Sys.Date() instead of today()
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(Sys.Date()), by="1 week"))
map_dfr(weeks, function(week_i){
data %>%
filter(move_in<week_i & move_out > week_i | move_in < week_i & is.na(move_out)) %>%
group_by(city, occupation) %>%
count() %>%
mutate(week=week_i)
}) %>%
pivot_wider(values_from = n, names_from = occupation, values_fill = 0)
哪个returns
# A tibble: 170 x 5
# Groups: city [4]
city week architect doctor teacher
<chr> <date> <int> <int> <int>
1 Austin 2020-12-27 1 0 0
2 Denver 2020-12-27 0 1 1
3 Seattle 2020-12-27 0 0 1
4 Austin 2021-01-03 1 0 0
5 Denver 2021-01-03 0 0 1
6 Seattle 2021-01-03 0 0 1
7 Austin 2021-01-10 1 0 0
8 Denver 2021-01-10 0 0 1
9 Phoenix 2021-01-10 0 1 0
10 Seattle 2021-01-10 0 0 1
# ... with 160 more rows
由于一些拼写错误,您似乎遇到了错误。您正在使用 filter
动词请求 cities
列,但数据在示例数据集中只有一个 city
列。 occupations
与 occupation
相同。很好地记住未来,但伟大的第一次努力和很好的提供的例子!
我用了data.table。 lubridate
不需要,我用了 Sys.Date().
我也将人口普查设为 data.table,而不是矩阵。
data.table::CJ 与 expand.grid.
几乎相同
然后使用 mapply 而不是 for 循环。
最后,从长到宽重新组织,我想这就是你想要的。
我保留了所有 city_occupation 组合 - 不确定这是否是我的意图。
library(data.table)
library(magrittr)
data <- data.frame(
first_names = c("joe", "sally", "bob", "frank", "susy"),
move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
cities <- unique(data$city)[!is.na(unique(data$city))]
occupations <- unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = Sys.Date(), by="1 week"))
data %>% setDT()
census <- CJ(week = weeks, city = cities, occupation = occupations) %>%
.[, count := mapply(function(wk, cty, occ) {
data[city == cty & occupation == occ,
sum(move_in <= wk & (move_out > wk | is.na(move_out)))]
}, week, city, occupation)]
census %<>% dcast(week ~ city + occupation, value.var = 'count')
给出:
census
week Austin_architect Austin_doctor Austin_teacher Denver_architect
1: 2020-12-27 1 0 0 0
2: 2021-01-03 1 0 0 0
3: 2021-01-10 1 0 0 0
4: 2021-01-17 1 0 0 0
5: 2021-01-24 1 0 0 0
6: 2021-01-31 1 0 0 0
7: 2021-02-07 1 0 0 0
8: 2021-02-14 1 0 0 0
9: 2021-02-21 1 0 0 0
10: 2021-02-28 1 0 0 0
11: 2021-03-07 1 0 0 0
12: 2021-03-14 1 0 0 0
13: 2021-03-21 1 0 0 0
14: 2021-03-28 1 0 0 0
15: 2021-04-04 1 0 0 0
16: 2021-04-11 1 0 0 0
17: 2021-04-18 1 0 0 0
18: 2021-04-25 1 0 0 0
19: 2021-05-02 1 0 0 0
20: 2021-05-09 1 0 0 0
21: 2021-05-16 1 0 0 0
22: 2021-05-23 1 0 0 0
23: 2021-05-30 1 0 0 0
24: 2021-06-06 1 0 0 0
25: 2021-06-13 1 0 0 0
26: 2021-06-20 1 0 0 0
27: 2021-06-27 1 0 0 0
28: 2021-07-04 1 0 0 0
29: 2021-07-11 1 0 0 0
30: 2021-07-18 1 0 0 0
31: 2021-07-25 1 0 0 0
32: 2021-08-01 1 0 0 0
33: 2021-08-08 1 0 0 0
34: 2021-08-15 1 0 0 0
35: 2021-08-22 1 0 0 0
36: 2021-08-29 1 0 0 0
37: 2021-09-05 1 0 0 0
38: 2021-09-12 1 0 0 0
39: 2021-09-19 1 0 0 0
40: 2021-09-26 1 0 0 0
41: 2021-10-03 0 0 0 0
42: 2021-10-10 0 0 0 0
43: 2021-10-17 0 0 0 0
44: 2021-10-24 0 0 0 0
week Austin_architect Austin_doctor Austin_teacher Denver_architect
Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
1: 1 1 0 0
2: 0 1 0 0
3: 0 1 0 1
4: 0 1 0 1
5: 0 1 0 1
6: 0 1 0 1
7: 0 1 0 1
8: 0 1 0 1
9: 0 1 0 1
10: 0 1 0 1
11: 0 1 0 1
12: 0 1 0 1
13: 0 1 0 1
14: 0 1 0 1
15: 0 1 0 1
16: 0 1 0 1
17: 0 1 0 1
18: 0 1 0 1
19: 0 1 0 1
20: 0 1 0 1
21: 0 1 0 1
22: 0 1 0 1
23: 0 1 0 1
24: 0 1 0 1
25: 0 1 0 1
26: 0 1 0 1
27: 0 1 0 1
28: 0 1 0 1
29: 0 1 0 1
30: 0 1 0 1
31: 0 1 0 1
32: 0 1 0 1
33: 0 1 0 1
34: 0 1 0 1
35: 0 1 0 1
36: 0 1 0 1
37: 0 1 0 1
38: 0 1 0 1
39: 0 1 0 1
40: 0 1 0 1
41: 0 1 0 1
42: 0 1 0 1
43: 0 1 0 1
44: 0 1 0 1
Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher
1: 0 0 0 1
2: 0 0 0 1
3: 0 0 0 1
4: 0 0 0 1
5: 0 0 0 1
6: 0 0 0 1
7: 0 0 0 1
8: 0 0 0 1
9: 0 0 0 1
10: 0 0 0 1
11: 0 0 0 1
12: 0 0 0 1
13: 0 0 0 1
14: 0 0 0 1
15: 0 0 0 1
16: 0 0 0 1
17: 0 0 0 1
18: 0 0 0 1
19: 0 0 0 1
20: 0 0 0 1
21: 0 0 0 1
22: 0 0 0 1
23: 0 0 0 1
24: 0 0 0 1
25: 0 0 0 1
26: 0 0 0 1
27: 0 0 0 1
28: 0 0 0 1
29: 0 0 0 1
30: 0 0 0 1
31: 0 0 0 1
32: 0 0 0 1
33: 0 0 0 1
34: 0 0 0 1
35: 0 0 0 1
36: 0 0 0 1
37: 0 0 0 1
38: 0 0 0 1
39: 0 0 0 1
40: 0 0 0 1
41: 0 0 0 1
42: 0 0 0 1
43: 0 0 0 1
44: 0 0 0 1
Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher