I need to elongate a dataset by filling in dates between two rows in R with a value in another column
I'm not sure whether this has been asked before, but I really couldn't find it. I have the following kind of dataset:
set.seed(1)
d1 <- data.frame(open = rnorm(5), Y = as.Date(c("2020-05-01", "2020-05-08", "2020-05-15", "2020-05-22", "2020-05-29")), region = c("a", "a", "a", "a", "a"))
+------------+------------+--------+
| open       | Y          | region |
+------------+------------+--------+
| -0.6264538 | 2020-05-01 | a      |
| 0.1836433  | 2020-05-08 | a      |
| -0.8356286 | 2020-05-15 | a      |
| 1.5952808  | 2020-05-22 | a      |
| 0.3295078  | 2020-05-29 | a      |
+------------+------------+--------+
Now I want to turn it into this:
open<-c(rep(d1[1,1],times=7),rep(d1[2,1],times=7),rep(d1[3,1],times=7),rep(d1[4,1],times=7),rep(d1[5,1],times=7))
Y<-seq(from = as.Date("2020-05-01"),to = as.Date("2020-06-04"),by="days")
or, as a table:
+------------+------------+
| open       | date       |
+------------+------------+
| -0.6264538 | 2020-05-01 |
| -0.6264538 | 2020-05-02 |
| -0.6264538 | 2020-05-03 |
| -0.6264538 | 2020-05-04 |
| -0.6264538 | 2020-05-05 |
| -0.6264538 | 2020-05-06 |
| -0.6264538 | 2020-05-07 |
| 0.1836433  | 2020-05-08 |
| 0.1836433  | 2020-05-09 |
| 0.1836433  | 2020-05-10 |
+------------+------------+
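Basically, I have data for the start of each week. The 'open' value also applies to every date within that week, so I want to fill it in and, in that sense, 'elongate' the data.
Additionally, I need to do this by group (e.g. by region).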
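Try this approach: create a data frame containing the sequence of dates, then use tidyverse functions to join the data and fill in the NA values. Here is code that gets close to the solution you want: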
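library(tidyverse)
# Build a data frame with every date from the first to the last week start
dfdates <- data.frame(Y = seq(min(d1$Y), max(d1$Y), by = 1))
# Join the weekly values onto the daily calendar and fill the gaps downward
newd1 <- dfdates %>%
  left_join(d1, by = "Y") %>%
  fill(open) %>%
  select(-region)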
Output:
Y open
1 2020-05-01 -0.6264538
2 2020-05-02 -0.6264538
3 2020-05-03 -0.6264538
4 2020-05-04 -0.6264538
5 2020-05-05 -0.6264538
6 2020-05-06 -0.6264538
7 2020-05-07 -0.6264538
8 2020-05-08 0.1836433
9 2020-05-09 0.1836433
10 2020-05-10 0.1836433
11 2020-05-11 0.1836433
12 2020-05-12 0.1836433
13 2020-05-13 0.1836433
14 2020-05-14 0.1836433
15 2020-05-15 -0.8356286
16 2020-05-16 -0.8356286
17 2020-05-17 -0.8356286
18 2020-05-18 -0.8356286
19 2020-05-19 -0.8356286
20 2020-05-20 -0.8356286
21 2020-05-21 -0.8356286
22 2020-05-22 1.5952808
23 2020-05-23 1.5952808
24 2020-05-24 1.5952808
25 2020-05-25 1.5952808
26 2020-05-26 1.5952808
27 2020-05-27 1.5952808
28 2020-05-28 1.5952808
29 2020-05-29 0.3295078
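The output above ends at 2020-05-29, so the last week contributes only a single row, whereas the target in the question runs to 2020-06-04. A minimal sketch of one way to cover the final week, assuming every row starts a 7-day week, is to extend the calendar six days past the last date before joining:

```r
library(dplyr)
library(tidyr)

set.seed(1)
d1 <- data.frame(
  open   = rnorm(5),
  Y      = as.Date(c("2020-05-01", "2020-05-08", "2020-05-15",
                     "2020-05-22", "2020-05-29")),
  region = c("a", "a", "a", "a", "a")
)

# Extend the calendar 6 days past the last week start so the final
# week is also covered (assumes each row starts a 7-day week)
dfdates <- data.frame(Y = seq(min(d1$Y), max(d1$Y) + 6, by = "day"))

newd1 <- dfdates %>%
  left_join(d1, by = "Y") %>%   # explicit key, so no join message
  fill(open) %>%                # carry each weekly value down
  select(-region)

tail(newd1, 3)
```

The only change from the answer's code is the `+ 6` on the upper bound; if the spacing between rows is not always 7 days, the offset would need to be chosen differently.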
If you need to group by region, you can first fill in that variable, then use group_by() and fill the open variable:
# Join and fill, grouped by region
newd1 <- dfdates %>%
  left_join(d1, by = "Y") %>%
  fill(region) %>%
  group_by(region) %>%
  fill(open)
Output:
# A tibble: 29 x 3
# Groups: region [1]
Y open region
<date> <dbl> <fct>
1 2020-05-01 -0.626 a
2 2020-05-02 -0.626 a
3 2020-05-03 -0.626 a
4 2020-05-04 -0.626 a
5 2020-05-05 -0.626 a
6 2020-05-06 -0.626 a
7 2020-05-07 -0.626 a
8 2020-05-08 0.184 a
9 2020-05-09 0.184 a
10 2020-05-10 0.184 a
# ... with 19 more rows
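One caveat with this version: `dfdates` is built from the dates of all regions together, and `fill(region)` simply copies the previous row's region downward, so with more than one region the in-between dates would be attributed to whichever region happens to precede them (and dates shared by several regions would be duplicated by the join). The sketch below instead builds a separate daily calendar for each region and joins on both keys. It assumes dplyr 0.8.1 or later for `group_modify()` and that each row starts a 7-day week; the two-region data set `d2` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up example: two regions whose weeks start on different days
set.seed(1)
d2 <- data.frame(
  open   = rnorm(4),
  Y      = as.Date(c("2020-05-01", "2020-05-08", "2020-05-04", "2020-05-11")),
  region = c("a", "a", "b", "b")
)

# One daily calendar per region, from its first week start to 6 days
# after its last one (assumes each row starts a 7-day week)
calendar <- d2 %>%
  group_by(region) %>%
  group_modify(~ data.frame(Y = seq(min(.x$Y), max(.x$Y) + 6, by = "day"))) %>%
  ungroup()

newd2 <- calendar %>%
  left_join(d2, by = c("region", "Y")) %>%  # match on both keys
  group_by(region) %>%
  fill(open) %>%                            # fill only within each region
  ungroup()

newd2
```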
Using tidyr and dplyr, you can complete the dates within each group and then fill in the values:
library(tidyr)
library(dplyr)
d1 %>%
  group_by(region) %>%
  complete(Y = seq.Date(min(Y), max(Y), by = "day")) %>%
  fill(open, .direction = "down")
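For completeness, the same expansion can also be done in base R without any packages. This minimal sketch assumes that, within each region, the rows are sorted by `Y` and each one starts a 7-day week, so the last week is also carried forward to 2020-06-04 as in the question's target. If the gaps between rows vary, the join-and-fill approaches above are the safer choice. The helper `expand_weeks` is hypothetical.

```r
set.seed(1)
d1 <- data.frame(
  open   = rnorm(5),
  Y      = as.Date(c("2020-05-01", "2020-05-08", "2020-05-15",
                     "2020-05-22", "2020-05-29")),
  region = c("a", "a", "a", "a", "a")
)

# Hypothetical helper: repeat each weekly row for `days` consecutive days
expand_weeks <- function(df, days = 7) {
  idx <- rep(seq_len(nrow(df)), each = days)              # each row index, once per day
  out <- df[idx, , drop = FALSE]
  out$Y <- out$Y + rep(0:(days - 1), times = nrow(df))    # day offset within the week
  rownames(out) <- NULL
  out
}

# Expand each region separately, then stack the pieces back together
long <- do.call(rbind, lapply(split(d1, d1$region), expand_weeks))
rownames(long) <- NULL
head(long, 8)
```

Because it repeats rows rather than filling gaps, this variant never needs an intermediate calendar, but it relies entirely on the fixed 7-day assumption.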