Subsetting 10-day intervals with an overlapping date
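I have a yearly dataset that I want to split into 10-day intervals. For example, I want to subset 2010-12-26 to 2011-01-04 and use the x and y values from those dates to create a home range, then take the next 9 days plus the date that overlaps between the subsets, which in this case is 2011-01-04 (2011-01-04 to 2011-01-13). Is there a good way to do this?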
#Example dataset
library(lubridate)
date <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = date,
x = runif(752, min = 60000, max = 80000),
y = runif(752, min = 800000, max = 900000))
> df
date x y
1 2010-12-26 73649.16 894525.6
2 2010-12-27 69005.21 898233.7
3 2010-12-28 64982.90 873692.6
4 2010-12-29 64592.93 841055.2
5 2010-12-30 60475.99 854524.3
6 2010-12-31 79206.43 879468.2
7 2011-01-01 76692.40 830569.6
8 2011-01-02 70378.51 834338.2
9 2011-01-03 74977.73 820568.0
10 2011-01-04 63023.47 899482.3
11 2011-01-05 77046.80 886369.0
12 2011-01-06 68751.91 841074.7
13 2011-01-07 65471.34 888525.3
14 2011-01-08 61138.68 855039.5
15 2011-01-09 65660.66 880227.2
16 2011-01-10 75526.36 838478.6
17 2011-01-11 64485.74 808947.7
18 2011-01-12 61405.69 887784.1
19 2011-01-13 70561.86 847634.7
20 2011-01-14 69234.98 840012.1
21 2011-01-15 75539.43 817132.5
22 2011-01-16 74227.28 839230.4
23 2011-01-17 74548.59 855006.3
24 2011-01-18 72020.71 815036.7
25 2011-01-19 70814.50 883029.6
26 2011-01-20 76924.65 817289.5
27 2011-01-21 60556.21 807427.2
Thank you for your time.
How about this?
res <- lapply(
seq(0, nrow(df), by = 10),
function(k) df[max(k, 1):min(k + 10, nrow(df)), ]
)
This gives
> head(res)
[[1]]
date x y
1 2010-12-26 63748.27 856758.7
2 2010-12-27 73774.90 860222.6
3 2010-12-28 68893.24 804194.7
4 2010-12-29 79791.86 810624.5
5 2010-12-30 60073.50 809016.0
6 2010-12-31 74020.15 883304.9
7 2011-01-01 67144.95 889235.3
8 2011-01-02 67205.20 810514.2
9 2011-01-03 68518.68 882730.7
10 2011-01-04 70442.87 892934.1
[[2]]
date x y
10 2011-01-04 70442.87 892934.1
11 2011-01-05 65466.26 855725.2
12 2011-01-06 70034.79 879770.8
13 2011-01-07 60195.42 888653.4
14 2011-01-08 65208.12 883176.8
15 2011-01-09 63040.52 821902.3
16 2011-01-10 62302.66 815025.1
17 2011-01-11 77662.53 829474.5
18 2011-01-12 64802.65 809961.7
19 2011-01-13 71812.61 810755.1
20 2011-01-14 63086.30 820029.9
[[3]]
date x y
20 2011-01-14 63086.30 820029.9
21 2011-01-15 75548.71 806966.7
22 2011-01-16 68572.89 847679.0
23 2011-01-17 71408.65 889490.2
24 2011-01-18 73507.84 815559.7
25 2011-01-19 76854.50 899108.6
26 2011-01-20 79138.08 858537.1
27 2011-01-21 73960.14 898957.3
28 2011-01-22 75048.41 864425.6
29 2011-01-23 61059.20 857558.3
30 2011-01-24 67455.03 853017.1
[[4]]
date x y
30 2011-01-24 67455.03 853017.1
31 2011-01-25 72727.70 891708.8
32 2011-01-26 73230.11 836404.6
33 2011-01-27 67719.05 815528.3
34 2011-01-28 65139.66 826289.8
35 2011-01-29 65145.94 818736.4
36 2011-01-30 74206.03 839014.2
37 2011-01-31 77259.35 855653.0
38 2011-02-01 77809.65 836912.6
39 2011-02-02 62744.02 831549.0
40 2011-02-03 79594.93 873313.6
[[5]]
date x y
40 2011-02-03 79594.93 873313.6
41 2011-02-04 78942.86 825001.1
42 2011-02-05 61346.88 871578.5
43 2011-02-06 68526.18 863300.7
44 2011-02-07 76920.15 844180.0
45 2011-02-08 73023.08 823092.4
46 2011-02-09 64287.09 804682.7
47 2011-02-10 71377.16 829219.8
48 2011-02-11 68930.80 814626.6
49 2011-02-12 70780.95 831549.8
50 2011-02-13 73740.99 895868.0
[[6]]
date x y
50 2011-02-13 73740.99 895868.0
51 2011-02-14 79846.05 844586.6
52 2011-02-15 66559.60 835943.0
53 2011-02-16 68522.99 837633.2
54 2011-02-17 65898.75 891364.4
55 2011-02-18 73809.44 842797.9
56 2011-02-19 73336.53 821166.5
57 2011-02-20 72780.91 883200.6
58 2011-02-21 73240.81 864142.2
59 2011-02-22 78855.11 868599.6
60 2011-02-23 69236.04 845566.6
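In the output above, every chunk after the first runs from row k to row k + 10 and so holds 11 rows (for example 2011-01-04 to 2011-01-14), one more than the 2011-01-04 to 2011-01-13 window described in the question. If exactly 10 days per window are wanted, with each window starting on the last day of the previous one, the start rows can be stepped by 9 instead. The following is only a minimal sketch under the same one-row-per-date assumption; `overlapping_windows` and its `width` argument are hypothetical names introduced here, not part of the answer.

```r
# Hypothetical helper: split a data frame into windows of `width` rows,
# where each window starts on the last row of the previous window.
# Assumes one row per date and rows already sorted by date.
overlapping_windows <- function(df, width = 10) {
  stopifnot(width >= 2, nrow(df) >= 2)
  step <- width - 1                          # 9 new rows per window
  starts <- seq(1, nrow(df) - 1, by = step)  # 1, 10, 19, ...
  lapply(starts, function(s) df[s:min(s + step, nrow(df)), , drop = FALSE])
}

# Same kind of example data as in the question
dates <- seq(as.Date("2010-12-26"), as.Date("2013-01-15"), by = "day")
df <- data.frame(date = dates,
                 x = runif(length(dates), min = 60000, max = 80000),
                 y = runif(length(dates), min = 800000, max = 900000))

res <- overlapping_windows(df)
range(res[[1]]$date)  # 2010-12-26 to 2011-01-04
range(res[[2]]$date)  # 2011-01-04 to 2011-01-13
```

The last window may be shorter than 10 rows when the number of rows does not fit the step exactly.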
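An alternative solution using the dplyr package, which also works when you need groups of n dates rather than 10. As in your example, we assume one row per date.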
library(lubridate)
dt <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = dt,
x = runif(752, min = 60000, max = 80000),
y = runif(752, min = 800000, max = 900000))
library(dplyr)
n <- 10
df |>
arrange(date) |>
mutate(id = 0:(nrow(df) - 1),
group = id %/% n + 1) |>
group_by(group) |>
group_split() |>
head(n=2)
#> [[1]]
#> # A tibble: 10 x 5
#> date x y id group
#> <date> <dbl> <dbl> <int> <dbl>
#> 1 2010-12-26 70488. 884674. 0 1
#> 2 2010-12-27 74133. 888636. 1 1
#> 3 2010-12-28 66635. 838681. 2 1
#> 4 2010-12-29 67931. 808998. 3 1
#> 5 2010-12-30 68032. 868329. 4 1
#> 6 2010-12-31 76891. 826684. 5 1
#> 7 2011-01-01 70793. 890401. 6 1
#> 8 2011-01-02 60427. 846447. 7 1
#> 9 2011-01-03 69902. 886152. 8 1
#> 10 2011-01-04 64253. 859245. 9 1
#>
#> [[2]]
#> # A tibble: 10 x 5
#> date x y id group
#> <date> <dbl> <dbl> <int> <dbl>
#> 1 2011-01-05 74260. 844636. 10 2
#> 2 2011-01-06 75631. 807722. 11 2
#> 3 2011-01-07 74443. 840540. 12 2
#> 4 2011-01-08 78903. 811777. 13 2
#> 5 2011-01-09 78531. 894333. 14 2
#> 6 2011-01-10 79310. 812625. 15 2
#> 7 2011-01-11 71701. 801691. 16 2
#> 8 2011-01-12 63254. 854752. 17 2
#> 9 2011-01-13 72813. 837910. 18 2
#> 10 2011-01-14 62718. 877568. 19 2
Created on 2021-07-05 by the reprex package (v2.0.0)
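Note that the groups above do not overlap (the second group starts on 2011-01-05) and that they rely on one row per date. If the data may have missing or repeated dates, one possible variation is to define the windows by calendar date rather than by row position. This is a rough sketch in base R, not the answerer's method; `date_windows` and its `days` argument are hypothetical names, and it assumes the `date` column is of class `Date` and spans at least two distinct days.

```r
# Hypothetical helper: windows defined by calendar dates, so gaps or
# repeated dates in the data do not shift the window boundaries.
# Each window covers `days` calendar days and starts on the last day
# of the previous window.
date_windows <- function(df, days = 10) {
  step <- days - 1  # 9 days between consecutive window starts
  starts <- seq(min(df$date), max(df$date) - 1, by = step)
  lapply(starts, function(s) {
    df[df$date >= s & df$date <= s + step, , drop = FALSE]
  })
}

# Example data as in the question, with two days dropped to simulate gaps
dates <- seq(as.Date("2010-12-26"), as.Date("2013-01-15"), by = "day")
df <- data.frame(date = dates,
                 x = runif(length(dates), min = 60000, max = 80000),
                 y = runif(length(dates), min = 800000, max = 900000))
df_gaps <- df[-c(3, 4), ]  # removes 2010-12-28 and 2010-12-29

win <- date_windows(df_gaps)
nrow(win[[1]])        # 8 rows, because two days are missing
range(win[[2]]$date)  # 2011-01-04 to 2011-01-13
```

Each element of `win` can then be passed to whichever home-range estimator is being used, for example with `lapply(win, ...)`.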