Subsetting 10-day intervals with an overlapping date

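I have an annual dataset that I want to split into 10-day intervals. For example, I want to subset 2010-12-26 to 2011-01-04 and build a home range from the xy values of those dates, then take the next 9 days plus the date where the two subsets overlap, which in this case is 2011-01-04 (giving 2011-01-04 to 2011-01-13). Is there a good way to do this?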

# Example dataset
library(lubridate)
date <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = date,
                 x = runif(752, min = 60000, max = 80000),
                 y = runif(752, min = 800000, max = 900000))

> df
          date        x        y
1   2010-12-26 73649.16 894525.6
2   2010-12-27 69005.21 898233.7
3   2010-12-28 64982.90 873692.6
4   2010-12-29 64592.93 841055.2
5   2010-12-30 60475.99 854524.3
6   2010-12-31 79206.43 879468.2
7   2011-01-01 76692.40 830569.6
8   2011-01-02 70378.51 834338.2
9   2011-01-03 74977.73 820568.0
10  2011-01-04 63023.47 899482.3
11  2011-01-05 77046.80 886369.0
12  2011-01-06 68751.91 841074.7
13  2011-01-07 65471.34 888525.3
14  2011-01-08 61138.68 855039.5
15  2011-01-09 65660.66 880227.2
16  2011-01-10 75526.36 838478.6
17  2011-01-11 64485.74 808947.7
18  2011-01-12 61405.69 887784.1
19  2011-01-13 70561.86 847634.7
20  2011-01-14 69234.98 840012.1
21  2011-01-15 75539.43 817132.5
22  2011-01-16 74227.28 839230.4
23  2011-01-17 74548.59 855006.3
24  2011-01-18 72020.71 815036.7
25  2011-01-19 70814.50 883029.6
26  2011-01-20 76924.65 817289.5
27  2011-01-21 60556.21 807427.2

Thank you for your time.

How about this?

res <- lapply(
  seq(0, nrow(df), by = 10),
  function(k) df[max(k, 1):min(k + 10, nrow(df)), ]
)

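This gives the following. Note that the first chunk has 10 rows, while each later chunk has 11, because it starts on the last row of the previous chunk and runs 10 rows further: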

> head(res)
[[1]]
         date        x        y
1  2010-12-26 63748.27 856758.7
2  2010-12-27 73774.90 860222.6
3  2010-12-28 68893.24 804194.7
4  2010-12-29 79791.86 810624.5
5  2010-12-30 60073.50 809016.0
6  2010-12-31 74020.15 883304.9
7  2011-01-01 67144.95 889235.3
8  2011-01-02 67205.20 810514.2
9  2011-01-03 68518.68 882730.7
10 2011-01-04 70442.87 892934.1

[[2]]
         date        x        y
10 2011-01-04 70442.87 892934.1
11 2011-01-05 65466.26 855725.2
12 2011-01-06 70034.79 879770.8
13 2011-01-07 60195.42 888653.4
14 2011-01-08 65208.12 883176.8
15 2011-01-09 63040.52 821902.3
16 2011-01-10 62302.66 815025.1
17 2011-01-11 77662.53 829474.5
18 2011-01-12 64802.65 809961.7
19 2011-01-13 71812.61 810755.1
20 2011-01-14 63086.30 820029.9

[[3]]
         date        x        y
20 2011-01-14 63086.30 820029.9
21 2011-01-15 75548.71 806966.7
22 2011-01-16 68572.89 847679.0
23 2011-01-17 71408.65 889490.2
24 2011-01-18 73507.84 815559.7
25 2011-01-19 76854.50 899108.6
26 2011-01-20 79138.08 858537.1
27 2011-01-21 73960.14 898957.3
28 2011-01-22 75048.41 864425.6
29 2011-01-23 61059.20 857558.3
30 2011-01-24 67455.03 853017.1

[[4]]
         date        x        y
30 2011-01-24 67455.03 853017.1
31 2011-01-25 72727.70 891708.8
32 2011-01-26 73230.11 836404.6
33 2011-01-27 67719.05 815528.3
34 2011-01-28 65139.66 826289.8
35 2011-01-29 65145.94 818736.4
36 2011-01-30 74206.03 839014.2
37 2011-01-31 77259.35 855653.0
38 2011-02-01 77809.65 836912.6
39 2011-02-02 62744.02 831549.0
40 2011-02-03 79594.93 873313.6

[[5]]
         date        x        y
40 2011-02-03 79594.93 873313.6
41 2011-02-04 78942.86 825001.1
42 2011-02-05 61346.88 871578.5
43 2011-02-06 68526.18 863300.7
44 2011-02-07 76920.15 844180.0
45 2011-02-08 73023.08 823092.4
46 2011-02-09 64287.09 804682.7
47 2011-02-10 71377.16 829219.8
48 2011-02-11 68930.80 814626.6
49 2011-02-12 70780.95 831549.8
50 2011-02-13 73740.99 895868.0

[[6]]
         date        x        y
50 2011-02-13 73740.99 895868.0
51 2011-02-14 79846.05 844586.6
52 2011-02-15 66559.60 835943.0
53 2011-02-16 68522.99 837633.2
54 2011-02-17 65898.75 891364.4
55 2011-02-18 73809.44 842797.9
56 2011-02-19 73336.53 821166.5
57 2011-02-20 72780.91 883200.6
58 2011-02-21 73240.81 864142.2
59 2011-02-22 78855.11 868599.6
60 2011-02-23 69236.04 845566.6
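The question asks for windows of exactly 10 days that share one boundary date (2010-12-26 to 2011-01-04, then 2011-01-04 to 2011-01-13), whereas the chunks above hold 11 rows after the first one. A minimal sketch of one way to get 10-row windows is to step the start index by 9 instead of 10. The helper name `split_overlapping` and its `size` argument are hypothetical, introduced only for illustration, and the sketch assumes one row per date, sorted by date.

```r
# Hypothetical helper (not part of the answer above): split a data frame into
# windows of `size` rows, each starting on the last row of the previous window.
# Assumes one row per date and rows sorted by date.
split_overlapping <- function(data, size = 10) {
  stopifnot(size >= 2, nrow(data) >= 1)
  # Step by size - 1 so that consecutive windows share exactly one row.
  # Stopping at nrow - 1 avoids a trailing window made only of the last row.
  starts <- seq(1, max(nrow(data) - 1, 1), by = size - 1)
  lapply(starts, function(s) {
    data[s:min(s + size - 1, nrow(data)), , drop = FALSE]
  })
}

# Small deterministic example (27 consecutive days, no extra packages needed)
df <- data.frame(
  date = seq(as.Date("2010-12-26"), as.Date("2011-01-21"), by = "day"),
  x = seq_len(27),
  y = seq_len(27)
)

windows <- split_overlapping(df, size = 10)
range(windows[[1]]$date)  # 2010-12-26 to 2011-01-04
range(windows[[2]]$date)  # 2011-01-04 to 2011-01-13
```

The last window may be shorter than 10 rows when the data run out. Each element of `windows` can then be passed to whatever home-range function you use, for example through `lapply()`.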

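Here is an alternative solution using the dplyr package that also works when you want groups of n dates instead of 10. As in your example, we assume one row per date.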

library(lubridate)
dt <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = dt,
  x = runif(752, min = 60000, max = 80000),
  y = runif(752, min = 800000, max = 900000))

library(dplyr)
n <- 10
df |>
  arrange(date) |>
  mutate(id = 0:(nrow(df) - 1),
    group = id %/% n + 1) |>
  group_by(group) |>
  group_split() |>
  head(n=2)
#> [[1]]
#> # A tibble: 10 x 5
#>    date            x       y    id group
#>    <date>      <dbl>   <dbl> <int> <dbl>
#>  1 2010-12-26 70488. 884674.     0     1
#>  2 2010-12-27 74133. 888636.     1     1
#>  3 2010-12-28 66635. 838681.     2     1
#>  4 2010-12-29 67931. 808998.     3     1
#>  5 2010-12-30 68032. 868329.     4     1
#>  6 2010-12-31 76891. 826684.     5     1
#>  7 2011-01-01 70793. 890401.     6     1
#>  8 2011-01-02 60427. 846447.     7     1
#>  9 2011-01-03 69902. 886152.     8     1
#> 10 2011-01-04 64253. 859245.     9     1
#> 
#> [[2]]
#> # A tibble: 10 x 5
#>    date            x       y    id group
#>    <date>      <dbl>   <dbl> <int> <dbl>
#>  1 2011-01-05 74260. 844636.    10     2
#>  2 2011-01-06 75631. 807722.    11     2
#>  3 2011-01-07 74443. 840540.    12     2
#>  4 2011-01-08 78903. 811777.    13     2
#>  5 2011-01-09 78531. 894333.    14     2
#>  6 2011-01-10 79310. 812625.    15     2
#>  7 2011-01-11 71701. 801691.    16     2
#>  8 2011-01-12 63254. 854752.    17     2
#>  9 2011-01-13 72813. 837910.    18     2
#> 10 2011-01-14 62718. 877568.    19     2

Created on 2021-07-05 by the reprex package (v2.0.0)
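Two caveats about the dplyr grouping: the groups it produces do not overlap (the second group starts on 2011-01-05), and it relies on there being exactly one row per date. If some dates might be missing or repeated, a rough sketch of an alternative is to define the windows by calendar date rather than by row position. The helper `split_by_date_window` and its arguments are hypothetical names chosen for this illustration, and it uses base R only.

```r
# Hypothetical helper: build overlapping windows from calendar dates, so that
# missing or duplicated dates do not shift the window boundaries.
split_by_date_window <- function(data, days = 10, date_col = "date") {
  stopifnot(days >= 2)
  d <- data[[date_col]]
  # A numeric `by` on Date objects is interpreted as a number of days.
  starts <- seq(min(d), max(d), by = days - 1)
  # Drop a start that falls on the very last date; that date is already
  # the closing day of the previous window.
  if (length(starts) > 1) starts <- starts[starts < max(d)]
  out <- lapply(starts, function(s) {
    data[d >= s & d <= s + (days - 1), , drop = FALSE]
  })
  names(out) <- format(starts)
  out
}

# Example data with two days deliberately removed (2010-12-30 and 2010-12-31)
df <- data.frame(
  date = seq(as.Date("2010-12-26"), as.Date("2011-01-21"), by = "day"),
  x = seq_len(27),
  y = seq_len(27)
)
df_gaps <- df[-c(5, 6), ]

w <- split_by_date_window(df_gaps, days = 10)
names(w)           # start date of each window
sapply(w, nrow)    # rows per window; the first is short because of the gap
```

With this approach every window covers the same 10 calendar days regardless of how many rows fall inside it, which may or may not be what a home-range estimate needs, so check the row counts before fitting.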