Specify interval, start and end in tsibble
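I want to generate a complete monthly panel time series.
I tried tsibble.
It works well on large data sets, but for small sets with many missing observations it seems to pick a very wide interval.
In addition, to make many different sets easier to compare, I would like to specify a start and end date.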
library(dplyr)
data <- structure(list(
  month = structure(c(18078, 18201), class = "Date"),
  account = c("3125", "3100"),
  sum = c(-21.0084, -2000)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L))
data %>%
  mutate(month = tsibble::yearmonth(month)) %>%
  tsibble::as_tsibble(key = account, index = month) %>%
  tsibble::fill_gaps(sum = 0, .full = TRUE)
Here is a minimal example; the result is
# A tibble: 4 x 3
month account sum
<mth> <chr> <dbl>
1 2019 Jul 3100 0
2 2019 Nov 3100 -2000
3 2019 Jul 3125 -21.0
4 2019 Nov 3125 0
But the result should run from May to December, with 0 for every missing month in each group (account).
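A likely reason for the wide interval: as far as I understand it, tsibble infers the interval from the gaps between the observed index values (their greatest common divisor). With only July and November present, the only gap is four months, which matches the output above. The base R sketch below reproduces that arithmetic. The names `obs`, `gcd`, `gaps` and `step` are illustrative only and are not part of tsibble.

```r
# Month positions of the two observations (July and November 2019),
# counted as whole months so that consecutive months differ by 1
obs <- c(2019 * 12 + 6, 2019 * 12 + 10)

# Euclid's algorithm for the greatest common divisor
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Gaps between the distinct observed months, then their common divisor
gaps <- diff(sort(unique(obs)))
step <- Reduce(gcd, gaps)
step
#> [1] 4
```

Once every month from May to December is present in the index, the smallest gap becomes one month, so the inferred interval should drop to 1M.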
library(dplyr, warn.conflicts = FALSE)
library(tsibble, warn.conflicts = FALSE)
data <- structure(list(
  month = structure(c(18078, 18201), class = "Date"),
  account = c("3125", "3100"),
  sum = c(-21.0084, -2000)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L))
data %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(key = account, index = month) %>%
  # join the full May-to-December range so that every month appears in the index;
  # yearmonth() keeps both join keys in the same class
  full_join(
    tibble(
      month = yearmonth(seq(as.Date("2019-05-01"), as.Date("2019-12-01"), by = "1 month"))
    )
  ) %>%
  fill_gaps(sum = 0, .full = TRUE) %>%
  # drop the placeholder rows (account is NA) that the join introduced
  filter(!is.na(account)) %>%
  print(n = 20)
#> Joining, by = "month"
#> # A tsibble: 16 x 3 [1M]
#> # Key: account [2]
#> month account sum
#> <mth> <chr> <dbl>
#> 1 2019 May 3100 0
#> 2 2019 Jun 3100 0
#> 3 2019 Jul 3100 0
#> 4 2019 Aug 3100 0
#> 5 2019 Sep 3100 0
#> 6 2019 Oct 3100 0
#> 7 2019 Nov 3100 -2000
#> 8 2019 Dec 3100 0
#> 9 2019 May 3125 0
#> 10 2019 Jun 3125 0
#> 11 2019 Jul 3125 -21.0
#> 12 2019 Aug 3125 0
#> 13 2019 Sep 3125 0
#> 14 2019 Oct 3125 0
#> 15 2019 Nov 3125 0
#> 16 2019 Dec 3125 0
Created on 2020-01-15 by the reprex package (v0.3.0)
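If a tsibble is not required for the later steps, a similar panel can probably be built with `tidyr::complete()` instead. This is a different technique from the one above, shown only as a minimal sketch: it assumes the dates are already first-of-month values, and `start`, `end` and `panel` are hypothetical names introduced for the example.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Same two observations as in the question, written out as dates
data <- tibble(
  month = as.Date(c("2019-07-01", "2019-11-01")),
  account = c("3125", "3100"),
  sum = c(-21.0084, -2000)
)

# Hypothetical window bounds, chosen to match the May-to-December range
start <- as.Date("2019-05-01")
end <- as.Date("2019-12-01")

panel <- data %>%
  # build every account x month combination; missing sums become 0
  complete(
    account,
    month = seq(start, end, by = "1 month"),
    fill = list(sum = 0)
  ) %>%
  arrange(account, month)

print(panel, n = 20)
```

Because the window is passed in explicitly, the same `start` and `end` can be reused across many data sets, which keeps them comparable.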