Fill in blanks with exponential estimates
I am trying to fill NA values with numbers that show exponential growth. Below is a sample of the data I am working with.
library(tidyverse)
expand.grid(X2009H1N1 = "0-17 years",
            type = "Cases",
            month = seq(as.Date("2009-04-12"), to = as.Date("2010-03-12"), by = "month")) %>%
  bind_cols(data.frame(
    MidLevelRange = c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000, 19000000, 19000000, 19000000),
    lowEst = c(0, NA, NA, NA, NA, NA, 5000000, 12000000, 12000000, 13000000, 14000000, 14000000)
  ))
I have used
%>% arrange(month, X2009H1N1) %>%
  group_by(X2009H1N1, type) %>%
  mutate(aprox_MidLevelRange = zoo::na.approx(MidLevelRange, na.rm = FALSE))
but the result does not look exponential to me. Thanks.
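Of course your result is not exponential: you are using na.approx(), which imputes the missing values by linear interpolation. The zoo package you are using also provides na.spline(), which interpolates with cubic splines, but that function does not produce an exponential curve either.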
x <- expand.grid(X2009H1N1 = "0-17 years",
                 type = "Cases",
                 month = seq(as.Date("2009-04-12"),
                             to = as.Date("2010-03-12"),
                             by = "month")) %>%
  bind_cols(data.frame(MidLevelRange = c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000, 19000000, 19000000, 19000000),
                       lowEst = c(0, NA, NA, NA, NA, NA, 5000000, 12000000, 12000000, 13000000, 14000000, 14000000)))
x %>%
  arrange(month, X2009H1N1) %>%
  group_by(X2009H1N1, type) %>%
  mutate(aprox_MidLevelRange = zoo::na.spline(MidLevelRange))
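The problem with cubic spline interpolation is that the lowest values here are interpolated as negative numbers, so whether it is usable depends on whether you can accept that behaviour: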
# A tibble: 8 x 6
# Groups: X2009H1N1, type [1]
X2009H1N1 type month MidLevelRange lowEst aprox_MidLevelRange
<fct> <fct> <date> <dbl> <dbl> <dbl>
1 0-17 years Cases 2009-04-12 0 0 0
2 0-17 years Cases 2009-05-12 NA NA -18568160.
3 0-17 years Cases 2009-06-12 NA NA -25223342.
4 0-17 years Cases 2009-07-12 NA NA -22929832.
5 0-17 years Cases 2009-08-12 NA NA -14651914.
6 0-17 years Cases 2009-09-12 NA NA -3353875.
7 0-17 years Cases 2009-10-12 8000000 5000000 8000000.
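Neither na.approx() nor na.spline() gives an exponential curve. If a strictly exponential fill is what you want, one possible approach is to interpolate linearly on a log scale and transform back. The sketch below is only an illustration under some assumptions: `fill_exponential` is a hypothetical helper name (it is not part of zoo or any other package), it uses base R's approx(), and it works on log1p(y) instead of log(y) because the series starts at 0 and log(0) is not finite.

```r
# Sketch only: interpolate on the log1p scale so that (value + 1) grows by a
# constant factor between two known points.
# `fill_exponential` is a hypothetical helper, not a function from any package.
fill_exponential <- function(y) {
  idx   <- seq_along(y)
  known <- !is.na(y)
  # Linear interpolation of log(1 + y) at every position (stats::approx)
  log_fit <- approx(x = idx[known], y = log1p(y[known]), xout = idx)$y
  # Back-transform, then restore the observed values exactly
  out <- expm1(log_fit)
  out[known] <- y[known]
  out
}

MidLevelRange <- c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000,
                   19000000, 19000000, 19000000)
filled <- fill_exponential(MidLevelRange)
print(round(filled[1:7]))
```

Inside the pipeline this could be used as mutate(aprox_MidLevelRange = fill_exponential(MidLevelRange)). Note that the growth rate is fixed entirely by the two points around the gap, and because the series starts at 0 the first filled months stay very small, so check whether that shape is plausible for your data.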
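Take a look at the imputeTS package.
It provides a large number of imputation functions for time series. See this paper for a full overview of all the options it offers.
In your case, Stineman interpolation (imputeTS::na_interpolation(x, option = "stine")) might be a suitable choice.
Here it is applied to your example: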
x <- expand.grid(
  X2009H1N1 = "0-17 years",
  type = "Cases",
  month = seq(as.Date("2009-04-12"),
    to = as.Date("2010-03-12"),
    by = "month"
  )
) %>%
  bind_cols(data.frame(
    MidLevelRange = c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000, 19000000, 19000000, 19000000),
    lowEst = c(0, NA, NA, NA, NA, NA, 5000000, 12000000, 12000000, 13000000, 14000000, 14000000)
  ))
x %>%
  arrange(month, X2009H1N1) %>%
  group_by(X2009H1N1, type) %>%
  mutate(aprox_MidLevelRange = imputeTS::na_interpolation(MidLevelRange, option = "stine"))
This gives you:
# A tibble: 12 x 6
# Groups: X2009H1N1, type [1]
X2009H1N1 type month MidLevelRange lowEst aprox_MidLevelRange
<fct> <fct> <date> <dbl> <dbl> <dbl>
1 0-17 years Cases 2009-04-12 0 0 0
2 0-17 years Cases 2009-05-12 NA NA 593718.
3 0-17 years Cases 2009-06-12 NA NA 1335612.
4 0-17 years Cases 2009-07-12 NA NA 2289061.
5 0-17 years Cases 2009-08-12 NA NA 3559604.
6 0-17 years Cases 2009-09-12 NA NA 5336975.
7 0-17 years Cases 2009-10-12 8000000 5000000 8000000
8 0-17 years Cases 2009-11-12 16000000 12000000 16000000
9 0-17 years Cases 2009-12-12 18000000 12000000 18000000
10 0-17 years Cases 2010-01-12 19000000 13000000 19000000
11 0-17 years Cases 2010-02-12 19000000 14000000 19000000
12 0-17 years Cases 2010-03-12 19000000 14000000 19000000
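So, just comparing the interpolation functions, I think this is probably the best option.
Plot the different interpolation options yourself to see the differences.
In general, these are the interpolation options: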
imputeTS::na_interpolation(x, option ="linear")
imputeTS::na_interpolation(x, option ="spline")
imputeTS::na_interpolation(x, option ="stine")
The linear and spline options from imputeTS are the same as zoo::na.approx() and zoo::na.spline(). The stine option does not exist in zoo.
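To compare the three options side by side, something like the following could work. It is a rough sketch that assumes imputeTS is installed and reuses the MidLevelRange values from the question; the object name `cmp` and the plot settings are just illustrative choices.

```r
# Sketch: run all three imputeTS interpolation options on the same vector
# and collect the results in one matrix (one column per option).
y <- c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000,
       19000000, 19000000, 19000000)

opts <- c(linear = "linear", spline = "spline", stine = "stine")
cmp  <- sapply(opts, function(opt) imputeTS::na_interpolation(y, option = opt))

print(round(cmp[1:7, ]))

# One line per option; the gap is rows 2 to 6
matplot(cmp, type = "l", lty = 1, col = 1:3,
        xlab = "month index", ylab = "MidLevelRange")
legend("topleft", legend = colnames(cmp), lty = 1, col = 1:3)
```

Looking at the rows inside the gap makes it easy to see which option stays non-negative and which one bends closest to the growth pattern you expect.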