Smoothing a time series with multiple y per x
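I have many time series that are all part of one large data frame with many grouping variables, and I need to smooth them. I am getting more comfortable with purrr, so the group_by() %>% nest() approach seems reasonable. Each nested data frame looks like this: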
data <- structure(list(time = c(0, 0, 6, 6, 12, 12, 18, 18, 24, 24, 30,
30, 36, 36, 42, 42, 48, 48, 54, 54, 60, 60, 66, 66, 72, 72, 78,
78, 84, 84, 90, 90, 96, 96, 102, 102, 108, 108, 114, 114, 120,
120, 126, 126, 132, 132, 138, 138), confluence = c(14.68764,
19.73559, 2.897458, 3.478664, 3.46789, 4.122939, 4.270285, 4.534702,
4.838222, 5.578382, 5.938678, 6.337464, 7.116287, 7.824044, 8.50258,
10.16758, 11.13803, 13.25756, 18.46681, 11.97336, 24.45211, 14.61754,
30.7178, 19.91414, 37.93423, 26.0687, 45.91022, 33.69255, 57.83714,
42.13477, 69.2417, 54.8134, 79.81015, 68.28696, 89.50358, 78.21476,
95.31271, 87.13279, 97.71458, 94.69752, 98.59245, 97.71144, 98.8707,
98.87447, 98.99731, 99.42957, 99.02805, 99.6716)), row.names = c(NA,
-48L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
ggplot(data = data) +
  geom_point(aes(x = time, y = confluence)) +
  geom_smooth(aes(x = time, y = confluence))
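The output I want from the smoothing function is an additional column holding a smoothed value for every x (time point). Since each x has two y values (confluence), the smoothed value should appear twice, identically, for each x.
The problem is that I cannot find a smoothing function that returns output in that shape, so that I could simply add a smoothed column via mutate, e.g. data <- data %>% mutate(smooth_y = FUN(time, confluence)). I have looked at a few smoothing functions: loess(data$time ~ data$confluence) returns an object (a fitted line with a bunch of parameters, I guess), and supsmu(data$time, data$confluence) drops the duplicated x values from its output.
Is there a smoothing function that produces output for every x? Or is there an easy way to do a proper merge inside mutate when the vectors have different lengths? The catch is that the number of x/y pairs may differ between the split groups (some missing values, possibly some duplicates), so it has to be a robust mapping rather than one that relies on simply repeating the y values.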
Desired output:
# head(data)
#
# # A tibble: 6 x 3
# time confluence smooth
# <dbl> <dbl> <dbl>
# 1 0 14.7 14.7
# 2 0 19.7 14.7
# 3 6 2.90 8.72
# 4 6 3.48 8.72
# 5 12 3.47 5.10
# 6 12 4.12 5.10
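I am not sure I have understood everything correctly, but as far as I can tell, I would suggest having a look at the broom package.
If you smooth with loess, broom::augment makes it easy to append a column of smoothed values. I am not sure, however, whether this will work for all of your datasets.
To make the example a bit more interesting, I duplicated your dataset to show the general approach of applying augment across a bunch of datasets with purrr, tidyr and dplyr: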
library(tidyverse)
library(broom)
data_list <- bind_rows(list(data1 = data, data2 = data), .id = "id")
data_sm <- data_list %>%
  nest(data = -id) %>%
  mutate(mod = purrr::map(data, ~ loess(confluence ~ time, data = .x)),
         data = purrr::map(mod, ~ augment(.x))) %>%
  unnest(data)
ggplot(data = data_sm) +
  geom_point(aes(x = time, y = confluence, color = "raw")) +
  geom_smooth(aes(x = time, y = confluence)) +
  geom_point(aes(x = time, y = .fitted, color = "smoothed")) +
  scale_color_manual(values = c(smoothed = "red", raw = "black")) +
  facet_wrap(~id)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
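A lighter variant of the same idea, as a minimal sketch: predict() on a loess fit without newdata returns one fitted value per input row, so the fit can be wrapped in a small function and used directly in a grouped mutate(), without nesting. The helper name smooth_loess, the toy data and the column name smooth are all assumptions for illustration. na.action = na.exclude is meant to keep the result aligned with the rows when confluence contains NA; I have not checked that against your real data.

```r
library(dplyr)

# Hypothetical helper: fit loess and return one fitted value per input row.
smooth_loess <- function(x, y) {
  fit <- loess(y ~ x, na.action = na.exclude)
  as.numeric(predict(fit))  # no newdata: fitted values for the original rows
}

# Toy data (made up): two groups, two confluence values per time point.
toy <- tibble(
  id = rep(c("data1", "data2"), each = 48),
  time = rep(rep(seq(0, 138, by = 6), each = 2), times = 2),
  confluence = 100 / (1 + exp(-(time - 80) / 12)) + rep(c(0, 3), times = 48)
)

toy_sm <- toy %>%
  group_by(id) %>%
  mutate(smooth = smooth_loess(time, confluence)) %>%
  ungroup()

head(toy_sm)
```

Because the fitted value depends only on time within a group, both rows sharing a time point should get the same smooth value.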
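I just realised I was being dense. It is actually quite simple: put the output of the smoothing function into a separate data frame and then full_join it back onto the data by the x-axis values.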
data <- structure(list(time = c(0, 0, 6, 6, 12, 12, 18, 18, 24, 24, 30,
30, 36, 36, 42, 42, 48, 48, 54, 54, 60, 60, 66, 66, 72, 72, 78,
78, 84, 84, 90, 90, 96, 96, 102, 102, 108, 108, 114, 114, 120,
120, 126, 126, 132, 132, 138, 138), confluence = c(14.68764,
19.73559, 2.897458, 3.478664, 3.46789, 4.122939, 4.270285, 4.534702,
4.838222, 5.578382, 5.938678, 6.337464, 7.116287, 7.824044, 8.50258,
10.16758, 11.13803, 13.25756, 18.46681, 11.97336, 24.45211, 14.61754,
30.7178, 19.91414, 37.93423, 26.0687, 45.91022, 33.69255, 57.83714,
42.13477, 69.2417, 54.8134, 79.81015, 68.28696, 89.50358, 78.21476,
95.31271, 87.13279, 97.71458, 94.69752, 98.59245, 97.71144, 98.8707,
98.87447, 98.99731, 99.42957, 99.02805, 99.6716)), row.names = c(NA,
-48L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
# supsmu() returns one smoothed y per unique x; turn that into a data frame
smooth <- data.frame(supsmu(data$time, data$confluence))
# join back by time so every original row gets its smoothed value
data <- full_join(data, smooth, by = c("time" = "x"))
ggplot(data = data) +
  geom_point(aes(x = time, y = confluence)) +
  geom_smooth(aes(x = time, y = confluence)) +
  geom_point(aes(x = time, y = y), color = "red")
head(data, 10)
# # A tibble: 10 x 3
# time confluence y
# <dbl> <dbl> <dbl>
# 1 0 14.7 14.7
# 2 0 19.7 14.7
# 3 6 2.90 8.72
# 4 6 3.48 8.72
# 5 12 3.47 5.10
# 6 12 4.12 5.10
# 7 18 4.27 4.49
# 8 18 4.53 4.49
# 9 24 4.84 5.30
# 10 24 5.58 5.30
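For the grouped case, the join can be folded into a small helper so that it fits the mutate(smooth_y = FUN(time, confluence)) shape from the question. This is only a sketch: smooth_supsmu is a hypothetical helper name, the toy data are made up, and it assumes supsmu() returns the unique x values unchanged so that match() can map them back onto every row.

```r
library(dplyr)

# Hypothetical helper: smooth with supsmu() and map the result back to each row.
smooth_supsmu <- function(x, y) {
  fit <- supsmu(x, y)      # fit$x: unique sorted x, fit$y: smoothed values
  fit$y[match(x, fit$x)]   # look up the smoothed value for every input x
}

# Toy data (made up): two groups, two confluence values per time point.
toy <- tibble(
  id = rep(c("data1", "data2"), each = 48),
  time = rep(rep(seq(0, 138, by = 6), each = 2), times = 2),
  confluence = 100 / (1 + exp(-(time - 80) / 12)) + rep(c(0, 3), times = 48)
)

toy_sm <- toy %>%
  group_by(id) %>%
  mutate(smooth = smooth_supsmu(time, confluence)) %>%
  ungroup()

head(toy_sm)
```

Since the lookup goes through match() rather than positional repetition, groups with a different number of rows per time point should still line up; rows whose x is missing from the fit would get NA.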