In R, how do I create individual time series (to loop through and run a Mann-Kendall test) from a larger dataset?
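I'm starting to automate the Mann-Kendall tests we run on our water quality time series data. I used to do this in Excel, but it involved too much copying and pasting. I want to create an R script that runs the MK test (package "trend") and calculates Sen's slope. I have the following sample table of our water quality data: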
| Site | Program | Year | parameter1 | parameter2 |
|------|---------|------|-----|-----|
| A | ABC | 1990 | 5 | 100 |
| A | ABC | 1991 | 10 | 75 |
| A | ABC | 1992 | 15 | 50 |
| A | ABC | 1993 | 20 | 25 |
| A | ABC | 1994 | 25 | 5 |
| B | ABC | 1990 | 10 | 88 |
| B | ABC | 1991 | 20 | 44 |
| B | ABC | 1992 | 30 | 22 |
| B | ABC | 1993 | 40 | 11 |
| B | ABC | 1994 | 50 | 6 |
| C | XYZ | 1990 | 6 | 64 |
| C | XYZ | 1991 | 12 | 44 |
| C | XYZ | 1992 | 18 | 24 |
| C | XYZ | 1993 | 24 | 14 |
| C | XYZ | 1994 | 30 | 4 |
| D | XYZ | 1990 | 7 | 99 |
| D | XYZ | 1991 | 14 | 88 |
| D | XYZ | 1992 | 21 | 77 |
| D | XYZ | 1993 | 28 | 66 |
| D | XYZ | 1994 | 35 | 55 |
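I need to pull out each time series in the data (so, one for each of sites A, B, C and D) for each parameter (ANC and SO4) and run the Mann-Kendall test in R (code below). I need an output table like the one below, but filled with the MK statistic and Sen's slope (not the placeholder 1s shown here).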
| Site | Program | Parameter | MK Statistic | Sens Slope |
|------|---------|-----------|--------------|------------|
| A | ABC | ANC | 1 | 1 |
| A | ABC | SO4 | 1 | 1 |
| B | ABC | ANC | 1 | 1 |
| B | ABC | SO4 | 1 | 1 |
| C | XYZ | ANC | 1 | 1 |
| C | XYZ | SO4 | 1 | 1 |
| D | XYZ | ANC | 1 | 1 |
| D | XYZ | SO4 | 1 | 1 |
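Any idea how to generate this output table? I know it needs a loop at some point, but I'm not sure where to start. Probably over each site, then program, then ANC or SO4. The R code below handles a single site and parameter combination, but with the 100 sites and 6 water quality parameters we have, copying it for every combination would be painful.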
install.packages("trend")
library("trend")
#put our data in a time series (but this only creates 1 site and its time series)
time_series <- ts(Trends$parameter1, start=c(1990, 1), end=c(1994, 1), frequency=1)
print(time_series)
#Run the MK Test and Sens Slope from package trend
mk.test(time_series, alternative = c("two.sided", "greater", "less"),
continuity = TRUE)
sens.slope(time_series, conf.level = 0.95)
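Sample output. These are results from my actual data, not the sample dataset (I haven't managed to run the MK test on all sites in the sample data). The numbers marked with ^^^^ below are the ones I need in my final output table.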
> mk.test(time_series , alternative = c("two.sided", "greater", "less"),
+ continuity = TRUE)
Mann-Kendall trend test
data: time_series
z = -5.7308, n = 26, p-value = 9.996e-09
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
-261.0000000 2058.3333333 -0.8030769
^^^^^^^^^^^^
> sens.slope(time_series , conf.level = 0.95)
Sens slope
data: time_series
z = -5.7308, n = 26, p-value = 9.996e-09
alternative hypothesis: true z is not equal to 0
95 percent confidence interval:
-1.3187075 -0.9495238
sample estimates:
Sens slope
-1.136842
^^^^^^^^^
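For reference, the two flagged values are the Mann-Kendall S statistic and Sen's slope. The base R sketch below shows how both are defined for an evenly spaced series with no missing values. The helper names `mk_s` and `sen_slope` are made up for illustration and are not part of the trend package.

```r
# Hypothetical helpers (not from the trend package), for illustration only.
# Assumes x is evenly spaced in time and has no missing values.

# Mann-Kendall S: sum of sign(x[j] - x[i]) over all pairs i < j
mk_s <- function(x) {
  idx <- combn(length(x), 2)   # 2 x k matrix: row 1 = i, row 2 = j, with i < j
  sum(sign(x[idx[2, ]] - x[idx[1, ]]))
}

# Sen's slope: median of the pairwise slopes (x[j] - x[i]) / (j - i)
sen_slope <- function(x) {
  idx <- combn(length(x), 2)
  median((x[idx[2, ]] - x[idx[1, ]]) / (idx[2, ] - idx[1, ]))
}

site_a <- c(5, 10, 15, 20, 25)   # parameter1 at Site A in the sample table
mk_s(site_a)       # 10: all 10 pairs are increasing
sen_slope(site_a)  # 5: every pairwise slope is 5 units per year
```

These helpers are only meant to make the definitions concrete. For real analyses, `mk.test()` and `sens.slope()` also report the variance, p-value and confidence interval.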
We can split the dataset by 'Site' into a list of data.frames and apply the tests to each one:
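library(trend)

# Split the parameter columns into one data.frame per site
lst1 <- split(Trends[c("parameter1", "parameter2")], Trends$Site)

# NOTE: start/end below span 1983-2018 (36 years). With the 5-row sample data,
# ts() recycles the values to fill that span, hence n = 36 in the output below.
# Set start/end to the years actually covered by your data.
out <- lapply(lst1, function(dat)
  lapply(dat, function(para) {
    time_series <- ts(para, start = c(1983, 1), end = c(2018, 1), frequency = 1)
    tsout <- mk.test(time_series, alternative = "two.sided", continuity = TRUE)
    sensout <- sens.slope(time_series, conf.level = 0.95)
    list(tsout = tsout, sensout = sensout)
  })
)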
Output:
out$A$parameter1
#$tsout
# Mann-Kendall trend test
#data: time_series
#z = 0.57147, n = 36, p-value = 0.5677
#alternative hypothesis: true S is not equal to 0
#sample estimates:
# S varS tau
#4.200000e+01 5.147333e+03 7.352146e-02
#$sensout
# Sen's slope
#data: time_series
#z = 0.57147, n = 36, p-value = 0.5677
#alternative hypothesis: true z is not equal to 0
#95 percent confidence interval:
# 0.0000 0.3125
#sample estimates:
#Sen's slope
# 0
out$D$parameter2
#$tsout
# Mann-Kendall trend test
#data: time_series
#z = -0.57147, n = 36, p-value = 0.5677
#alternative hypothesis: true S is not equal to 0
#sample estimates:
# S varS tau
# -42.00000000 5147.33333333 -0.07352146
#$sensout
# Sen's slope
#data: time_series
#z = -0.57147, n = 36, p-value = 0.5677
#alternative hypothesis: true z is not equal to 0
#95 percent confidence interval:
# -0.6875 0.0000
#sample estimates:
#Sen's slope
# 0
Data:
Trends <- structure(list(Site = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D"),
Program = c("ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC",
"ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ",
"XYZ", "XYZ", "XYZ", "XYZ"), Year = c(1990L, 1991L, 1992L,
1993L, 1994L, 1990L, 1991L, 1992L, 1993L, 1994L, 1990L, 1991L,
1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L, 1994L),
parameter1 = c(5L, 10L, 15L, 20L, 25L, 10L, 20L, 30L, 40L,
50L, 6L, 12L, 18L, 24L, 30L, 7L, 14L, 21L, 28L, 35L), parameter2 = c(100L,
75L, 50L, 25L, 5L, 88L, 44L, 22L, 11L, 6L, 64L, 44L, 24L,
14L, 4L, 99L, 88L, 77L, 66L, 55L)), class = "data.frame", row.names = c(NA,
-20L))
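The nested list produced by the `lapply()` approach can be flattened into the table the question asks for. The sketch below is one way to do that, under a few assumptions: it rebuilds the sample data so that it runs on its own, takes the start year from the `Year` column, assumes there are no gaps in the years, and reads S from `mk.test()$estimates` and the slope from `sens.slope()$estimates`. The helper name `one_series` and the output column names are made up for illustration.

```r
library(trend)

# Sample data from the question (same values as the Trends object)
Trends <- data.frame(
  Site = rep(c("A", "B", "C", "D"), each = 5),
  Program = rep(c("ABC", "XYZ"), each = 10),
  Year = rep(1990:1994, times = 4),
  parameter1 = c(5, 10, 15, 20, 25, 10, 20, 30, 40, 50,
                 6, 12, 18, 24, 30, 7, 14, 21, 28, 35),
  parameter2 = c(100, 75, 50, 25, 5, 88, 44, 22, 11, 6,
                 64, 44, 24, 14, 4, 99, 88, 77, 66, 55)
)
params <- c("parameter1", "parameter2")  # extend to all six parameters as needed

# Hypothetical helper: one output row for one site/parameter combination.
# Assumes one value per year with no missing years.
one_series <- function(dat, param) {
  dat <- dat[order(dat$Year), ]                 # make sure years are in order
  x <- ts(dat[[param]], start = min(dat$Year), frequency = 1)
  data.frame(
    Site = dat$Site[1],
    Program = dat$Program[1],
    Parameter = param,
    MK_Statistic = mk.test(x)$estimates[["S"]],
    Sens_Slope = sens.slope(x)$estimates[[1]]
  )
}

# One row per site and parameter, stacked into a single data frame
rows <- lapply(split(Trends, Trends$Site), function(dat) {
  do.call(rbind, lapply(params, function(p) one_series(dat, p)))
})
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

Because the start year comes from the data rather than being hard-coded, the same helper should work for sites whose records begin in different years, as long as each series is continuous.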
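I have a different approach using the tidyverse. It may be a little beyond your experience so far, but I'd suggest taking a look, because I find it easier to use than base R when you are coming from Excel.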
library(dplyr)
library(tidyr)
library(purrr)
# install.packages("trend")  # run once if the package is not installed
library(trend)

# `dat` is assumed to be the sample data with the parameter columns named ANC and SO4
out <- dat %>%
  gather(parameter, value, ANC, SO4) %>%   # wide to long: one row per site/year/parameter
  group_by(parameter, Site) %>%
  nest() %>%                               # one nested data frame per series
  mutate(ts_out = map(data, ~ ts(.x$value, start = c(1990, 1), end = c(1994, 1), frequency = 1))) %>%
  mutate(mk_res = map(ts_out, ~ mk.test(.x, alternative = "two.sided", continuity = TRUE)),
         sens = map(ts_out, ~ sens.slope(.x, conf.level = 0.95))) %>%
  # Note: these two columns hold the p-values of each test
  mutate(mk_stat = map_dbl(mk_res, ~ .x$p.value),
         sens_stat = map_dbl(sens, ~ .x$p.value)) %>%
  select(parameter, Site, mk_stat, sens_stat)
out
# A tibble: 8 x 4
parameter Site mk_stat sens_stat
<chr> <fct> <dbl> <dbl>
1 ANC " A " 0.0275 0.0275
2 ANC " B " 0.0275 0.0275
3 ANC " C " 0.0275 0.0275
4 ANC " D " 0.0275 0.0275
5 SO4 " A " 0.0275 0.0275
6 SO4 " B " 0.0275 0.0275
7 SO4 " C " 0.0275 0.0275
8 SO4 " D " 0.0275 0.0275
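This gives the output in a table. I'm not sure whether this is the part of the test results you want to extract, but it should be easy to change.
I'd suggest looking at the output of each step to understand the structure. A great resource for this style of analysis is the Many Models chapter of R for Data Science.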
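If the values wanted are the S statistic and Sen's slope rather than the p-values, the pipeline can pull them from the `estimates` element of each test result. The following is a minimal sketch, assuming the sample data with its `parameter1`/`parameter2` column names, one value per year with no gaps, and reasonably recent dplyr and tidyr versions (`pivot_longer()` is used in place of `gather()`). The output column names are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(trend)

# Sample data from the question
Trends <- data.frame(
  Site = rep(c("A", "B", "C", "D"), each = 5),
  Program = rep(c("ABC", "XYZ"), each = 10),
  Year = rep(1990:1994, times = 4),
  parameter1 = c(5, 10, 15, 20, 25, 10, 20, 30, 40, 50,
                 6, 12, 18, 24, 30, 7, 14, 21, 28, 35),
  parameter2 = c(100, 75, 50, 25, 5, 88, 44, 22, 11, 6,
                 64, 44, 24, 14, 4, 99, 88, 77, 66, 55)
)

result <- Trends %>%
  # wide to long: one row per site/year/parameter
  pivot_longer(c(parameter1, parameter2),
               names_to = "Parameter", values_to = "value") %>%
  arrange(Site, Parameter, Year) %>%        # tests depend on time order
  group_by(Site, Program, Parameter) %>%
  summarise(
    MK_Statistic = mk.test(value)$estimates[["S"]],
    Sens_Slope = sens.slope(value)$estimates[[1]],
    .groups = "drop"
  )
result
```

Here the tests are run on plain numeric vectors, which `mk.test()` and `sens.slope()` accept as well as `ts` objects, so the `ts()` step is skipped.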