平滑曲线上的最大曲线变化率

Maximum rate of curve change on a smoothed curve

我有一个时间序列数据集,我正在尝试计算最大变化率以估计 NDVI 数据的绿化日期。我的数据如下:

    date       NDVI
 1 2021-01-01 0.307
 2 2021-01-17 0.291
 3 2021-02-02 0.287
 4 2021-02-18 0.247
 5 2021-03-06 0.286
 6 2021-03-22 0.268
 7 2021-04-07 0.299
 8 2021-04-23 0.295
 9 2021-05-09 0.349
10 2021-05-25 0.402
11 2021-06-10 0.359
12 2021-06-26 0.432
13 2021-07-12 0.564
14 2021-07-28 0.654
15 2021-08-13 0.699
16 2021-08-29 0.614
17 2021-09-14 0.588
18 2021-09-30 0.553
19 2021-10-16 0.450
20 2021-11-01 0.377
21 2021-11-17 0.341
22 2021-12-03 0.331
23 2021-12-19 0.323

#I plot my dataset and fit a curve

p1 <- ggplot(data,aes(x = date, y = NDVI)) + stat_smooth(method = "lm", formula = y ~ ns(x,3), color="blue") + geom_point() 

p1

我现在希望能够计算曲线变化的最大速率,以确定植被何时开始变绿(我根据该图猜测是在 5 月的某个时候)。

感谢您的帮助。

一种可能的方法是构建 ggplot,提取样条曲线,然后计算其控制点:

library(dplyr); library(ggplot2)
a <- mtcars %>%
  ggplot(aes(drat, mpg)) +
  stat_smooth(method = "lm", formula = y ~ splines::ns(x,3), color="blue") +
  geom_point()

b <- ggplot_build(a)
spline_x <- b[["data"]][[1]][["x"]]
spline_y <- b[["data"]][[1]][["y"]]
change_rate <- c(NA, diff(spline_y)/ diff(spline_x))

a +
  geom_line(data = data.frame(spline_x, change_rate),
            aes(spline_x, change_rate), color = "red")

然后我们可以提取最大值 change_rate:

data.frame(spline_x, change_rate) %>%
  filter(change_rate == max(change_rate, na.rm = TRUE))

  spline_x change_rate
1 3.748861    10.68438

此模型的最大增长率出现在 2021 年 6 月 11 日 9:13pm。获得此结果的一种简单方法是在 ggplot 之外创建模型:

library(splines)

mod <- lm(NDVI ~ ns(date, 3), data = data)

现在用它来预测一年中每一天的 NDVI:

dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")

predictions <- predict(mod, newdata = list(date = dates))

最大变化率出现在连续几天的预测之间的最大差异处:

dates[which.max(diff(predictions))]
# [1] "2021-06-11"

如果您需要精确到小时和分钟(根据数据点的数量这是不合理的),您可以这样做:

data$date <- as.POSIXct(data$date)
mod <- lm(NDVI ~ ns(date, 3), data = data)

dates <- seq(as.POSIXct("2021-01-01"), as.POSIXct("2021-12-31"), by = "min")

predictions <- predict(mod, newdata = list(date = dates))

dates[which.max(diff(predictions))]
#> [1] "2021-06-11 21:13:00 GMT"

Savitsky-Golay 过滤器(此处使用 signal::sgolayfilt())将平滑您的数据,并且可以选择 return m-th 导数。因此,您可以 运行 获得一阶导数,然后提取最大值的日期。您可以调整过滤器的 length 以确定您想要的平滑度与锯齿度。

library(tidyverse)
library(signal)

d <- structure(list(date = structure(c(18628L, 18644L, 18660L, 18676L, 18692L, 18708L, 18724L, 18740L, 18756L, 18772L, 18788L, 18804L, 18820L, 18836L, 18852L, 18868L, 18884L, 18900L, 18916L, 18932L, 18948L, 18964L, 18980L), class = c("IDate", "Date")), NDVI = c(0.307, 0.291, 0.287, 0.247, 0.286, 0.268, 0.299, 0.295, 0.349, 0.402, 0.359, 0.432, 0.564, 0.654, 0.699, 0.614, 0.588, 0.553, 0.45, 0.377, 0.341, 0.331, 0.323)), row.names = c(NA, -23L), class = "data.frame") 

# add sg smooth and derivative
d <- d %>% 
  mutate(date = as.Date(date)) %>% 
  mutate(sg = sgolayfilt(NDVI, n = 9)) %>% 
  mutate(dydx = sgolayfilt(NDVI, n = 9, m = 1))

# plot original data with smooth and derivative and annotate date of max slope
d %>% 
  ggplot(aes(date)) +
  geom_point(aes(y = NDVI)) +
  geom_line(aes(y = sg), color = "blue", size = 1) + 
  geom_line(aes(y = dydx), color = "red", size = 1) +
  geom_vline(aes(xintercept = date[which.max(dydx)]), linetype = "dashed", size = 1.5) +
  annotate("text", x = min(d$date), y = max(d$NDVI), hjust = 0,
           label = paste0("Max rate of change on: ", d$date[which.max(d$dydx)]))

reprex package (v2.0.1)

于 2022-04-01 创建