Getting the first exceedance date over a threshold in a sequence

Question

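I have a csv file with three columns. The first column is the pentad number (73 pentads per year), and the second and third columns are precipitation amounts.

What I want to do:

[1] Get the first pentad at which precipitation exceeds the "annual mean" for "at least three consecutive pentads".

I can find the pentads in the first precipitation column (RR) that exceed the mean like this: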

dat <- read.csv("test.csv", header = TRUE, sep = ",")
aa <- which(dat$RR > mean(dat$RR))

This gives me the following:

[1] 27 28 29 30 31 34 36 37 38 41 42 43 44 45 46 52 53 54 55 56 57

The correct output in this case should be P27.

For the second precipitation column (YY):

[1] 31 32 36 38 39 40 41 42 43 44 45 46 47 48 49 50 53 54 55 57 59 60 61

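The correct output should be P38.

How can I add a conditional statement here that accounts for the "three consecutive pentads" requirement?

I don't know how to implement this in R code. I would appreciate any suggestions.

I have the following data: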

Pentad  RR  YY
1   0   0.5771428571
2   0.0142857143    0
3   0   1.2828571429
4   0.0885714286    1.4457142857
5   0.0714285714    0.1114285714
6   0   0.36
7   0.0657142857    0
8   0.0285714286    0
9   0.0942857143    0
10  0.0114285714    1
11  0   0.0114285714
12  0   0.0085714286
13  0   0.3057142857
14  0   0
15  0   0
16  0   0
17  0.04    0
18  0   0.8
19  0.8142857143    0.0628571429
20  0.2857142857    0
21  1.14    0
22  5.3342857143    0
23  2.3514285714    0
24  1.9857142857    0.0133333333
25  1.4942857143    0.0433333333
26  2.0057142857    1.4866666667
27  20.0485714286   0
28  25.0085714286   2.4866666667
29  16.32   1.9433333333
30  11.0685714286   0.7733333333
31  8.9657142857    8.1066666667
32  3.9857142857    7.7333333333
33  5.2028571429    0.5
34  7.8028571429    4.3566666667
35  4.4514285714    2.66
36  9.22    6.6266666667
37  32.0485714286   4.4042857143
38  19.5057142857   7.9771428571
39  3.1485714286    12.9428571429
40  2.4342857143    18.4942857143
41  9.0571428571    7.3571428571
42  28.7085714286   11.0828571429
43  34.1514285714   9.0342857143
44  33.0257142857   14.2914285714
45  46.5057142857   34.6142857143
46  70.6171428571   45.3028571429
47  3.1685714286    6.66
48  1.9285714286    6.7028571429
49  7.0314285714    5.9628571429
50  0.9028571429    14.8542857143
51  5.3771428571    2.1
52  11.3571428571   2.8371428571
53  15.0457142857   7.3914285714
54  11.6628571429   32.0371428571
55  21.24   9.0057142857
56  11.4371428571   3.5257142857
57  11.6942857143   12.32
58  2.9771428571    2.32
59  4.3371428571    7.9942857143
60  0.8714285714    6.5657142857
61  1.3914285714    4.7714285714
62  0.8714285714    2.3542857143
63  1.1457142857    0.0057142857
64  2.3171428571    2.5085714286
65  0.1828571429    0.8171428571
66  0.2828571429    2.8857142857
67  0.3485714286    0.8971428571
68  0   0
69  0.3457142857    0
70  0.1428571429    0
71  0.18    0
72  4.8942857143    0.1457142857
73  0.0371428571    0.4342857143

Answer 1

This should do it:

first_exceed_seq <- function(x, thresh = mean(x), len = 3)
{
  # Logical vector, does x exceed the threshold
  exceed_thresh <- x > thresh

  # Indices of transition points; where exceed_thresh[i - 1] != exceed_thresh[i]
  transition <- which(diff(c(0, exceed_thresh)) != 0)

  # Reference index, grouping observations after each transition
  index <- vector("numeric", length(x))
  index[transition] <- 1
  index <- cumsum(index)

  # Break x into groups following the transitions
  exceed_list <- split(exceed_thresh, index)

  # Get the number of values exceeded in each index period
  num_exceed <- vapply(exceed_list, sum, numeric(1))

  # Get the starting index of the first sequence where at least len values exceed thresh
  transition[as.numeric(names(which(num_exceed >= len))[1])]
}

first_exceed_seq(dat$RR)
first_exceed_seq(dat$YY)
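
As a possible alternative, the same idea can be sketched with base R's rle(), which run-length encodes the logical exceedance vector directly. The function name first_run_start and the small test vector below are made up for illustration and are not part of the original answer; treat this as a minimal sketch rather than a drop-in replacement.

```r
# Hypothetical helper (not from the original answer): index where the first
# run of at least `len` consecutive values of x above `thresh` begins.
first_run_start <- function(x, thresh = mean(x), len = 3) {
  runs   <- rle(x > thresh)             # run-length encode the exceedance flags
  ends   <- cumsum(runs$lengths)        # last index of each run
  starts <- ends - runs$lengths + 1     # first index of each run

  # Runs that are above the threshold and long enough
  hit <- which(runs$values & runs$lengths >= len)
  if (length(hit) == 0) return(NA_integer_)   # no qualifying run found

  starts[hit[1]]
}

# Made-up example: values above 3 sit at positions 2 and 5 to 7,
# so the first run of three starts at position 5.
x <- c(0, 5, 0, 0, 4, 6, 9, 0)
first_run_start(x, thresh = 3)
```

With the question's data, first_run_start(dat$RR) and first_run_start(dat$YY) should agree with first_exceed_seq() above, and paste0("P", dat$Pentad[...]) would turn the index into a label such as P27.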

conditional

r