Count events with two specific conditions in R

I need to count events that satisfy two specific conditions and aggregate the counts by year. A sample of my data:

year <- c(rep(1981,20))
k1 <- c(rep(NA,5),rep("COLD",4),rep(NA,4),"COLD",NA,"COLD",rep(NA,4))
k2 <- c(rep(NA,10),rep("COLD",2),rep(NA,8))
k3 <- c(rep(NA,3),"COLD",rep(NA,16))
k4 <- c(rep(NA,3),rep("COLD",5),rep(NA,2),rep("COLD",5),NA,rep("COLD",4))
k5 <- c(rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,8))

df <- data.frame(year,k1,k2,k3,k4,k5)

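I use rle, which I find easy to apply. My code counts the events that have 5 or more consecutive "COLD" records, separately for each year. Now I need to add another condition: two separate events (each with 5 or more "COLD" records) must be at least 3 "NA" records (three gaps) apart; if fewer than 3 "NA" records separate them, they count as the same event. My code: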

rle_col = function(k_col, num = 5){
    k_col[is.na(k_col)] = "NA" # convert NAs
    r = rle(k_col) # run length encoding
    which_cold = r$values == "COLD"
    sum(r$lengths[which_cold] >= num)
}

result <- aggregate(df[2:6],by = list(df$year), rle_col)
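
To make the run-length step concrete, here is a small illustration of what rle returns for the k4 column after the NAs are recoded. It only uses the sample data above; the variable `r` is just for demonstration.

```r
k4 <- c(rep(NA, 3), rep("COLD", 5), rep(NA, 2), rep("COLD", 5), NA, rep("COLD", 4))
k4[is.na(k4)] <- "NA"   # rle() puts every NA in its own run, so recode first
r <- rle(k4)

r$values   # "NA" "COLD" "NA" "COLD" "NA" "COLD"
r$lengths  # 3 5 2 5 1 4

# Two COLD runs reach 5 records, so the function above counts 2 for k4,
# although the gap between them is only 2 records long.
sum(r$lengths[r$values == "COLD"] >= 5)  # 2
```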

I tried the code below, but unfortunately it does not work as I expected. Any suggestions? Thanks!

rle_col = function(k_col, num = 5, numm = 3){
    k_col[is.na(k_col)] = "NA" # convert NAs
    r = rle(k_col) # run length encoding
    which_cold = r$values == "COLD"
    which_gap = r$values == "NA"
    sum(r$lengths[which_cold] >= num & r$lengths[which_gap] >= numm)
}

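One possible reason this attempt misbehaves: the two subsets are compared position by position, so the i-th "COLD" run is paired with the i-th "NA" run (with recycling when the counts differ), not with the gap that actually separates it from the neighbouring event. The vector `x` below is made up for illustration and is not part of the data above.

```r
# Hypothetical series: 1 NA, 5 COLD, 3 NA, 5 COLD (two events, 3 NAs apart)
x <- c(NA, rep("COLD", 5), rep(NA, 3), rep("COLD", 5))
x[is.na(x)] <- "NA"
r <- rle(x)

cold_len <- r$lengths[r$values == "COLD"]  # 5 5
gap_len  <- r$lengths[r$values == "NA"]    # 1 3

# The first COLD run is paired with the leading 1-record gap and dropped,
# so the sum is 1 even though the two events are 3 NAs apart.
paired <- cold_len >= 5 & gap_len >= 3     # FALSE TRUE
sum(paired)                                # 1
```
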
The result I want should look like this:

     year    k1    k2    k3    k4    k5
     <dbl> <int> <int> <int> <int> <int>
     1981     0     0     0     1     0

We can use the tidyverse:

library(dplyr)
df %>% 
    group_by(year) %>% 
    summarise(across(starts_with('k'), rle_col))
# A tibble: 1 × 6
   year    k1    k2    k3    k4    k5
  <dbl> <int> <int> <int> <int> <int>
1  1981     0     0     0     1     0

where rle_col is defined as:

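rle_col <- function(k_col, num = 5) {
    # run-length encode the NA mask: TRUE runs are gaps, FALSE runs are "COLD"
    with(rle(is.na(k_col)), {
        i1 <- values
        # mark gaps shorter than 3 records as too short to separate two events
        i1[values & lengths < 3] <- "Invalid"
        # count COLD runs of at least `num` records whose previous or next run
        # is not an invalid gap; lag() and lead() come from dplyr
        sum(!values & lengths >= num &
            (lag(i1) != "Invalid" | lead(i1) != "Invalid"), na.rm = TRUE)
    })
}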
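
As an alternative sketch in base R only, the merging rule can also be written out explicitly: find the "COLD" runs that reach the minimum length, then start a new event only when enough NA records lie between two consecutive qualifying runs. The function name `count_cold_events` and the arguments `min_run` and `min_gap` are hypothetical, and the sketch assumes the gap is measured as the number of NA records between two qualifying runs.

```r
# Hypothetical helper, not from the original post.
# Assumption: the gap is the number of NA records between two qualifying runs.
count_cold_events <- function(x, min_run = 5, min_gap = 3) {
  is_cold <- !is.na(x) & x == "COLD"
  r <- rle(is_cold)
  ends <- cumsum(r$lengths)            # last position of each run
  starts <- ends - r$lengths + 1L      # first position of each run
  keep <- r$values & r$lengths >= min_run
  if (!any(keep)) return(0L)
  s <- starts[keep]
  e <- ends[keep]
  na_so_far <- cumsum(is.na(x))        # running count of NA records
  # NA records between the end of one qualifying run and the start of the next
  gaps <- na_so_far[s[-1]] - na_so_far[e[-length(e)]]
  # the first qualifying run is one event; each sufficiently long gap adds one
  1L + sum(gaps >= min_gap)
}

k4 <- c(rep(NA, 3), rep("COLD", 5), rep(NA, 2), rep("COLD", 5), NA, rep("COLD", 4))
count_cold_events(k4)  # 1
```

With the df from the question, it could be applied per year with `aggregate(df[2:6], by = list(year = df$year), count_cold_events)`, which gives 0, 0, 0, 1, 0 for k1 to k5 on the sample data.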