How to set values as NA with complex if-else criteria in R?

My df looks like this:

SERIAL  quest  time_d1_1  time_d1_2  time_d2_1  time_d2_2  STARTED              V01
F3L     d1_1   05:00      17:30      05:15      17:45      2022-01-08 05:06:19  5
F3L     d1_2   05:00      17:30      05:15      17:45      2022-01-08 17:30:07  3
F3L     d2_1   05:00      17:30      05:15      17:45      2022-01-09 8:36:54   1
F3L     d2_2   05:00      17:30      05:15      17:45      2022-01-09 18:10:07  7
7HG     d1_1   05:30      18:10      05:30      18:00      2021-09-04 05:33:15  4
7HG     d1_2   05:30      18:10      05:30      18:00      2021-09-04 18:49:22  2
7HG     d2_1   05:30      18:10      05:30      18:00      2021-09-05 07:33:15  2
7HG     d2_2   05:30      18:10      05:30      18:00      2021-09-05 18:29:22  6
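  1. SERIAL = identifier
  2. quest = [day X]_[measurement]; "d1_1" = day one, first measurement
  3. time_d1_1 = reference time (hh:mm) for day one, measurement 1
  4. ...
  5. time_d2_2 = reference time (hh:mm) for day two, measurement 2
  6. STARTED = date and time at which each measurement was started (yyyy-mm-dd hh:mm:ss)
  7. V01 = some value belonging to each measurement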

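For each row, I want to set the variable V01 to NA when the variable STARTED is more than one hour later than the reference time (time_d1_1 to time_d2_2) that corresponds to the quest variable.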

Example: in row 3, SERIAL=F3L started the first measurement of day two (quest=d2_1) at 8:36:54. However, the reference time (time_d2_1) is 05:15. I now want to set V01=1 to NA, because 8:36:54 is later than 05:15 plus one hour.
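For this single row, the rule can be written out as a small sketch. It assumes lubridate is available and that the reference time falls on the same calendar day as STARTED; the variable names are only for illustration.

```r
library(lubridate)

# Row 3 of the example: quest = d2_1, so the reference is time_d2_1
started   <- as_datetime("2022-01-09 8:36:54")           # STARTED
reference <- ymd_hm(paste(as_date(started), "05:15"))    # time_d2_1 on the same day (assumption)

# TRUE means the measurement started more than one hour late,
# so V01 should become NA for this row
too_late <- started > reference + hours(1)
too_late
```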

Unfortunately, my data come in this awkward format, so I am struggling to solve this with mutate() combined with ifelse() or case_when(). Can anyone help, ideally with a tidyverse solution?

Data:

Dat <- structure(list(SERIAL = c("F3L","F3L","F3L","F3L","7HG","7HG","7HG","7HG"),
                      quest = c("d1_1","d1_2","d2_1","d2_2","d1_1","d1_2","d2_1","d2_2"),
                      time_d1_1 = c("05:00","05:00","05:00","05:00","05:30","05:30","05:30","05:30"),
                      time_d1_2 = c("17:30","17:30","17:30","17:30","18:10","18:10","18:10","18:10"),
                      time_d2_1 = c("05:15","05:15","05:15","05:15","05:30","05:30","05:30","05:30"),
                      time_d2_2 = c("17:45","17:45","17:45","17:45","18:00","18:00","18:00","18:00"),
                      STARTED = c("2022-01-08 05:06:19","2022-01-08 17:30:07","2022-01-09 8:36:54",
                                  "2022-01-09 18:10:07","2021-09-04 05:33:15","2021-09-04 18:49:22",
                                  "2021-09-05 07:33:15","2021-09-05 18:29:22"),
                      V01 = c(5,3,1,7,4,2,2,6)),
                 class = "data.frame",
                 row.names = c(NA, -8L))

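We can use tidyr::pivot_longer to reshape the table into long form, do the calculation, and then reshape it back into wide form with tidyr::pivot_wider. See the inline comments in the example. I use lubridate to parse the datetime values; this could also be done in base R.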

library(dplyr)
library(tidyr)
library(lubridate)

Dat %>%
  # reshape to long form: one row per measurement and reference-time column
  pivot_longer(starts_with("time_"), names_prefix = "time_") %>%
  # keep only the reference time whose `time_dX_Y` column matches `quest`
  filter(quest == name) %>%
  # in time if the reference time is later than STARTED minus one hour (compared as zero-padded time strings)
  mutate(in_time = value > strftime(as_datetime(STARTED) - hours(1), "%H:%M:%S", tz = "UTC")) %>%
  # reshape back to wide form
  pivot_wider(names_from = name, names_prefix = "time_", values_from = value) %>%
  group_by(SERIAL) %>%
  # fill the gaps in the `time_dX_Y` columns left by the filter above
  mutate(across(starts_with("time_"), function(x) first(x[!is.na(x)])))

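This gives the following; rows with in_time = FALSE are the ones that were started more than one hour after their reference time: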

  SERIAL quest STARTED               V01 in_time time_d1_1 time_d1_2 time_d2_1 time_d2_2
  <chr>  <chr> <chr>               <dbl> <lgl>   <chr>     <chr>     <chr>     <chr>    
1 F3L    d1_1  2022-01-08 05:06:19     5 TRUE    05:00     17:30     05:15     17:45    
2 F3L    d1_2  2022-01-08 17:30:07     3 TRUE    05:00     17:30     05:15     17:45    
3 F3L    d2_1  2022-01-09 8:36:54      1 FALSE   05:00     17:30     05:15     17:45    
4 F3L    d2_2  2022-01-09 18:10:07     7 TRUE    05:00     17:30     05:15     17:45    
5 7HG    d1_1  2021-09-04 05:33:15     4 TRUE    05:30     18:10     05:30     18:00    
6 7HG    d1_2  2021-09-04 18:49:22     2 TRUE    05:30     18:10     05:30     18:00    
7 7HG    d2_1  2021-09-05 07:33:15     2 FALSE   05:30     18:10     05:30     18:00    
8 7HG    d2_2  2021-09-05 18:29:22     6 TRUE    05:30     18:10     05:30     18:00
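
The result above only flags the late rows; V01 itself is unchanged. A minimal sketch of how the flag could be turned into the NA values asked for in the question is shown below. It is a variation, not the exact pipeline above: it compares full datetimes instead of time strings, which assumes the reference time lies on the same calendar day as STARTED, and the column names slot, ref_time, started and reference are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same example data as in the question
Dat <- data.frame(
  SERIAL    = rep(c("F3L", "7HG"), each = 4),
  quest     = rep(c("d1_1", "d1_2", "d2_1", "d2_2"), times = 2),
  time_d1_1 = rep(c("05:00", "05:30"), each = 4),
  time_d1_2 = rep(c("17:30", "18:10"), each = 4),
  time_d2_1 = rep(c("05:15", "05:30"), each = 4),
  time_d2_2 = rep(c("17:45", "18:00"), each = 4),
  STARTED   = c("2022-01-08 05:06:19", "2022-01-08 17:30:07", "2022-01-09 8:36:54",
                "2022-01-09 18:10:07", "2021-09-04 05:33:15", "2021-09-04 18:49:22",
                "2021-09-05 07:33:15", "2021-09-05 18:29:22"),
  V01       = c(5, 3, 1, 7, 4, 2, 2, 6)
)

result <- Dat %>%
  # long form: one row per measurement and reference-time column
  pivot_longer(starts_with("time_"), names_prefix = "time_",
               names_to = "slot", values_to = "ref_time") %>%
  # keep only the reference time that belongs to this row's `quest`
  filter(quest == slot) %>%
  mutate(
    started   = as_datetime(STARTED),
    # reference time placed on the same day as STARTED (assumption)
    reference = ymd_hm(paste(as_date(started), ref_time)),
    # more than one hour late -> NA, otherwise keep the value
    V01       = if_else(started > reference + hours(1), NA_real_, V01)
  ) %>%
  select(SERIAL, quest, ref_time, STARTED, V01)

result
```

With the example data, V01 becomes NA in rows 3 and 7, the same rows that have in_time = FALSE above. Comparing datetimes rather than strings avoids surprises when STARTED minus one hour crosses midnight, at the cost of the same-day assumption.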