How to set values to NA with complex if-else criteria in R?
My data frame looks like this:
SERIAL quest time_d1_1 time_d1_2 time_d2_1 time_d2_2 STARTED             V01
F3L    d1_1  05:00     17:30     05:15     17:45     2022-01-08 05:06:19 5
F3L    d1_2  05:00     17:30     05:15     17:45     2022-01-08 17:30:07 2
F3L    d2_1  05:00     17:30     05:15     17:45     2022-01-08 8:36:54  1
F3L    d2_2  05:00     17:30     05:15     17:45     2022-01-08 18:10:07 7
7HG    d1_1  05:00     17:30     05:15     17:45     2022-01-08 05:33:15 4
7HG    d1_2  05:00     17:30     05:15     17:45     2022-01-08 18:49:22 2
7HG    d2_1  05:00     17:30     05:15     17:45     2022-01-08 07:33:15 2
7HG    d2_2  05:00     17:30     05:15     17:45     2022-01-08 18:29:22 6
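SERIAL = identifier
quest = [day X]_[measurement]; "d1_1" = day 1, first measurement
time_d1_1 = reference time (hh:mm) for day 1, measurement 1
...
time_d2_2 = reference time (hh:mm) for day 2, measurement 2
STARTED = date and time at which each measurement was started (yyyy-mm-dd hh:mm:ss)
V01 = some value belonging to each quest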
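For each row, I want to set the variable V01 to NA when STARTED is more than one hour later than the reference time (time_d1_1 to time_d2_2) that corresponds to that row's quest value.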
Example: in row 3, SERIAL=F3L started the first measurement of day 2 (quest=d2_1) at 8:36:54. However, the reference time (time_d2_1) is 05:15. I would therefore set V01=1 to NA, because 8:36:54 is later than 05:15 + hour(1).
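To make the rule concrete, here is a minimal base R sketch of the check for that single row. It assumes the date 2022-01-09 for both timestamps and UTC as the time zone; `started`, `reference`, `deadline` and `too_late` are illustrative names, not columns of the data.

```r
# Values for row 3 (quest = d2_1); date and time zone are assumptions
started   <- as.POSIXct("2022-01-09 8:36:54",
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
reference <- as.POSIXct("2022-01-09 05:15",
                        format = "%Y-%m-%d %H:%M", tz = "UTC")

# Latest acceptable start: reference time plus one hour (3600 seconds)
deadline <- reference + 3600

# TRUE means the measurement started too late, so V01 should become NA
too_late <- started > deadline
print(too_late)  # TRUE

# How far behind the reference time the measurement started, in minutes
delay <- difftime(started, reference, units = "mins")
print(delay)
```

The same comparison has to be repeated for every row, each time against the reference column named by quest.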
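Unfortunately my data comes in this awkward format, so I am struggling to solve this with mutate() plus ifelse() or mutate() plus case_when(). Can anyone help, preferably with a tidyverse solution?
Data: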
Dat <- structure(list(SERIAL = c("F3L","F3L","F3L","F3L","7HG","7HG","7HG","7HG"),
                      quest = c("d1_1","d1_2","d2_1","d2_2","d1_1","d1_2","d2_1","d2_2"),
                      time_d1_1 = c("05:00","05:00","05:00","05:00","05:30","05:30","05:30","05:30"),
                      time_d1_2 = c("17:30","17:30","17:30","17:30","18:10","18:10","18:10","18:10"),
                      time_d2_1 = c("05:15","05:15","05:15","05:15","05:30","05:30","05:30","05:30"),
                      time_d2_2 = c("17:45","17:45","17:45","17:45","18:00","18:00","18:00","18:00"),
                      STARTED = c("2022-01-08 05:06:19","2022-01-08 17:30:07","2022-01-09 8:36:54",
                                  "2022-01-09 18:10:07","2021-09-04 05:33:15","2021-09-04 18:49:22",
                                  "2021-09-05 07:33:15","2021-09-05 18:29:22"),
                      V01 = c(5,3,1,7,4,2,2,6)),
                 class = "data.frame",
                 row.names = c(NA, -8L))
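We can use tidyr::pivot_longer to turn the table into long form, do the calculation there, and then turn it back into wide form with tidyr::pivot_wider. See the inline comments in the example. I use lubridate to parse the datetime objects; this could also be done with base R.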
library(dplyr)
library(tidyr)
library(lubridate)
Dat %>%
  # turn into long form
  pivot_longer(starts_with("time_"), names_prefix = "time_") %>%
  # keep only the record where the `time_dX_Y` column matches `quest`
  filter(quest == name) %>%
  # calculate whether the measurement was started in time
  mutate(in_time = value > strftime(as_datetime(STARTED) - hours(1), "%H:%M:%S", tz = "UTC")) %>%
  # turn into wide form again
  pivot_wider(names_from = name, names_prefix = "time_", values_from = value) %>%
  group_by(SERIAL) %>%
  # fill missings in the `time_dX_Y` columns caused by the filter above
  mutate(across(starts_with("time_"), function(x) first(x[!is.na(x)])))
This gives
  SERIAL quest STARTED               V01 in_time time_d1_1 time_d1_2 time_d2_1 time_d2_2
  <chr>  <chr> <chr>               <dbl> <lgl>   <chr>     <chr>     <chr>     <chr>
1 F3L    d1_1  2022-01-08 05:06:19     5 TRUE    05:00     17:30     05:15     17:45
2 F3L    d1_2  2022-01-08 17:30:07     3 TRUE    05:00     17:30     05:15     17:45
3 F3L    d2_1  2022-01-09 8:36:54      1 FALSE   05:00     17:30     05:15     17:45
4 F3L    d2_2  2022-01-09 18:10:07     7 TRUE    05:00     17:30     05:15     17:45
5 7HG    d1_1  2021-09-04 05:33:15     4 TRUE    05:30     18:10     05:30     18:00
6 7HG    d1_2  2021-09-04 18:49:22     2 TRUE    05:30     18:10     05:30     18:00
7 7HG    d2_1  2021-09-05 07:33:15     2 FALSE   05:30     18:10     05:30     18:00
8 7HG    d2_2  2021-09-05 18:29:22     6 TRUE    05:30     18:10     05:30     18:00
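The result above flags late rows in in_time but leaves V01 unchanged, while the question asks for V01 to become NA. One possible way to get there without reshaping is to look up each row's reference time by column name and compare full datetimes. This is only a sketch under some assumptions: all times are treated as UTC, the reference time is taken to fall on the same calendar date as STARTED, and `time_cols`, `ref_time`, `started`, `deadline` and `result` are helper names introduced here.

```r
library(dplyr)

# Same data as in the question, written compactly
Dat <- data.frame(
  SERIAL    = rep(c("F3L", "7HG"), each = 4),
  quest     = rep(c("d1_1", "d1_2", "d2_1", "d2_2"), times = 2),
  time_d1_1 = rep(c("05:00", "05:30"), each = 4),
  time_d1_2 = rep(c("17:30", "18:10"), each = 4),
  time_d2_1 = rep(c("05:15", "05:30"), each = 4),
  time_d2_2 = rep(c("17:45", "18:00"), each = 4),
  STARTED   = c("2022-01-08 05:06:19", "2022-01-08 17:30:07",
                "2022-01-09 8:36:54",  "2022-01-09 18:10:07",
                "2021-09-04 05:33:15", "2021-09-04 18:49:22",
                "2021-09-05 07:33:15", "2021-09-05 18:29:22"),
  V01       = c(5, 3, 1, 7, 4, 2, 2, 6)
)

# For every row, pick the value of the `time_*` column named by `quest`
# (matrix indexing with one (row, column) pair per row)
time_cols <- as.matrix(Dat[startsWith(names(Dat), "time_")])
ref_time  <- time_cols[cbind(seq_len(nrow(Dat)),
                             match(paste0("time_", Dat$quest), colnames(time_cols)))]

result <- Dat %>%
  mutate(
    # parse the start timestamp; single-digit hours such as "8:36:54" are accepted
    started  = as.POSIXct(STARTED, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    # reference time on the same date as STARTED, plus one hour of tolerance
    deadline = as.POSIXct(paste(substr(STARTED, 1, 10), ref_time),
                          format = "%Y-%m-%d %H:%M", tz = "UTC") + 3600,
    # blank out the value when the measurement started after the deadline
    V01      = if_else(started > deadline, NA_real_, V01)
  ) %>%
  select(-started, -deadline)

print(result)
```

With this data, V01 ends up NA in rows 3 and 7, the same rows where in_time is FALSE above. If you would rather stay with the pivot-based pipeline, appending mutate(V01 = if_else(in_time, V01, NA_real_)) to it should have the same effect. Note that the pipeline compares times of day as strings, so a start shortly after midnight would need extra care there.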