How to return TRUE or FALSE based on multiple criteria in R?
I have a data.table containing each person's entry and exit dates, plus a text column indicating the reason for the exit. My data looks like this:
library(data.table)

dt <- data.table(ID = c(1, 2, 3, 4, 5),
                 entry = c("01/01/2010", "01/02/2016", "01/05/2010", "01/09/2013", "01/01/2010"),
                 exit = c("31/12/2010", "01/01/2021", "30/09/2010", "31/12/2015", "30/09/2010"),
                 text = c("a", NA, "c", NA, "b"),
                 result_2010 = c(NA, NA, NA, NA, NA))
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a NA
2: 2 01/02/2016 01/01/2021 <NA> NA
3: 3 01/05/2010 30/09/2010 c NA
4: 4 01/09/2013 31/12/2015 <NA> NA
5: 5 01/01/2010 30/09/2010 b NA
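In the result_2010 column I want to determine whether the person left the company between 1 January 2010 and 31 December 2010, but only if the text column contains "a" or "c" for that person. Otherwise the result should be FALSE.
The result should look like this: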
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a TRUE
2: 2 01/02/2016 01/01/2021 <NA> FALSE
3: 3 01/05/2010 30/09/2010 c TRUE
4: 4 01/09/2013 31/12/2015 <NA> FALSE
5: 5 01/01/2010 30/09/2010 b FALSE
Does anyone know how I can do this?
We can convert the columns to Date class and create a logical column based on the conditions in the OP's post:
library(dplyr)
library(lubridate)
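dt %>%
  # parse the day/month/year strings into Date
  mutate(across(c(entry, exit), dmy)) %>%
  # the question asks about the exit date, so both bounds test `exit`
  mutate(result_2010 = exit >= as.Date("2010-01-01") &
           exit <= as.Date("2010-12-31") & text %in% c("a", "c"))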
Output:
ID entry exit text result_2010
1: 1 2010-01-01 2010-12-31 a TRUE
2: 2 2016-02-01 2021-01-01 <NA> FALSE
3: 3 2010-05-01 2010-09-30 c TRUE
4: 4 2013-09-01 2015-12-31 <NA> FALSE
5: 5 2010-01-01 2010-09-30 b FALSE
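Two details of this approach are easy to overlook: why the columns are parsed to Date first, and why `%in%` is used rather than `==`. The following is a minimal base R sketch using values copied from the question; the variable names are only for illustration.

```r
# Base R only; values copied from the question's data.
exit_chr <- "01/01/2021"            # row 2: this person left in 2021
end_chr  <- "31/12/2010"

# Character comparison goes left to right, so "01/..." sorts before "31/..."
as_strings <- exit_chr <= end_chr   # TRUE, although 2021 is after 2010

# After parsing with an explicit day/month/year format the comparison is chronological
exit_date <- as.Date(exit_chr, format = "%d/%m/%Y")
as_dates  <- exit_date <= as.Date("2010-12-31")   # FALSE

text <- c("a", NA, "c", NA, "b")

# `==` propagates NA, so rows with a missing reason come out as NA
by_equals <- text == "a" | text == "c"   # TRUE NA TRUE NA FALSE

# `%in%` never returns NA: a missing value simply fails to match
by_in <- text %in% c("a", "c")           # TRUE FALSE TRUE FALSE FALSE
```

In the question's data the NA rows also fail the date test, so `NA & FALSE` would still give FALSE, but `%in%` keeps the column free of NA for any data.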
data.table
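# convert both date columns in place from day/month/year strings to Date
dt[, c("entry","exit") := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = c("entry","exit")]
# TRUE only when the reason is "a" or "c" and the exit date falls within 2010 (bounds inclusive)
dt[, result_2010 := text %in% c("a", "c") & between(exit, as.Date("2010-01-01"), as.Date("2010-12-31"))]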
# ID entry exit text result_2010
# <num> <Date> <Date> <char> <lgcl>
# 1: 1 2010-01-01 2010-12-31 a TRUE
# 2: 2 2016-02-01 2021-01-01 <NA> FALSE
# 3: 3 2010-05-01 2010-09-30 c TRUE
# 4: 4 2013-09-01 2015-12-31 <NA> FALSE
# 5: 5 2010-01-01 2010-09-30 b FALSE
(This is effectively a data.table version of the approach above; both benefit from the readability of data.table::between or dplyr::between.)
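If the same check is needed for several years, the logic could be wrapped in a small function. The sketch below is only an illustration in base R: `exited_in_year` and its arguments are hypothetical names, not part of either answer, and it assumes the exit dates arrive as day/month/year strings as in the question.

```r
# Hypothetical helper (not from either answer): flag exits that fall inside
# a given calendar year and carry one of the chosen exit reasons.
exited_in_year <- function(exit, text, year, reasons = c("a", "c")) {
  exit  <- as.Date(exit, format = "%d/%m/%Y")   # parse day/month/year strings
  start <- as.Date(sprintf("%d-01-01", year))
  end   <- as.Date(sprintf("%d-12-31", year))
  # both bounds are inclusive; a missing exit date yields FALSE, not NA
  !is.na(exit) & exit >= start & exit <= end & text %in% reasons
}

# Values copied from the question
exit <- c("31/12/2010", "01/01/2021", "30/09/2010", "31/12/2015", "30/09/2010")
text <- c("a", NA, "c", NA, "b")

res_2010 <- exited_in_year(exit, text, 2010)   # TRUE FALSE TRUE FALSE FALSE
```

With a data.table this might be used as `dt[, result_2010 := exited_in_year(exit, text, 2010)]`.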