dates: Not yet implemented NAbounds=TRUE for this non-numeric and non-character type

Question

I have this data frame:

df1 <- structure(list(ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9, 
10), dateA = structure(c(14974, 18628, 18628, 18628, 14882, 16800, 
14882, 17835, 17835, 16832, 16556, 16556, 15949, 16801), class = "Date"), 
dateB = structure(c(14610, 15340, 15706, 17501, 14730, NA, 
14700, 16191, 17106, 16801, 15810, 16436, 14655, 15431), class = "Date"), 
dateC = structure(c(18628, 15705, 17500, 18628, 18628, NA, 
18628, 17105, 18628, 18628, 16435, 16556, 15706, 18628), class = "Date")), row.names = c(NA, 
-14L), class = c("data.table", "data.frame"))

    ID      dateA      dateB      dateC
 1:  1 2010-12-31 2010-01-01 2021-01-01
 2:  2 2021-01-01 2012-01-01 2012-12-31
 3:  2 2021-01-01 2013-01-01 2017-11-30
 4:  2 2021-01-01 2017-12-01 2021-01-01
 5:  3 2010-09-30 2010-05-01 2021-01-01
 6:  4 2015-12-31       <NA>       <NA>
 7:  5 2010-09-30 2010-04-01 2021-01-01
 8:  6 2018-10-31 2014-05-01 2016-10-31
 9:  6 2018-10-31 2016-11-01 2021-01-01
10:  7 2016-02-01 2016-01-01 2021-01-01
11:  8 2015-05-01 2013-04-15 2014-12-31
12:  8 2015-05-01 2015-01-01 2015-05-01
13:  9 2013-09-01 2010-02-15 2013-01-01
14: 10 2016-01-01 2012-04-01 2021-01-01

I want to check whether dateA falls within the interval from dateB to dateC. My code:

library(dplyr)
df1 %>% 
  mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))

This gives:

Error: Problem with `mutate()` column `match`.
i `match = ifelse(between(dateA, dateB, dateC), 1, 0)`.
x Not yet implemented NAbounds=TRUE for this non-numeric and non-character type

If I remove the row containing NA, the code works:

df1 %>% 
  slice(-6) %>% 
  mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))

Is there a way to keep the NA row and still run my code?

Answer 1

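There is some confusion about which between the OP is using, because the input object is a data.table while the code is written with dplyr. If both packages are loaded, each provides its own between function, and whichever package is loaded last masks the between from the other. If dplyr::between is used, note that (at least in the dplyr version used here) it is not fully vectorized, as documented in ?dplyr::between: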

left, right Boundary values (must be scalars).

df1 %>%
    rowwise %>% 
    mutate(match = +(dplyr::between(dateA, dateB, dateC))) %>%
    ungroup

-output

# A tibble: 14 × 5
      ID dateA      dateB      dateC      match
   <dbl> <date>     <date>     <date>     <int>
 1     1 2010-12-31 2010-01-01 2021-01-01     1
 2     2 2021-01-01 2012-01-01 2012-12-31     0
 3     2 2021-01-01 2013-01-01 2017-11-30     0
 4     2 2021-01-01 2017-12-01 2021-01-01     1
 5     3 2010-09-30 2010-05-01 2021-01-01     1
 6     4 2015-12-31 NA         NA            NA
 7     5 2010-09-30 2010-04-01 2021-01-01     1
 8     6 2018-10-31 2014-05-01 2016-10-31     0
 9     6 2018-10-31 2016-11-01 2021-01-01     1
10     7 2016-02-01 2016-01-01 2021-01-01     1
11     8 2015-05-01 2013-04-15 2014-12-31     0
12     8 2015-05-01 2015-01-01 2015-05-01     1
13     9 2013-09-01 2010-02-15 2013-01-01     0
14    10 2016-01-01 2012-04-01 2021-01-01     1
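To see which of the two between functions an unqualified call resolves to, utils::find() lists every attached package that defines the name, in search-path order, and the first entry is the one that is used. This is a small sketch assuming both dplyr and data.table are installed and are loaded in the order shown:

```r
# Assumes both packages are installed; load order decides which between() is visible.
library(data.table)
library(dplyr)    # attached last, so dplyr's between() masks data.table's

# Attached packages that define `between`, in search-path order.
# The first entry is the one an unqualified between() call will use.
print(find("between"))

# Qualifying the call removes the ambiguity regardless of load order:
# dplyr::between(...) or data.table::between(...)
```

This is why the answer below always writes the package prefix explicitly.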

However, this is not the case for ?data.table::between (judging from the error shown in the OP's post, the between being used comes from data.table):

lower - Lower range bound. Either length 1 or same length as x.

upper - Upper range bound. Either length 1 or same length as x.

The class may still be the problem, even though the documentation suggests otherwise:

x - Any orderable vector, i.e., those with relevant methods for <=, such as numeric, character, Date, etc. in case of between and a numeric vector in case of inrange.

Converting from the Date class to integer/numeric should work:

df1 %>%
   mutate(match = +(data.table::between(as.numeric(dateA), 
       as.numeric(dateB), as.numeric(dateC))))

-output

    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>     1
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1
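The numeric conversion preserves the comparison because a Date is stored as a double counting days since 1970-01-01, so as.numeric() keeps the ordering intact. A quick check, using dates that also appear in the OP's data:

```r
d <- as.Date(c("2010-01-01", "2010-12-31", "2021-01-01"))

typeof(d)       # "double": the underlying storage type of a Date
as.numeric(d)   # 14610 14974 18628 (days since 1970-01-01, order preserved)
```

These are the same raw values that appear in the structure() call for df1.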

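Note that the numeric version returns 1 for row 6, whose bounds are both NA, rather than NA. Digging deeper, the problem lies in the NAbounds argument, which is TRUE by default, and the OP's data contains NA bounds (row 6):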

df1 %>% 
    mutate(match = data.table::between(dateA, dateB, dateC))

Error: Problem with mutate() column match.
ℹ match = data.table::between(dateA, dateB, dateC).
✖ Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
Run rlang::last_error() to see where the error occurred.

We can set it to FALSE instead:

df1 %>% 
   mutate(match = +(data.table::between(dateA, dateB, dateC, NAbounds = FALSE)))
    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>    NA
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1

Alternatively, pass NAbounds an NA converted with as.Date:

df1 %>% 
    mutate(match = +(data.table::between(dateA, dateB, dateC, 
         NAbounds = as.Date(NA))))
    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>    NA
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1
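The same result can also be obtained without any between() at all, as a plain inclusive comparison. Comparison operators on Date vectors work element-wise, and a missing bound simply yields NA instead of an error. A minimal base-R sketch on a few rows taken from the OP's data (the name in_range is only for illustration):

```r
# A few rows from the OP's data; the third row has both bounds missing (OP's row 6).
dateA <- as.Date(c("2010-12-31", "2021-01-01", "2015-12-31", "2015-05-01"))
dateB <- as.Date(c("2010-01-01", "2012-01-01", NA,           "2015-01-01"))
dateC <- as.Date(c("2021-01-01", "2012-12-31", NA,           "2015-05-01"))

# Inclusive test dateB <= dateA <= dateC, element by element.
# NA bounds propagate to NA; unary + turns TRUE/FALSE into 1/0.
in_range <- +(dateA >= dateB & dateA <= dateC)
print(in_range)   # 1 0 NA 1
```

In the OP's pipeline this would be df1 %>% mutate(match = +(dateA >= dateB & dateA <= dateC)), which needs neither rowwise() nor a particular package's between().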

Answer 2

library(tidyverse)
library(lubridate)


df1 %>% 
  mutate(res = +(dateA %within% interval(dateB, dateC)))
#>    ID      dateA      dateB      dateC res
#> 1   1 2010-12-31 2010-01-01 2021-01-01   1
#> 2   2 2021-01-01 2012-01-01 2012-12-31   0
#> 3   2 2021-01-01 2013-01-01 2017-11-30   0
#> 4   2 2021-01-01 2017-12-01 2021-01-01   1
#> 5   3 2010-09-30 2010-05-01 2021-01-01   1
#> 6   4 2015-12-31       <NA>       <NA>  NA
#> 7   5 2010-09-30 2010-04-01 2021-01-01   1
#> 8   6 2018-10-31 2014-05-01 2016-10-31   0
#> 9   6 2018-10-31 2016-11-01 2021-01-01   1
#> 10  7 2016-02-01 2016-01-01 2021-01-01   1
#> 11  8 2015-05-01 2013-04-15 2014-12-31   0
#> 12  8 2015-05-01 2015-01-01 2015-05-01   1
#> 13  9 2013-09-01 2010-02-15 2013-01-01   0
#> 14 10 2016-01-01 2012-04-01 2021-01-01   1

Data

df1 <- structure(
  list(
    ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9,
           10),
    dateA = structure(
      c(
        14974,
        18628,
        18628,
        18628,
        14882,
        16800,
        14882,
        17835,
        17835,
        16832,
        16556,
        16556,
        15949,
        16801
      ),
      class = "Date"
    ),
    dateB = structure(
      c(
        14610,
        15340,
        15706,
        17501,
        14730,
        NA,
        14700,
        16191,
        17106,
        16801,
        15810,
        16436,
        14655,
        15431
      ),
      class = "Date"
    ),
    dateC = structure(
      c(
        18628,
        15705,
        17500,
        18628,
        18628,
        NA,
        18628,
        17105,
        18628,
        18628,
        16435,
        16556,
        15706,
        18628
      ),
      class = "Date"
    )
  ),
  row.names = c(NA,-14L),
  class = c("data.table", "data.frame")
)
