Filtering a "POSIXct" "POSIXt" column based on value and NA in R
I have a data frame that looks roughly like this:
tail(df)
# A tibble: 6 x 3
GEOGCD OPER_DATE TERM_DATE
<chr> <dttm> <dttm>
1 E05006867 2009-01-01 00:00:00 2019-03-31 00:00:00
2 E05006868 2009-01-01 00:00:00 2019-03-31 00:00:00
3 E05000066 2009-01-01 00:00:00 2018-05-02 00:00:00
4 E05000067 2009-01-01 00:00:00 2018-05-02 00:00:00
5 E05000068 2009-01-01 00:00:00 2018-05-02 00:00:00
6 E05000064 2018-05-01 22:00:00 NA
str(df)
tibble [52 × 3] (S3: tbl_df/tbl/data.frame)
$ GEOGCD : chr [1:52] "E05000064" "E05000065" "E05000066" "E05000067" ...
$ OPER_DATE: POSIXct[1:52], format: "2009-01-01 00:00:00" "2009-01-01 00:00:00" "2009-01-01 00:00:00" ...
$ TERM_DATE: POSIXct[1:52], format: "2018-05-02" "2018-05-02" "2018-05-02" ...
What I want to do is select only the rows whose TERM_DATE is later than 2018-12-31 or is NA. Basically something like this:
3 E05000066 2009-01-01 00:00:00 2018-05-02 00:00:00
4 E05000067 2009-01-01 00:00:00 2018-05-02 00:00:00
5 E05000068 2009-01-01 00:00:00 2018-05-02 00:00:00
6 E05000064 2018-05-01 22:00:00 NA
I have tried different approaches, for example:
library(lubridate)
library(dplyr)
df%>%
filter(TERM_DATE> as.Date("2018-12-31"| is.na(TERM_DATE)))
But I keep getting an error like this:
Error: Problem with filter() input ..1.
x operations are possible only for numeric, logical or complex types
ℹ Input ..1 is TERM_DATE > as.Date("2018-12-31" | is.na(TERM_DATE)).
Can any of you see why this happens? What should I do?
Thanks!
Try this approach:
library(dplyr)
# Code 1: keep rows whose TERM_DATE is after 2018-12-31 or is NA
newdf <- df %>%
  filter(TERM_DATE > as.POSIXct("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>
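The error in the question appears to come from a misplaced closing parenthesis: in as.Date("2018-12-31" | is.na(TERM_DATE)), the | operator is applied to a character string before as.Date() ever runs. The following is a minimal base R sketch of that cause and of the corrected expression. The vector term_date is a hypothetical stand-in for the TERM_DATE column, and tz = "UTC" is an assumption, since the question does not state the time zone.

```r
# Hypothetical stand-in for the TERM_DATE column (values taken from the question).
term_date <- as.POSIXct(c("2019-03-31", "2018-05-02", NA), tz = "UTC")

# Misplaced parenthesis: `|` receives a character string, which R rejects.
msg <- tryCatch(
  "2018-12-31" | is.na(term_date),
  error = function(e) conditionMessage(e)
)
print(msg)

# Corrected: close as.POSIXct() first, then combine the two logical vectors.
keep <- term_date > as.POSIXct("2018-12-31", tz = "UTC") | is.na(term_date)
print(keep)
```

The NA entry is kept because NA | TRUE evaluates to TRUE, which is why the is.na() term is needed alongside the comparison.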
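The neat solution from @StupidWolf also works, although mixing a Date with a POSIXct column can make base R warn about incompatible methods, so as.POSIXct() is the safer choice: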
#Code 2
df%>%
filter(TERM_DATE> as.Date("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>
The OP's expected output can be obtained with:
#Code 3
newdf <- df%>%
filter(TERM_DATE< as.POSIXct("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05000066 2009-01-01 00:00:00 2018-05-02
2 E05000067 2009-01-01 00:00:00 2018-05-02
3 E05000068 2009-01-01 00:00:00 2018-05-02
4 E05000064 2018-05-01 22:00:00 <NA>
Or use as.Date(). Either way, you need to change the comparison to <.
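For anyone who wants to reproduce both filters, here is a small self-contained sketch. It rebuilds only the six rows shown by tail(df) in the question (not the full 52-row data frame), and it pins tz = "UTC" on both the columns and the cutoff. The time zone and the name cutoff are assumptions made for illustration; without an explicit tz, as.POSIXct("2018-12-31") is interpreted in the session's local time zone.

```r
library(dplyr)

# Rows copied from the tail(df) output in the question; tz = "UTC" is an assumption.
df <- tibble(
  GEOGCD = c("E05006867", "E05006868", "E05000066",
             "E05000067", "E05000068", "E05000064"),
  OPER_DATE = as.POSIXct(
    c("2009-01-01 00:00:00", "2009-01-01 00:00:00", "2009-01-01 00:00:00",
      "2009-01-01 00:00:00", "2009-01-01 00:00:00", "2018-05-01 22:00:00"),
    tz = "UTC"
  ),
  TERM_DATE = as.POSIXct(
    c("2019-03-31", "2019-03-31", "2018-05-02", "2018-05-02", "2018-05-02", NA),
    tz = "UTC"
  )
)

# Hypothetical helper value: the cutoff in the same class and time zone as TERM_DATE.
cutoff <- as.POSIXct("2018-12-31", tz = "UTC")

# Terminated after the cutoff, or not terminated at all (NA).
after_cutoff <- df %>% filter(TERM_DATE > cutoff | is.na(TERM_DATE))

# Terminated before the cutoff, or NA: the rows listed as desired in the question.
before_cutoff <- df %>% filter(TERM_DATE < cutoff | is.na(TERM_DATE))

print(after_cutoff)
print(before_cutoff)
```

Keeping the cutoff in the same class as the column avoids relying on how R compares a POSIXct value with a Date.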