Using padr with thicken results in error "missing value where TRUE/FALSE needed"
I've been trying to get padr to work with my dataset without much success, although I can get the package's own example to work:
# I have a few datetime columns so I convert all to POSIXct with UTC.
> df <- mutate_at(DATABASE, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))))
> df <- as_tibble(df)
> head(df, 20)
# A tibble: 20 x 2
charttime sbp
<dttm> <dbl>
1 2101-10-20 22:30:01 NA
2 2101-10-20 18:45:00 62
3 2101-10-20 19:00:00 66
4 2101-10-20 19:12:00 NA
5 2101-10-20 19:14:00 NA
6 2101-10-20 19:15:00 217
7 2101-10-20 19:26:00 NA
8 2101-10-20 19:30:00 102
9 2101-10-20 19:45:00 94
10 2101-10-20 19:59:00 NA
11 2101-10-20 20:00:00 80
12 2101-10-20 20:04:00 NA
13 2101-10-20 20:15:00 91
14 2101-10-20 20:30:00 86
15 2101-10-20 20:45:00 96
16 2101-10-20 21:00:00 73
17 2101-10-20 21:15:00 84
18 2101-10-20 21:30:00 96
19 2101-10-20 21:45:00 100
20 2101-10-20 21:51:00 NA
> df$charttime %>% get_interval # should say 'sec'
[1] "sec"
> df %>% thicken(interval='hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) :
missing value where TRUE/FALSE needed
But padr's own example works:
> coffee %>% thicken(interval='day')
time_stamp amount time_stamp_day
1 2016-07-07 03:11:21 3.14 2016-07-07
2 2016-07-07 03:46:48 2.98 2016-07-07
3 2016-07-09 07:25:17 4.11 2016-07-09
4 2016-07-10 04:45:11 3.14 2016-07-10
> coffee$time_stamp %>% get_interval # should say 'sec'
[1] "sec"
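I can't figure out why my dataset doesn't work or how to interpret the error.
Update 1
Here is another, more complete example of what I'm trying to do. I've also included a csv with a small piece of the data I'm working with, so the problem is easier to reproduce. I've tried this on two machines and got the same result.
You'll notice that the first value of charttime differs between the example above and the one below (2101-10-20 22:30:01 was changed to 2101-10-20 22:30:00). I wanted the interval to be 'sec' rather than 'min', so I changed the value manually. Either way, the same problem occurs.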
> packageVersion("tidyverse")
[1] ‘1.1.1’
> packageVersion("lubridate")
[1] ‘1.6.0’
> packageVersion("padr")
[1] ‘0.3.0’
> library(tidyverse)
> library(lubridate)
> library(padr)
>
> df <- read.csv("padr_data.csv")
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))))
> df$sbp <- as.numeric(df$sbp)
> summary(df)
charttime sbp
Min. :2101-10-20 18:30:00 Min. : 62.0
1st Qu.:2101-10-20 19:33:45 1st Qu.: 84.5
Median :2101-10-20 20:52:30 Median : 95.0
Mean :2101-10-20 21:08:22 Mean :100.9
3rd Qu.:2101-10-20 22:26:15 3rd Qu.:102.0
Max. :2101-10-21 00:42:00 Max. :217.0
NA's :12
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt"
$sbp
[1] "numeric"
> df$charttime %>% get_interval
[1] "min"
>
> # this does not work
> df[!is.na(df$charttime),] %>%
+ thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) :
missing value where TRUE/FALSE needed
>
> # this does not work
> df %>%
+ thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) :
missing value where TRUE/FALSE needed
Solution 1 - did not work
I don't know this package well, but I would try two things:
- Filter out the NA values
- Declare the by argument
Try this:
df[!is.na(df$sbp),] %>% thicken(interval='hour', by = 'charttime')
Solution 2 - did not work
Try coercing df to a data frame instead of a tibble, and after that also try coercing charttime to a date-time:
df <- data.frame(df)
df$charttime <- as.POSIXct(df$charttime)
Solution 3 - did not work
You may have some NA values in charttime; try this:
df[!is.na(df$charttime),] %>% thicken(interval = 'hour')
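I tried renaming the variables, but that was not the problem.
Sorry, I can't comment yet. Please let me know whether it works.
padr does not seem to handle dates set far in the future well! More specifically, dates more than about 20 years ahead will not work: in the sessions below, timestamps in November 2038 fail, while the same data shifted to November 2037 works. I will raise an issue with the padr developers to see how the code can be improved.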
> packageVersion("tidyverse")
[1] ‘1.1.1’
> packageVersion("lubridate")
[1] ‘1.6.0’
> packageVersion("padr")
[1] ‘0.3.0’
> library(tidyverse)
> library(lubridate)
> library(padr)
>
> df <- read.csv("padr_data.csv")
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(63)))
>
> df$sbp <- as.numeric(df$sbp)
> #df <- na.omit(df)
>
> summary(df)
charttime sbp
Min. :2038-11-04 18:30:00 Min. : 62.0
1st Qu.:2038-11-04 19:33:45 1st Qu.: 84.5
Median :2038-11-04 20:52:30 Median : 95.0
Mean :2038-11-04 21:08:22 Mean :100.9
3rd Qu.:2038-11-04 22:26:15 3rd Qu.:102.0
Max. :2038-11-05 00:42:00 Max. :217.0
NA's :12
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt"
$sbp
[1] "numeric"
> df$charttime %>% get_interval
[1] "min"
>
> # this does not work
> df[!is.na(df$charttime),] %>%
+ thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In round_down_core(a, b) : NAs introduced by coercion to integer range
2: In round_down_core(a, b) : NAs introduced by coercion to integer range
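The warning "NAs introduced by coercion to integer range" hints at a possible mechanism, sketched below in base R only. This is an assumption about what happens inside padr 0.3.0, not something confirmed from its source: if the timestamp (seconds since 1970-01-01) is coerced to a 32-bit integer, anything past the integer limit becomes NA, and an NA that later reaches an if () condition produces the "missing value where TRUE/FALSE needed" error. The variable names here are illustrative only.

```r
# Largest value an R integer can hold, read as seconds since 1970-01-01 UTC.
limit <- .Machine$integer.max
limit_time <- as.POSIXct(limit, origin = "1970-01-01", tz = "UTC")
format(limit_time, "%Y-%m-%d %H:%M:%S")  # "2038-01-19 03:14:07"

# One timestamp on each side of that limit, matching the two sessions here.
ok  <- as.POSIXct("2037-11-04 18:30:00", tz = "UTC")
bad <- as.POSIXct("2038-11-04 18:30:00", tz = "UTC")

ok_int  <- as.integer(as.numeric(ok))                     # fits in an integer
bad_int <- suppressWarnings(as.integer(as.numeric(bad)))  # out of range, so NA

# Any comparison involving NA is NA, and if (NA) is an error in R.
flag <- bad_int > 0
err <- tryCatch(
  if (flag) "reached",
  error = function(e) conditionMessage(e)
)
err  # the error text; in an English locale it matches the message in the question
```

If this is indeed the cause, it would also explain why the cutoff sits between the 2037 and 2038 runs rather than at a fixed number of years from today.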
Changing dyears(63) to dyears(64):
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(64)))
>
> df$sbp <- as.numeric(df$sbp)
> #df <- na.omit(df)
>
> summary(df)
charttime sbp
Min. :2037-11-04 18:30:00 Min. : 62.0
1st Qu.:2037-11-04 19:33:45 1st Qu.: 84.5
Median :2037-11-04 20:52:30 Median : 95.0
Mean :2037-11-04 21:08:22 Mean :100.9
3rd Qu.:2037-11-04 22:26:15 3rd Qu.:102.0
Max. :2037-11-05 00:42:00 Max. :217.0
NA's :12
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt"
$sbp
[1] "numeric"
> df$charttime %>% get_interval
[1] "min"
>
> # this does work
> df[!is.na(df$charttime),] %>%
+ thicken(interval = 'hour')
charttime sbp charttime_hour
1 2037-11-04 18:30:00 NA 2037-11-04 18:00:00
2 2037-11-04 18:45:00 62 2037-11-04 18:00:00
3 2037-11-04 19:00:00 66 2037-11-04 19:00:00
4 2037-11-04 19:12:00 NA 2037-11-04 19:00:00
5 2037-11-04 19:14:00 NA 2037-11-04 19:00:00
6 2037-11-04 19:15:00 217 2037-11-04 19:00:00
7 2037-11-04 19:26:00 NA 2037-11-04 19:00:00
8 2037-11-04 19:30:00 102 2037-11-04 19:00:00
9 2037-11-04 19:45:00 94 2037-11-04 19:00:00
10 2037-11-04 19:59:00 NA 2037-11-04 19:00:00
11 2037-11-04 20:00:00 80 2037-11-04 20:00:00
12 2037-11-04 20:04:00 NA 2037-11-04 20:00:00
13 2037-11-04 20:15:00 91 2037-11-04 20:00:00
14 2037-11-04 20:30:00 86 2037-11-04 20:00:00
15 2037-11-04 20:45:00 96 2037-11-04 20:00:00
16 2037-11-04 21:00:00 73 2037-11-04 21:00:00
17 2037-11-04 21:15:00 84 2037-11-04 21:00:00
18 2037-11-04 21:30:00 96 2037-11-04 21:00:00
19 2037-11-04 21:45:00 100 2037-11-04 21:00:00
20 2037-11-04 21:51:00 NA 2037-11-04 21:00:00
21 2037-11-04 22:00:00 NA 2037-11-04 22:00:00
22 2037-11-04 22:15:00 123 2037-11-04 22:00:00
23 2037-11-04 22:30:00 125 2037-11-04 22:00:00
24 2037-11-04 22:45:00 132 2037-11-04 22:00:00
25 2037-11-04 23:00:00 88 2037-11-04 23:00:00
26 2037-11-04 23:15:00 NA 2037-11-04 23:00:00
27 2037-11-04 23:45:00 NA 2037-11-04 23:00:00
28 2037-11-05 00:00:00 102 2037-11-05 00:00:00
29 2037-11-05 00:28:00 NA 2037-11-05 00:00:00
30 2037-11-05 00:42:00 NA 2037-11-05 00:00:00
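If shifting the dates is not acceptable, the hourly bucket can also be computed without padr. The following is a minimal base R sketch, not padr's method: floor_to_hour is a hypothetical helper, the data frame is a small stand-in built from a few rows printed in the question, and it assumes the timestamps are in UTC as in the original code. It works on the timestamps as doubles, so it does not depend on the integer range.

```r
# Small stand-in for the question's data (values copied from the printed rows).
df <- data.frame(
  charttime = as.POSIXct(
    c("2101-10-20 18:45:00", "2101-10-20 19:00:00",
      "2101-10-20 19:15:00", "2101-10-20 19:30:00"),
    tz = "UTC"
  ),
  sbp = c(62, 66, 217, 102)
)

# Hypothetical helper: round each timestamp down to the start of its hour.
# The arithmetic is done on doubles, so far-future dates are not a problem.
floor_to_hour <- function(x) {
  secs <- floor(as.numeric(x) / 3600) * 3600
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

# Equivalent in spirit to the charttime_hour column that thicken() adds.
df$charttime_hour <- floor_to_hour(df$charttime)

# Example follow-up: mean sbp per hour.
hourly <- aggregate(sbp ~ charttime_hour, data = df, FUN = mean)
hourly
```

This only covers the 'hour' interval; other intervals such as 'day' or 'month' need their own rounding rule, which is what thicken normally handles for you.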