How to use lubridate::round_date with sparklyr?
I'm looking for a way to truncate datetimes to minutes, hours, etc. Something like lubridate::round_date would be very useful, but I can't use it with sparklyr. It fails with:
Undefined function: 'round_date'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 46
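You don't. sparklyr translates dplyr verbs into Spark SQL and passes any function it has no translation for, such as round_date, through unchanged, so Spark looks for a SQL function with that name and fails with the error above.
The correct approach is to use the built-in Spark SQL functions, in this case date_trunc.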
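One way to see this pass-through behaviour without a Spark cluster is to ask dbplyr, which sparklyr builds on, to translate the expressions. The sketch below assumes dbplyr's generic simulated connection (`simulate_dbi()`) as a stand-in for the real Spark connection, whose translation table differs in detail; it only illustrates that a function without a registered translation is emitted verbatim into the SQL.

```r
library(dbplyr)

# Generic simulated backend (an assumption standing in for a live Spark connection)
con <- simulate_dbi()

# round_date has no SQL translation here, so it is copied into the query as-is;
# Spark would then fail to find a SQL function called round_date
rounded <- translate_sql(round_date(ts, "hour"), con = con)

# date_trunc is likewise passed through, but Spark does define this function
truncated <- translate_sql(date_trunc("hour", ts), con = con)

print(rounded)
print(truncated)
```

The exact identifier quoting in the printed SQL depends on the dbplyr version and backend.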
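library(sparklyr)
library(dplyr)

# Assumes a local Spark installation; adjust `master` for your cluster
sc <- spark_connect(master = "local")

# For the sake of reproducibility, pin the session time zone to UTC
spark_session(sc) %>%
  invoke("conf") %>%
  invoke("set", "spark.sql.session.timeZone", "UTC")

options(tibble.width = 120)

# Copy a small data frame to Spark and parse the strings as timestamps
df <- copy_to(sc, data.frame(
  ts = c("2019-01-08 23:21:15", "2020-02-06 13:14:00")
)) %>% mutate(ts = to_timestamp(ts))
df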
# Source: spark<?> [?? x 1]
ts
<dttm>
1 2019-01-08 23:21:15
2 2020-02-06 13:14:00
df %>%
  transmute(
    year   = date_trunc("year", ts),
    month  = date_trunc("month", ts),
    day    = date_trunc("day", ts),
    hour   = date_trunc("hour", ts),
    minute = date_trunc("minute", ts)
  )
# Source: spark<?> [?? x 5]
year month day
<dttm> <dttm> <dttm>
1 2019-01-01 00:00:00 2019-01-01 00:00:00 2019-01-08 00:00:00
2 2020-01-01 00:00:00 2020-02-01 00:00:00 2020-02-06 00:00:00
hour minute
<dttm> <dttm>
1 2019-01-08 23:00:00 2019-01-08 23:21:00
2 2020-02-06 13:00:00 2020-02-06 13:14:00
No additional imports or third-party libraries (beyond sparklyr and dplyr) are needed.
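Note that date_trunc truncates (rounds down); its closest lubridate counterpart is floor_date, not round_date. The local R sketch below, using the same two timestamps parsed in UTC, shows floor_date reproducing the date_trunc results above while round_date rounds to the nearest unit. It runs on plain R data frames only, not on Spark tables, and the helper `fmt` is hypothetical, added just to print full timestamps.

```r
library(lubridate)

# The same two timestamps as in the Spark example, parsed locally in UTC
ts <- as.POSIXct(c("2019-01-08 23:21:15", "2020-02-06 13:14:00"), tz = "UTC")

# Hypothetical helper: always print the full date and time in UTC
fmt <- function(x) format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# floor_date truncates, matching Spark's date_trunc output
fmt(floor_date(ts, "month"))  # "2019-01-01 00:00:00" "2020-02-01 00:00:00"
fmt(floor_date(ts, "hour"))   # "2019-01-08 23:00:00" "2020-02-06 13:00:00"

# round_date rounds to the nearest unit, so it can move forward in time
fmt(round_date(ts, "day"))    # "2019-01-09 00:00:00" "2020-02-07 00:00:00"
```

If you need nearest-unit rounding rather than truncation on the Spark side, date_trunc alone is not enough and you have to build it from other Spark SQL functions.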