Efficient way to convert <dattm.dt> to date in R

Question

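Here is my problem: I am using reticulate to fetch a Python data frame from a database. One of the variables is in date format. When I convert from Python to R, the date variable is turned into a list object, and all of its entries are displayed as <dattm.dt> (see the dput below). I have been handling the problem as follows: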

library(tidyverse);library(reticulate); library(lubridate)

date_strings <- x %>% pull(date_object)  ## Retrieve the list of Python date objects
fixed_dates <- sapply(seq_along(date_strings), function(j) {
  py_to_r(date_strings[[j]])  ## Convert each entry individually
}) %>% as_date()              ## sapply() returns plain numbers, so convert them back to Date

##Dput below
structure(list(date_object = list(<environment>, <environment>, 
    <environment>, <environment>, <environment>, <environment>, 
    <environment>, <environment>, <environment>, <environment>, 
    <environment>, <environment>, <environment>, <environment>, 
    <environment>, <environment>, <environment>, <environment>, 
    <environment>, <environment>), metric = c(0.216754862863576, 
-0.542492572263425, 0.891144645072327, 0.595980577187475, 1.63561800111297, 
0.689275441919723, -1.28124663010116, -0.213144519278363, 1.89653987190927, 
1.77686321368272, 0.566604498180317, 0.01571945400457, 0.383057338517151, 
-0.0451371159133086, 0.0343519073969926, 0.169026774218306, 1.16502683902767, 
-0.0442039972520874, -0.100368442585905, -0.283444568873591)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"), pandas.index = <environment>)

Here are the first elements of the date_strings object:

[[1]]
<environment: 0x7f904dc4d5b8>
attr(,"class")
[1] "datetime.date"         "python.builtin.object"

[[2]]
<environment: 0x7f904dc4d430>
attr(,"class")
[1] "datetime.date"         "python.builtin.object"

[[3]]
<environment: 0x7f904dc4d318>
attr(,"class")
[1] "datetime.date"         "python.builtin.object"

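While this approach works for small datasets, it takes a very long time when the data frame is large (think thousands of rows). Is there a way to optimize the process or vectorize it?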

Answer 1

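We can use lapply instead of sapply and then combine the results into a single vector with do.call and c. The reason is that sapply simplifies its result with unlist, which drops the Date class and leaves only the underlying numbers, whereas c dispatches to its Date method when the elements are of class Date, so it does not coerce them to plain numbers.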

do.call(c, lapply(seq_along(date_strings), 
        function(j) py_to_r(date_strings[[j]])))
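The class-preservation argument above can be checked without Python. The sketch below uses a hypothetical list named converted, holding ordinary R Date objects as stand-ins for the per-element results of py_to_r(); it assumes reticulate returns an R Date for each datetime.date, which is what the as_date() step in the question implies. The vapply() line is an extra base-R alternative, not part of the original answer.

```r
# Hypothetical stand-in for lapply(date_strings, py_to_r), assuming each
# element comes back from reticulate as a length-1 R Date
converted <- list(as.Date("2020-01-01"), as.Date("2020-01-02"), as.Date("2020-01-03"))

# sapply() simplifies with unlist(), which strips the "Date" class attribute
via_sapply <- sapply(converted, function(d) d)
class(via_sapply)   # "numeric"

# do.call(c, ...) calls c() on all elements at once; because the first
# argument is a Date, c() dispatches to c.Date and the class is kept
via_do_call <- do.call(c, converted)
class(via_do_call)  # "Date"

# Alternative: fill a preallocated numeric vector with vapply(),
# then restore the Date class once for the whole vector
via_vapply <- as.Date(vapply(converted, as.numeric, numeric(1)), origin = "1970-01-01")
```

Both do.call(c, ...) and the vapply() variant avoid a separate as_date() pass over the simplified result, but each still calls py_to_r() once per element, so the per-element conversion cost itself remains.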
