Efficient way to convert <dattm.dt> to date in R
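Here is my problem:
I am using reticulate to fetch a Python data frame from a database. One of the variables is in date format. When I convert from Python to R, the date variable is turned into a list object and every entry shows up as <environment> (see the dput below).
I have been working around the problem as follows: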
library(tidyverse); library(reticulate); library(lubridate)
date_strings <- x %>% pull(date_object)  ## Retrieve the date list
fixed_dates <- sapply(seq_along(date_strings), function(j) {
  p <- py_to_r(date_strings[[j]])
  return(p)
}) %>% as_date()  ## Convert each entry individually, then coerce to Date
##Dput below
structure(list(date_object = list(<environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>), metric = c(0.216754862863576,
-0.542492572263425, 0.891144645072327, 0.595980577187475, 1.63561800111297,
0.689275441919723, -1.28124663010116, -0.213144519278363, 1.89653987190927,
1.77686321368272, 0.566604498180317, 0.01571945400457, 0.383057338517151,
-0.0451371159133086, 0.0343519073969926, 0.169026774218306, 1.16502683902767,
-0.0442039972520874, -0.100368442585905, -0.283444568873591)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"), pandas.index = <environment>)
Here are the first few elements of the date_strings object:
[[1]]
<environment: 0x7f904dc4d5b8>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
[[2]]
<environment: 0x7f904dc4d430>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
[[3]]
<environment: 0x7f904dc4d318>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
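While this approach works for small datasets, it takes a very long time when the data frame is large (think thousands of rows). Is there a way to optimize the process or vectorize it?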
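We can use lapply instead of sapply and combine the result into a vector with do.call and c. The reason is that if each converted element is of class Date, c keeps that class, whereas sapply simplifies the result to a bare numeric vector and drops it.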
do.call(c, lapply(seq_along(date_strings),
function(j) py_to_r(date_strings[[j]])))
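To see why the choice of c matters, here is a minimal base-R sketch that needs no Python session. It assumes each converted element is a length-one Date, which is what the answer supposes py_to_r returns; the list `converted` and its three dates are made up for illustration.

```r
# Hypothetical stand-in for lapply(date_strings, py_to_r):
# a list of length-one Date objects (dates invented for illustration).
converted <- list(
  as.Date("2020-01-01"),
  as.Date("2020-01-02"),
  as.Date("2020-01-03")
)

# sapply() simplifies the list to a plain vector and drops the Date class,
# so only the underlying day counts survive.
via_sapply <- sapply(converted, function(d) d)
class(via_sapply)  # "numeric"

# c() has a method for Date, so combining the elements keeps the class.
via_c <- do.call(c, converted)
class(via_c)       # "Date"
```

With reticulate loaded, the same pattern could be written more compactly as do.call(c, lapply(date_strings, py_to_r)). This still converts one element at a time, so it mainly avoids the class loss and the extra as_date() pass rather than removing the per-element calls.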