rpy2 base.as_Date: converting a character data frame column to a date column

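I have an rpy2 data frame in which dates are mapped to a character column, because I do not want POSIXt/ct columns. I thought I could convert that character column to a Date and keep it inside r_df, but I get floats instead.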

Setup:

from rpy2.robjects.packages import importr
base = importr("base")

Short example:

> base.as_Date('2020-01-01')
R object with classes: ('Date',) mapped to:
[18262.000000]

> base.as_Date('2020-01-01', format='%Y-%m-%d')
R object with classes: ('Date',) mapped to:
[18262.000000]

My actual data frame:

> r_df
R object with classes: ('data.frame',) mapped to:
[IntSexpVe..., IntSexpVe..., IntSexpVe..., FloatSexp..., ..., StrSexpVe..., StrSexpVe..., StrSexpVe..., StrSexpVe...]
....

> r_df[i]
R object with classes: ('character',) mapped to:
['2016-11-..., '2020-02-..., '2020-07-..., '2019-01-..., ..., '2020-01-..., '2017-01-..., '2020-01-..., '2020-01-...]

> base.as_Date(r_df[i], format = "%Y-%m-%d")
R object with classes: ('Date',) mapped to:
[17106.000000, 18293.000000, 18444.000000, 17897.000000, ..., 18262.000000, 17167.000000, 18262.000000, 18262.000000]

Another attempt with the same data frame:

> r_df.rx2(col_name)
R object with classes: ('character',) mapped to:
['2016-11-..., '2020-02-..., '2020-07-..., '2019-01-..., ..., '2020-01-..., '2017-01-..., '2020-01-..., '2020-01-...]

> base.as_Date(r_df.rx2(col_name), '%Y-%m-%d')
R object with classes: ('Date',) mapped to:
[17106.000000, 18293.000000, 18444.000000, 17897.000000, ..., 18262.000000, 17167.000000, 18262.000000, 18262.000000]

A last attempt, converting from POSIXt/ct to Date on the assumption that it might parse more accurately:

> r_df.rx2(col_name)
R object with classes: ('POSIXct', 'POSIXt') mapped to:
[2016-11-01, 2020-02-01, ..., 2020-01-01, 2020-01-01, 2017-01-01, 2020-01-01]

> base.as_Date(r_df.rx2(col_name), '%Y-%m-%d')
R object with classes: ('Date',) mapped to:
[17106.000000, 18293.000000, 18444.000000, 17897.000000, ..., 18262.000000, 17167.000000, 18262.000000, 18262.000000]

What I get in RStudio, and what I expect, is:

> as.Date('2020-01-01')
[1] "2020-01-01"

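This does not look right to me. I use the rpy2 converter for the pandas DataFrame to R data frame conversion, and I am not running any code outside the default converter. Any idea how to fix this and convert the strings correctly?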

Versions:

pandas==1.0.1

rpy2~=3.3.5

R == 4.0.0

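In R, Date objects are (arrays of) floats carrying an attribute that tells R they are dates. The float is the number of days since 1970-01-01, so the values you see are the correct dates in their raw form.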

>>> dt = base.as_Date('2020-01-01')
>>> dt                                              
R object with classes: ('Date',) mapped to:
[18262.000000]

However, when using R's own printing:

>>> print(dt)                                       
[1] "2020-01-01"

At the level of R's C API, though, this is a float:

>>> dt.typeof                                                               
<RTYPES.REALSXP: 14>

An R class attribute tells R that this is a date:

>>> tuple(dt.rclass)                                                        
('Date',)
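
As a sanity check, the floats shown in the question can be decoded by hand. The sketch below is plain Python and does not touch rpy2. It relies on R counting Date values in days from 1970-01-01 and assumes there are no missing values (NA) in the column. The helper name r_date_to_python is made up for this illustration.

```python
import math
from datetime import date, timedelta

# R stores a Date as the number of days since 1970-01-01.
R_DATE_EPOCH = date(1970, 1, 1)


def r_date_to_python(days):
    """Turn one R Date value (a float day count) into a datetime.date.

    Hypothetical helper for illustration; assumes `days` is not NA/NaN.
    Any fractional part of a day is dropped by flooring.
    """
    return R_DATE_EPOCH + timedelta(days=math.floor(days))


# Raw values copied from the rpy2 output in the question.
raw = [17106.0, 18293.0, 18444.0, 17897.0, 18262.0, 17167.0]
decoded = [r_date_to_python(d).isoformat() for d in raw]
print(decoded)
```

The decoded values match the strings in the original character column (2016-11-01, 2020-02-01, and so on), which suggests the conversion itself worked and only the Python-side repr shows the underlying floats.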