Conversion of a Python DataFrame character column to R with rpy2

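I am trying to convert a Python DataFrame to R using rpy2, but I cannot get the dates in the Python DataFrame converted to a date type in the R data frame.

When I use pd.to_datetime() and then convert to an R data frame, I do not get the correct conversion.

The df date column in question: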

     event_time
0    2019-10-11
1    2020-01-01
2    2019-11-15
3    2020-03-05

Conversion code:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

This produces:

event_time: <class 'numpy.ndarray'>
  array([737343., 737425., 737378., 737489.])

The same happens with discharge_time.
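The floats in the output above are not random: they match Python's proleptic Gregorian ordinals (`date.toordinal()`) for the four dates in the column. The sketch below, using only the standard library, decodes them and shows the offset to R's `Date` representation, which counts days since 1970-01-01. The helper name `ordinal_to_r_days` is made up for illustration, and this only explains the observed numbers; it does not claim to describe how rpy2 produces them internally.

```python
from datetime import date

# Ordinal of the R Date epoch (1970-01-01) in Python's day numbering.
R_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def ordinal_to_r_days(ordinal):
    """Hypothetical helper: Python ordinal -> days since 1970-01-01."""
    return int(ordinal) - R_EPOCH_ORDINAL


# Values copied from the converted event_time column shown above.
observed = [737343.0, 737425.0, 737378.0, 737489.0]

# Decode each float back into a calendar date.
decoded = [date.fromordinal(int(value)) for value in observed]

# The same dates expressed as the day counts an R Date would hold.
r_days = [ordinal_to_r_days(value) for value in observed]

print(decoded)
print(r_days)
```

So the information survives the conversion, but it arrives as plain numbers without a date class, which is why R does not treat the column as dates.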

Conversion code that keeps the dates as strings and then attempts to convert them on the R side:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))

This produces a data frame:

event_time: <class 'numpy.ndarray'>
  array(['2019-10-11', '2020-01-01', '2019-11-15', '2020-03-05'], dtype='<U10')

But the line r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d')) fails with:

AttributeError: 'numpy.ndarray' object has no attribute 'index'

Using this code instead:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.rx2('event_time')], '%Y-%m-%d'))

gives the error:

Conversion 'py2rpy' not defined for objects of type '<class 'numpy.ndarray'>'

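So how do I convert dates in a Python DataFrame to dates in R using rpy2? I need them as a date type because I will be doing date calculations later, and strings will not work for that.

Versions: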

pandas==1.0.1

rpy2~=3.3.5

Your problem is not related to rpy2; you are simply parsing the dates incorrectly in pandas. See:

from pandas import DataFrame, to_datetime

df = DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))

df.event_time = to_datetime(df.event_time)

print(list(df.event_time))
# [Timestamp('2019-10-11 00:00:00'), Timestamp('2020-01-01 00:00:00')]

# by using dt.strftime you were just converting them back to strings, see:
print(list(df.event_time.dt.strftime("%Y-%m-%d")))
# ['2019-10-11', '2020-01-01']

# now you could extract date object (but don't! timestamps are fine for rpy2)
print(list(df.event_time.dt.date))
# [datetime.date(2019, 10, 11), datetime.date(2020, 1, 1)]
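The same round trip can be seen without pandas. The following is a minimal standard-library sketch (the `days_between` helper is hypothetical, added only for illustration) showing that formatting with `strftime` hands back a plain string, while date arithmetic needs real date objects:

```python
from datetime import date, datetime

raw = "2019-10-11"

# Parsing gives a real datetime object...
parsed = datetime.strptime(raw, "%Y-%m-%d")

# ...and strftime turns it straight back into a str.
as_text = parsed.strftime("%Y-%m-%d")


def days_between(start, end):
    """Hypothetical helper: whole days from start to end."""
    return (end - start).days


# Arithmetic works on date objects.
gap = days_between(parsed.date(), date(2020, 1, 1))

# The same subtraction on strings raises TypeError.
try:
    "2020-01-01" - as_text
    string_arithmetic_works = True
except TypeError:
    string_arithmetic_works = False

print(type(as_text).__name__, gap, string_arithmetic_works)
```

This is why the column should stay parsed (as timestamps) before it is handed to rpy2, rather than being formatted back to text.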

Now in rpy2 you only need to do:

from rpy2.robjects import conversion, default_converter, pandas2ri
from rpy2.robjects.conversion import localconverter


with localconverter(default_converter + pandas2ri.converter):
    df_r = conversion.py2rpy(df)

print(repr(df_r.rx2('event_time')))
# R object with classes: ('POSIXct', 'POSIXt') mapped to:
# [2019-10-11, 2020-01-01]

Now you can enjoy working with dates on the R side; see dates. Also, if you happen to use Jupyter notebooks, conversion is much handier using cell magics.