Converting a Python dataframe character column to R with rpy2
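I am trying to convert a Python dataframe to R using rpy2, but I cannot get the dates in the Python dataframe converted to a date type in the R dataframe.
When I convert the result of pd.to_datetime() to an R dataframe, I do not get the right conversion.
The problematic date column of df: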
event_time
0 2019-10-11
1 2020-01-01
2 2019-11-15
3 2020-03-05
Conversion code:
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)
Produces:
event_time: <class 'numpy.ndarray'>
array([737343., 737425., 737378., 737489.])
The same happens for discharge_time.
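A note on those floats: they match proleptic Gregorian ordinals, that is, the values returned by `datetime.date.toordinal()` (days counted from 0001-01-01), not the days-since-1970 numbers that R's Date class uses. The sketch below checks this for the four dates in the question using only the standard library. It does not show where in the conversion the ordinals are produced; `event_dates`, `ordinals` and `epoch_days` are illustrative names.

```python
from datetime import date

# The four dates from the event_time column in the question.
event_dates = [
    date(2019, 10, 11),
    date(2020, 1, 1),
    date(2019, 11, 15),
    date(2020, 3, 5),
]

# Proleptic Gregorian ordinal: day 1 is 0001-01-01.
ordinals = [float(d.toordinal()) for d in event_dates]
print(ordinals)
# [737343.0, 737425.0, 737378.0, 737489.0]

# R's Date class counts days from 1970-01-01 instead.
epoch_days = [(d - date(1970, 1, 1)).days for d in event_dates]
print(epoch_days)
# [18180, 18262, 18215, 18326]
```

So the values are still dates in numeric form, but on a scale that R would not interpret as the intended calendar days.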
Conversion code using strings, followed by an attempted conversion to dates:
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)
    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))
Produces a dataframe:
event_time: <class 'numpy.ndarray'>
array(['2019-10-11', '2020-01-01', '2019-11-15', '2020-03-05'], dtype='<U10')
But this line of code:
r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))
fails with:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
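The message suggests that `r_df.names` is a NumPy array at that point, and NumPy arrays have no list-style `.index()` method. A minimal sketch of two ways to look up a position, using a hypothetical `names` array as a stand-in for `r_df.names` (the second column name is borrowed from the discharge_time column mentioned earlier):

```python
import numpy as np

# Hypothetical stand-in for r_df.names after conversion to NumPy.
names = np.array(["event_time", "discharge_time"])

# ndarray has no .index(); convert to a list first...
position = names.tolist().index("discharge_time")

# ...or search with NumPy directly (first matching position).
position_np = int(np.flatnonzero(names == "discharge_time")[0])

print(position, position_np)
# 1 1
```

Both results are 0-based Python positions, whereas R itself counts from 1, so check which convention the indexing call expects before using them.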
Using this code instead:
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)
    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.rx2('event_time')], '%Y-%m-%d'))
Error:
Conversion 'py2rpy' not defined for objects of type '<class 'numpy.ndarray'>'
So how do I convert dates in a Python dataframe to dates in R using rpy2? I need them as a date type because I will do date calculations later, and strings will not work.
Versions:
pandas==1.0.1
rpy2~=3.3.5
Your problem has nothing to do with rpy2; you are simply parsing the dates incorrectly in pandas. See:
from pandas import DataFrame, to_datetime
df = DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))
df.event_time = to_datetime(df.event_time)
print(list(df.event_time))
# [Timestamp('2019-10-11 00:00:00'), Timestamp('2020-01-01 00:00:00')]
# by calling dt.strftime you were just converting them back to strings, see:
print(list(df.event_time.dt.strftime("%Y-%m-%d")))
# ['2019-10-11', '2020-01-01']
# now you could extract date objects (but don't! timestamps are fine for rpy2)
print(list(df.event_time.dt.date))
# [datetime.date(2019, 10, 11), datetime.date(2020, 1, 1)]
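To make the difference between the three forms concrete, the following sketch checks what each one actually holds. It assumes only pandas; `parsed`, `as_text` and `as_dates` are illustrative names, not part of the code above.

```python
import datetime
import pandas as pd

df = pd.DataFrame({"event_time": ["2019-10-11", "2020-01-01"]})

parsed = pd.to_datetime(df["event_time"])   # real datetime column
as_text = parsed.dt.strftime("%Y-%m-%d")    # back to plain strings
as_dates = parsed.dt.date                   # Python date objects

print(pd.api.types.is_datetime64_any_dtype(parsed))   # True
print(pd.api.types.is_datetime64_any_dtype(as_text))  # False
print(type(as_text.iloc[0]).__name__)                 # str
print(type(as_dates.iloc[0]).__name__)                # date
```

Only `parsed` is a datetime column in the pandas sense; the other two hold generic Python objects, which is why they do not arrive in R as date-times.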
Now in rpy2 you only need to do:
from rpy2.robjects import conversion, default_converter, pandas2ri
from rpy2.robjects.conversion import localconverter
with localconverter(default_converter + pandas2ri.converter):
    df_r = conversion.py2rpy(df)
print(repr(df_r.rx2('event_time')))
# R object with classes: ('POSIXct', 'POSIXt') mapped to:
# [2019-10-11, 2020-01-01]
Now you can enjoy working with dates on the R side; see dates. Also, if you happen to use Jupyter notebooks, conversion is much handier using cell magics.