rpy2 does not convert back to pandas

I have an R object that does not convert to a pandas DataFrame, and oddly it does not throw an error.

Updated with the code I'm using. Sorry for not providing it up front, and for missing the request for two weeks!

Python code that calls the R script:

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import datetime
from rpy2.robjects.conversion import localconverter


def serial_date_to_string(srl_no):
    new_date = datetime.datetime(1970,1,1,0,0) + datetime.timedelta(srl_no - 1)
    return new_date.strftime("%Y-%m-%d")

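jurisdiction='TX'
# The R script filters on `jurisdiction`, so define it in R's global environment,
# where source() evaluates the script by default
ro.globalenv['jurisdiction'] = ro.StrVector([jurisdiction])
r=ro.r
r_df=r['source']('farrington.R')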

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)

The problem is that pd_from_r_df is still an R object rather than a pandas DataFrame:

>>> pd_from_r_df
R object with classes: ('list',) mapped to:
[ListSexpVector, BoolSexpVector]
  value: <class 'rpy2.rinterface.ListSexpVector'>
  <rpy2.rinterface.ListSexpVector object at 0x7faa4c4eff08> [RTYPES.VECSXP]
  visible: <class 'rpy2.rinterface.BoolSexpVector'>
  <rpy2.rinterface.BoolSexpVector object at 0x7faa4c4e7948> [RTYPES.LGLSXP]

Here is the R script, farrington.R. It returns a surveillance time series that ro.conversion.rpy2py (as used above) does not convert to a pandas DataFrame:

library('surveillance')
library(readr)
library(tidyr)
library(dplyr)
w<-1
b<-3
nfreq<-52
steps_back<- 28
alpha<-0.05

counts <- read_csv("Weekly_counts_of_death_by_jurisdiction_and_cause_of_death.csv")
counts<-counts[,!colnames(counts) %in% c('Cause Subgroup','Time Period','Suppress','Note','Average Number of Deaths in Time Period','Difference from 2015-2019 to 2020','Percent Difference from 2015-2019 to 2020')]
wide_counts_by_cause<-pivot_wider(counts,names_from='Cause Group',values_from='Number of Deaths',values_fn=sum)
wide_state <- filter(wide_counts_by_cause,`State Abbreviation`==jurisdiction)
wide_state <- filter(wide_state,Type=='Unweighted')
wide_state[is.na(wide_state)] <-0
important_columns=c('Alzheimer disease and dementia','Cerebrovascular diseases','Heart failure','Hypertensive dieases','Ischemic heart disease','Other diseases of the circulatory system','Malignant neoplasms','Diabetes','Renal failure','Sepsis','Chronic lower respiratory disease','Influenza and pneumonia','Other diseases of the respiratory system','Residual (all other natural causes)')

all_columns <- append(c('Year','Week'),important_columns)

selected_wide_state<-wide_state[, names(wide_state) %in% all_columns]
start<-c(as.numeric(min(selected_wide_state[,'Year'])),as.numeric(min(selected_wide_state[,'Week'])))
freq<-as.numeric(max(selected_wide_state[,'Week']))

sts <- new("sts",epoch=1:nrow(numeric_wide_state),start=start,freq=freq,observed=numeric_wide_state)
sts_4 <- aggregate(sts[,important_columns],nfreq=nfreq)
start_idx=end_idx-steps_back

cntrlFar <- list(range=start_idx:end_idx,w=w,b=b,alpha=alpha)
surveil_ts_4_far <- farrington(sts_4,control=cntrlFar)
far_df<-tidy.sts(surveil_ts_4_far)
far_df

(This uses the NCHS data [from a few months ago]: https://data.cdc.gov/NCHS/Weekly-counts-of-death-by-jurisdiction-and-cause-o/u6jv-9ijr/)

In R, when you call source() on a script, the returned object is by default a list of two named components, $value and $visible, where:

  • $value is the last object displayed or defined, which in your case is the far_df data frame (in R, data.frame is a class of object that extends the list type);
  • $visible is a logical vector indicating whether that last object is displayed, which in your case is TRUE. If you had ended the script with far_df <- tidy.sts(surveil_ts_4_far), it would be FALSE.

Indeed, your Python output confirms this, showing a list of [ListSexpVector, BoolSexpVector].

So, since you only need the first item, index it by position or by name:

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

r_raw = ro.r['source']('farrington.R')        # IN R: r_raw <- source('farrington.R')
r_df  = r_raw[0]                              # IN R: r_df  <- r_raw[[1]]
r_df  = r_raw.rx2('value')                    # or by name; IN R: r_df  <- r_raw$value

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)
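If you source scripts like this in more than one place, a small guard can make the unwrapping explicit and fail loudly when the object is not what source() normally returns. The sketch below is a hypothetical helper (unwrap_source_result is not part of rpy2). It assumes only that the object has a names attribute and supports positional indexing, as an rpy2 ListVector does.

```python
def unwrap_source_result(result):
    """Return the `value` component of an R source() result.

    Hypothetical helper: `result` is assumed to behave like an rpy2
    ListVector, i.e. it exposes `.names` (an iterable of str) and
    supports positional indexing with [].
    """
    names = list(result.names)
    # R's source() returns a list with components `value` and `visible`
    if names != ["value", "visible"]:
        raise ValueError(f"expected a source() result, got names {names}")
    # Pick the `value` component by position, looked up from its name
    return result[names.index("value")]
```

With rpy2 it would be called as r_df = unwrap_source_result(ro.r['source']('farrington.R')), followed by the same localconverter block as above.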