How to format a Python script that uses rpy2 in order to build a model with an R-caret function?

Question

I have set up the following R script, which is meant to build a model from a data frame using the caret package:

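library(caret)
library(broom)

# read.csv loads the file; data.table("mydata.csv") would only wrap the file name
data <- read.csv("mydata.csv")

splitprob <- 0.8

traintestindex <- createDataPartition(data$fluorescence, p = splitprob, list = FALSE)
testset <- data[-traintestindex, ]
trainingset <- data[traintestindex, ]

# resampling settings (same values as in the Python script below)
cvCtrl <- trainControl(method = "repeatedcv", number = 20, repeats = 100)

model <- train(fluorescence ~ ., data = trainingset, method = "glmStepAIC", preProc = c("center", "scale"), trControl = cvCtrl)

final_model <- tidy(model$finalModel)

# write the tidied coefficients, not the tidy function itself
write.csv(final_model, "model_glm.csv")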

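I would like to express what this code does in a Python script. Once a pandas data frame has been generated, it is converted to an R data frame and then passed to caret's train function, set up with the same parameters as in the R script above.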

import sys

import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects import r, pandas2ri
from rpy2.robjects.vectors import StrVector

pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')

my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)

preprocessing= ["center", "scale"]

center_scale= StrVector(preprocessing)

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)

model_R= caret.train("fluorescence~.", data= r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)

print(model_R.finalModel)

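However, this script is clearly not set up correctly: when I run the Python script that uses rpy2, the line model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl") produces SyntaxError: invalid syntax. I tried to follow the syntax given in the documentation (source: https://rpy2.github.io/doc/latest/html/introduction.html?highlight=linear%20model), but it contains few examples of how to set up code like this.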

What has to be fixed in my Python code so that it works and I can build the model from my data frame?
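The SyntaxError can be reproduced without R or rpy2, because it comes from Python's parser rather than from caret. In the failing line the quotation marks wrap the whole argument list, so the string closes just before glmStepAIC and the parser meets a bare name directly after a string literal. The sketch below only checks whether each version of the line parses; the helper name `parses` is hypothetical and nothing is executed against R.

```python
import ast

# The line as quoted in the error report: the quotes enclose the whole argument list.
broken = ('model_R = caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", '
          'preProc = center_scale, trControl = cvCtrl")')

# The same call with each string argument quoted on its own.
fixed = ('model_R = caret.train("fluorescence~.", data=r_dataframe, method="glmStepAIC", '
         'preProc=center_scale, trControl=cvCtrl)')


def parses(source):
    """Return True if `source` is syntactically valid Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))
print(parses(fixed))
```

A line that parses is not necessarily a working caret call; this only separates the Python syntax problem from the question of how to pass the model specification through rpy2.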

Answer 1

I found the format for calling caret functions through rpy2:

import sys

import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects import r, pandas2ri
from rpy2.robjects.vectors import StrVector

pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')

my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)

preprocessing= ["center", "scale"]
center_scale= StrVector(preprocessing)

#these are the columns in my data frame that will consist of my predictors in the model
predictors= ['predictor1','predictor2','predictor3']
predictors_vector= StrVector(predictors)

#this column from the dataframe consists of the outcome of the model
outcome= ['fluorescence']
outcome_vector= StrVector(outcome)

#this line extracts the columns of the predictors from the dataframe
columns_predictors= r_dataframe.rx(True, predictors_vector)

#this line extracts the column of the outcome from the dataframe
columns_response= r_dataframe.rx(True, outcome_vector)

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)

model_R= caret.train(columns_predictors, columns_response, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)

print(model_R.rx('finalModel'))
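As a possible alternative that stays closer to the original R call, the formula interface might also be reachable by wrapping the formula string in rpy2's Formula class instead of passing a plain Python string. The following is a minimal sketch under that assumption, not a tested replacement for the code above: the helper names `build_formula` and `train_glm_step_aic` are hypothetical, and the function expects a data frame that has already been converted to R (for example with the pandas2ri call used above).

```python
def build_formula(outcome, predictors=None):
    """Return an R formula string, e.g. 'y ~ a + b', or 'y ~ .' when no predictors are given."""
    rhs = " + ".join(predictors) if predictors else "."
    return "{} ~ {}".format(outcome, rhs)


def train_glm_step_aic(r_dataframe, outcome, predictors=None):
    """Fit caret's glmStepAIC model through the formula interface (needs R, caret and rpy2)."""
    # Imported lazily so build_formula stays usable on machines without R installed.
    from rpy2.robjects import Formula
    from rpy2.robjects.packages import importr
    from rpy2.robjects.vectors import StrVector

    caret = importr("caret")
    # Same resampling settings as in the scripts above.
    cv_ctrl = caret.trainControl(method="repeatedcv", number=20, repeats=100)
    return caret.train(
        Formula(build_formula(outcome, predictors)),  # assumption: dispatches to train's formula method
        data=r_dataframe,
        method="glmStepAIC",
        preProc=StrVector(["center", "scale"]),
        trControl=cv_ctrl,
    )
```

With this sketch, `train_glm_step_aic(r_dataframe, "fluorescence")` would correspond to `fluorescence ~ .`, and passing `['predictor1', 'predictor2', 'predictor3']` as the third argument would restrict the model to those columns.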

rpy2

python-2.7

pandas

r-caret