How to format a Python script that uses rpy2 in order to build a model with an R caret function?

I set up the following R script, which is meant to build a model from a data frame using the caret package:

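library(caret)
library(broom)

# read.csv returns a data.frame; data.table("mydata.csv") would only build a one-cell table holding the file name
data <- read.csv("mydata.csv")

splitprob <- 0.8

# 80/20 split of the rows, stratified on the outcome
traintestindex <- createDataPartition(data$fluorescence, p = splitprob, list = FALSE)
testset <- data[-traintestindex, ]
trainingset <- data[traintestindex, ]

# resampling scheme (same settings as in the Python version below)
cvCtrl <- trainControl(method = "repeatedcv", number = 20, repeats = 100)

model <- train(fluorescence ~ ., trainingset, method = "glmStepAIC",
               preProc = c("center", "scale"), trControl = cvCtrl)

# coefficient table of the final glm as a data frame
final_model <- tidy(model$finalModel)

write.csv(final_model, "model_glm.csv")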

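I would like to reproduce what this code does in a Python script. Once the pandas data frame has been created, it is converted to an R data frame and then run through caret's train function, set up with the same parameters as in the R script above.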
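Neither Python script below reproduces the 80/20 createDataPartition split from the R script; both train on the whole data frame. A minimal sketch of doing the split from Python, assuming rpy2 3.x and an R installation with caret (caret_partition and r_indices_to_python are hypothetical helper names, not part of rpy2 or caret). caret returns 1-based row numbers, so they are shifted to 0-based positions before indexing the pandas frame:

```python
def r_indices_to_python(r_indices, n_rows):
    """Turn 1-based R row numbers into 0-based (train, test) position lists."""
    train = sorted(int(i) - 1 for i in r_indices)  # R counts rows from 1, pandas from 0
    train_set = set(train)
    # The complement plays the role of data[-traintestindex, ] in the R script
    test = [i for i in range(n_rows) if i not in train_set]
    return train, test


def caret_partition(y, p=0.8, seed=None):
    """Hypothetical helper: run caret::createDataPartition on a numeric outcome.

    Assumes rpy2 3.x and an R installation with caret. The imports are lazy so
    r_indices_to_python() above can be used without R.
    """
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects.vectors import FloatVector

    caret = importr("caret")
    if seed is not None:
        ro.r["set.seed"](seed)  # seed R's RNG so the split is reproducible
    # list=False makes R return an integer matrix with one column of row numbers
    r_idx = caret.createDataPartition(FloatVector([float(v) for v in y]), p=p, list=False)
    return r_indices_to_python(r_idx, len(y))
```

With my_data loaded, train_pos, test_pos = caret_partition(my_data["fluorescence"], seed=1) followed by my_data.iloc[train_pos] and my_data.iloc[test_pos] would give the counterparts of trainingset and testset.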

import pandas as pd
from rpy2.robjects import r
import sys
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import r, pandas2ri 

pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')

my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)

preprocessing= ["center", "scale"]

center_scale= StrVector(preprocessing)

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)

model_R= caret.train("fluorescence~.", data= r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)

print(model_R.finalModel)

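However, this script is evidently not set up correctly: when I run the Python script with rpy2, it raises SyntaxError: invalid syntax at the line model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl"). I tried to follow the syntax given in the documentation (source: https://rpy2.github.io/doc/latest/html/introduction.html?highlight=linear%20model), but it has very few examples of how to set up code like this.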
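The SyntaxError comes from the quoting, not from rpy2: in the quoted line the first string runs from "fluorescence~. up to the quote just before glmStepAIC, so Python sees a string, a bare name and a second string with no commas between them, and rejects the line before anything runs. (The call as written in the script above is valid Python, but it hands train() the plain character string "fluorescence~.", so R dispatches to train's default x/y method instead of its formula method.) A small stdlib check, where parses is a hypothetical helper, compiles the quoted line and a corrected one that builds the formula with rpy2's Formula class:

```python
def parses(source):
    """Return True if the source compiles as Python; nothing is executed."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The line quoted in the question: the quotes swallow part of the argument list
broken = ('model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", '
          'preProc = center_scale, trControl = cvCtrl")')

# Each argument separated, with the formula wrapped in rpy2's Formula
fixed = ('model_R = caret.train(Formula("fluorescence ~ ."), data=r_dataframe, '
         'method="glmStepAIC", preProc=center_scale, trControl=cvCtrl)')

print(parses(broken))  # False
print(parses(fixed))   # True
```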

What do I need to fix in my Python code so that it works and I can build a model from my data frame?
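The answer below switches to caret's x/y interface. Keeping the formula interface should also work, as long as the formula reaches R as a formula object rather than a Python string. A minimal sketch, assuming rpy2 3.x (where pandas2ri.py2ri no longer exists) and an R installation with caret and MASS (which glmStepAIC relies on); formula_string and fit_glm_stepaic are hypothetical helper names:

```python
def formula_string(outcome, predictors=None):
    """Build an R model formula such as 'fluorescence ~ .' or 'y ~ a + b'."""
    rhs = " + ".join(predictors) if predictors else "."
    return f"{outcome} ~ {rhs}"


def fit_glm_stepaic(df, outcome="fluorescence", predictors=None):
    """Hypothetical helper: caret::train(method = "glmStepAIC") on a pandas frame.

    Assumes rpy2 3.x and R with caret and MASS; imports are lazy so that
    formula_string() can be used without an R installation.
    """
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr
    from rpy2.robjects.vectors import StrVector

    caret = importr("caret")

    # pandas -> R data.frame through a local converter (replaces pandas2ri.py2ri)
    converter = ro.default_converter + pandas2ri.converter
    with localconverter(converter):
        r_df = converter.py2rpy(df)

    cv_ctrl = caret.trainControl(method="repeatedcv", number=20, repeats=100)
    return caret.train(
        ro.Formula(formula_string(outcome, predictors)),  # an R formula, not a str
        data=r_df,
        method="glmStepAIC",
        preProcess=StrVector(["center", "scale"]),
        trControl=cv_ctrl,
    )
```

Usage would be model = fit_glm_stepaic(my_data) followed by print(model.rx2("finalModel")); passing only the training rows mirrors the R script, and predictors=["predictor1", "predictor2", "predictor3"] restricts the formula to those columns.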

I found a format that works for calling caret functions through rpy2:

import pandas as pd
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.vectors import StrVector

caret = rpackages.importr('caret')
broom = rpackages.importr('broom')

my_data = pd.read_csv("my_data.csv")

# convert the pandas data frame to an R data.frame
# (rpy2 3.x: pandas2ri.py2ri() was replaced by py2rpy on a converter)
converter = robjects.default_converter + pandas2ri.converter
with localconverter(converter):
    r_dataframe = converter.py2rpy(my_data)

preprocessing = ["center", "scale"]
center_scale = StrVector(preprocessing)

# these are the columns in my data frame that will be the predictors in the model
predictors = ['predictor1', 'predictor2', 'predictor3']
predictors_vector = StrVector(predictors)

# this column of the data frame holds the outcome of the model
outcome = 'fluorescence'

# extract the predictor columns as an R data.frame (R: r_dataframe[TRUE, predictors])
columns_predictors = r_dataframe.rx(True, predictors_vector)

# extract the outcome column as a numeric vector (R: r_dataframe[["fluorescence"]])
column_response = r_dataframe.rx2(outcome)

cvCtrl = caret.trainControl(method="repeatedcv", number=20, repeats=100)

# x/y interface: no formula needed
model_R = caret.train(columns_predictors, column_response, method="glmStepAIC", preProc=center_scale, trControl=cvCtrl)

# rx2 (R's [[) returns the fitted glm itself; rx (R's [) would return a one-element list
print(model_R.rx2('finalModel'))
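trainControl(method = "repeatedcv", number = 20, repeats = 100), used in both scripts, asks caret to run 20-fold cross-validation 100 times over, so the model is refit on roughly 20 × 100 = 2000 resamples before the final fit on all the rows passed to train(); with glmStepAIC each of those fits runs its own stepwise AIC search, so this can be slow. The stdlib sketch below (repeated_kfold is a hypothetical helper, and 200 rows is an assumed data size) only illustrates how the held-out folds are laid out; caret generates its folds with its own routine, which also balances the outcome across folds, so this is not its exact algorithm:

```python
import random


def repeated_kfold(n_rows, number, repeats, seed=0):
    """List (repeat, fold, held_out_rows) triples for simplified repeated k-fold CV."""
    rng = random.Random(seed)
    splits = []
    for rep in range(repeats):
        rows = list(range(n_rows))
        rng.shuffle(rows)  # new random fold assignment for every repeat
        for fold in range(number):
            # every number-th shuffled row is held out; the model is fit on the rest
            held_out = sorted(rows[fold::number])
            splits.append((rep, fold, held_out))
    return splits


# Same settings as trainControl(method = "repeatedcv", number = 20, repeats = 100)
splits = repeated_kfold(n_rows=200, number=20, repeats=100)
print(len(splits))  # 2000
```

Lowering repeats (for example to 5 or 10) is the usual way to cut the run time while keeping the 20-fold structure.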