Calling R library "randomForest" from python using rpy2
I want to use rpy2 to embed some R libraries in my Python script. I have already embedded "stats.lm" successfully, but now I want to embed "randomForest".
import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri
import rpy2.robjects as robjects
randomForest=importr('randomForest')
pandas2ri.activate()
#read data
df = pd.read_csv('train.csv',index_col=0)
rdf = pandas2ri.py2ri(df)
#check
print(type(rdf))
print(rdf)
#Random Forest
formula = 'target ~ .'
fit_full = randomForest(formula, data=rdf)
The output is:
Traceback (most recent call last):
File "<ipython-input-5-776f4072f19e>", line 2, in <module>
fit_full = randomForest(formula, data=rdf)
TypeError: 'InstalledSTPackage' object is not callable
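I have already used this package successfully in R to model this dataset. "train.csv" is a matrix of tens of thousands of samples (rows) and about 94 columns: 93 features (class integer) and 1 target (class factor). The target column has 9 classes (Class_1, ..., Class_9).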
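The TypeError comes from the fact that importr returns a package object (InstalledSTPackage), not a function: the R functions of the package are attributes of that object, so the call should be randomForest.randomForest(...). The formula should also be an R formula built with robjects.Formula rather than a plain Python string, and the target should arrive in R as a factor so that a classification forest is grown. With the rpy2 2.x calls used above, the fix should amount to randomForest.randomForest(robjects.Formula('target ~ .'), data=rdf). Below is a minimal sketch assuming rpy2 3.x, where pandas2ri.py2ri was renamed py2rpy and conversion is scoped with localconverter; the names to_model_frame and fit_random_forest are illustrative, not part of rpy2.

```python
import pandas as pd


def to_model_frame(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Return a copy of df whose target column is a pandas Categorical.

    pandas2ri converts Categorical columns to R factors, so randomForest
    should treat the problem as classification rather than regression.
    """
    out = df.copy()
    out[target] = out[target].astype("category")
    return out


def fit_random_forest(df: pd.DataFrame, target: str = "target", **rf_args):
    """Fit randomForest(<target> ~ ., data = df) via rpy2 3.x (assumed)
    and return the fitted R model object."""
    # Imported inside the function so this module loads even without R.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr

    rf_pkg = importr("randomForest")  # package object, not callable itself
    # Convert the pandas frame to an R data.frame with pandas2ri active.
    with localconverter(ro.default_converter + pandas2ri.converter) as cv:
        rdf = cv.py2rpy(to_model_frame(df, target))
    formula = ro.Formula(f"{target} ~ .")
    # The R function is an attribute of the package object.
    return rf_pkg.randomForest(formula, data=rdf, **rf_args)
```

With the data above this would be called as rf_model = fit_random_forest(df, importance=True). As far as I know, importr only translates dots to underscores in a function's declared arguments, so an option such as do.trace, which reaches randomForest through R's "...", probably has to be passed with its R spelling: fit_random_forest(df, **{"do.trace": 100}).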
----------------EDIT----------------
A partial solution could be to embed the code directly in an R function that both fits the model and makes the predictions:
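import rpy2
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
print(rpy2.__version__)  # this snippet uses the rpy2 2.x API (pandas2ri.ri2py)
# Define an R function that reads the data, fits the forest and predicts.
robjects.r('''
f <- function() {
  library(randomForest)
  # stringsAsFactors = TRUE keeps target a factor (R >= 4.0 defaults to FALSE)
  train <- read.csv("train.csv", stringsAsFactors = TRUE)
  # 5000 rows drawn with replacement; column 1 is the id,
  # columns 2:95 are the 93 features plus the target
  train1 <- train[sample(c(1:60000), 5000, replace = TRUE), 2:95]
  train1.rf <- randomForest(target ~ ., data = train1,
                            importance = TRUE,
                            do.trace = 100)
  # Predict the first 100 (training) rows from their 93 feature columns;
  # this last expression is the return value of f
  pred <- as.data.frame(predict(train1.rf, train1[1:100, 1:93]))
}
''')
r_f = robjects.globalenv['f']
# Only the predictions come back; train1.rf is local to f and is discarded.
pred = pandas2ri.ri2py(r_f())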
But I would still like to know whether there is a better solution (one that also stores the model "train1.rf").
This is what I was looking for:
import random
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()  # automatic pandas <-> R conversion (rpy2 2.x API)
df = pd.read_csv('train.csv', index_col=0)  # the id column becomes the index
# 5000 training rows (93 features + target) and 100 test rows (features only)
train = df.iloc[random.sample(range(1, 60000), 5000), 0:94]
test = df.iloc[random.sample(range(1, 60000), 100), 0:93]
rtrain = pandas2ri.py2ri(train)
print(rtrain)
rtest = pandas2ri.py2ri(test)
print(rtest)
# f fits the forest and returns the model instead of discarding it
robjects.r('''
f <- function(train) {
  library(randomForest)
  randomForest(target ~ ., data = train, importance = TRUE, do.trace = 100)
}
''')
r_f = robjects.globalenv['f']
rf_model = r_f(rtrain)  # the fitted model stays available on the Python side
# g applies a previously fitted model to new data
robjects.r('''
g <- function(model, test) {
  as.data.frame(predict(model, test))
}
''')
r_g = robjects.globalenv['g']
pred = pandas2ri.ri2py(r_g(rf_model, rtest))
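In the script above, train and test are drawn with two independent calls to random.sample, so some test rows can also end up in the training sample, which makes the predictions on test look better than they would on unseen rows. A minimal sketch of a non-overlapping split, assuming the same df layout (feature columns plus a column named target); the function name disjoint_split and the seed are illustrative, not from the original:

```python
import numpy as np
import pandas as pd


def disjoint_split(df: pd.DataFrame, n_train: int, n_test: int, seed: int = 0):
    """Draw non-overlapping train/test row samples from df.

    The row positions are permuted once and the two samples are taken from
    disjoint slices of that permutation, so no row can land in both.
    """
    if n_train + n_test > len(df):
        raise ValueError("n_train + n_test exceeds the number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    train = df.iloc[order[:n_train]]
    # As in the original script, the test sample keeps only the features.
    test = df.iloc[order[n_train:n_train + n_test]].drop(columns="target")
    return train, test
```

It would replace the two random.sample lines, e.g. train, test = disjoint_split(df, 5000, 100), after which the frames are converted and passed to f and g as before.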