Parameter Not Being Passed to R via rpy2
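I'm having trouble using rpy2 with the R library "e1071". I'm trying to retrieve probability data from an SVM prediction, but it is never included in the returned object.
Building the model by calling "svm" with "probability=TRUE" tells the model to include the extra data when predictions are requested. The prediction data is returned by the "predict" command with the "probability=TRUE" argument, and it should return a complex data structure with the labels and a "probabilities" attribute. My problem is that the probabilities attribute is not included in the result. It is as if the probability argument were never included in the predict call.
Here is some sample code (the e1071 R library must be installed):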
import numpy
import rpy2
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
importr('e1071')
# configure the data set
SAMPLES = 50
trainingDataClassless = numpy.random.random((SAMPLES, 7))
trainingDataClasses = numpy.where(numpy.random.random((SAMPLES, 1)) > 0.5, 0.0, 1.0)
trainingDataFactorClasses = rpy2.robjects.FactorVector(trainingDataClasses)
# create the args for the svm
svmargs = {"x": trainingDataClassless, "y": trainingDataFactorClasses, "probability": True,
"kernel": "linear", "type": "C-classification"}
print("Starting SVM with parameters: %s" % (svmargs,))
svmObj = rpy2.robjects.r['svm'](**svmargs)
print("SVM Analysis")
predictOutcomes = rpy2.robjects.r['predict'](svmObj, trainingDataClassless, probability=True)
print("outcomes: %s" % (predictOutcomes,))
probs = rpy2.robjects.r['attr'](predictOutcomes, "probabilities")
print("probs: %s" % (probs,)) # should NOT be NULL!
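More information on the predict function in R (with a working probabilities example) can be found on page 39 of the e1071 documentation.
The attribute appears to get lost somewhere, presumably during the conversion between the low-level and high-level representations of the resulting R object (a factor).
Calling through the low-level interface is a workaround (see below), but it would be great if you could report the issue on the rpy2 issue tracker on bitbucket.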
r_predict = rpy2.robjects.rinterface.globalenv.get('predict')
r_traindata = rpy2.robjects.Matrix(trainingDataClassless)
r_true = rpy2.robjects.BoolVector([True])
predictOutcomes = r_predict(svmObj,
r_traindata,
probability=r_true)
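Once the low-level call returns, the attribute still has to be read from the result. The following is only a minimal sketch: it assumes the returned object exposes rpy2's low-level `do_slot` method and that this method raises `LookupError` when the attribute is absent. The helper name `get_r_attribute` is hypothetical and not part of rpy2.

```python
def get_r_attribute(r_obj, name):
    """Return the R attribute `name` of an rpy2 object, or None if it is absent.

    Assumption: `r_obj` provides a `do_slot(name)` method that raises
    LookupError for a missing attribute (as rpy2's low-level objects do).
    """
    try:
        return r_obj.do_slot(name)
    except LookupError:
        # The attribute was not attached to the object (or was lost on the way).
        return None
```

With the workaround above, this would be called as `probs = get_r_attribute(predictOutcomes, "probabilities")`; a result of `None` would mean the attribute is still missing.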
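Edit: An issue was opened... and closed (bug fixed - https://bitbucket.org/rpy2/rpy2/issues/299)
Your R functions (svm and predict) need to run on the R side of things rather than in Python, because Python doesn't see or know about those specialized functions. Python can only be used for the numpy sample computation, as a conduit for calling the functions, and for printing the results: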
# PASS PYTHON DATASET OBJECTS INTO R
# numpy objects => R matrices
tdClassless_row, tdClassless_col = trainingDataClassless.shape
rmatrix_tdClassless = rpy2.robjects.r.matrix(trainingDataClassless,
    nrow=tdClassless_row, ncol=tdClassless_col)
rpy2.robjects.r.assign("tdClassless", rmatrix_tdClassless)
# the class labels are already an R factor (FactorVector), so assign them as-is
rpy2.robjects.r.assign("tdFactorClasses", trainingDataFactorClasses)
# OBTAIN THE SVM FUNCTION (same lookup as in the question's code)
rsvm_funct = rpy2.robjects.r['svm']
# PASS SVM PARAMETERS (by name; the data arguments are fetched from the R side)
svmObj_py = rsvm_funct(
    x=rpy2.robjects.r('tdClassless'),
    y=rpy2.robjects.r('tdFactorClasses'),
    probability=True,
    kernel="linear",
    type="C-classification"
)
# ASSIGN svmObj in R
rpy2.robjects.r.assign("svmObj", svmObj_py)
# OBTAIN THE PREDICT FUNCTION
rpredict_funct = rpy2.robjects.r['predict']
# PASS PREDICT PARAMETERS
predictOutcomes = rpredict_funct(
    rpy2.robjects.r('svmObj'),
    rpy2.robjects.r('tdClassless'),
    probability=True
)
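If the goal is to keep the whole call on the R side, one option is to build the R source text in Python and let R evaluate it. The sketch below is only an illustration of that idea: `r_literal` and `r_call` are hypothetical helpers, not rpy2 functions, and they assume that the positional names (here `svmObj` and `tdClassless`) already exist as variables in R.

```python
def r_literal(value):
    """Render a Python scalar (bool, number, or string) as R source text."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise TypeError("unsupported value type: %r" % type(value))


def r_call(func, *symbols, **named):
    """Build R source for func(symbol, ..., name = literal, ...).

    Positional arguments are names of variables that already exist in R;
    keyword arguments are Python scalars rendered as R literals.
    """
    parts = list(symbols)
    parts += ["%s = %s" % (key, r_literal(val)) for key, val in named.items()]
    return "%s(%s)" % (func, ", ".join(parts))


predict_code = r_call("predict", "svmObj", "tdClassless", probability=True)
print(predict_code)  # predict(svmObj, tdClassless, probability = TRUE)
```

With rpy2 loaded and the variables assigned in R as above, the string could then be evaluated with `rpy2.robjects.r(predict_code)`, so that R itself sees the `probability = TRUE` argument.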