Non conformable array error when using rpart with rpy2

I am using rpart with rpy2 (version 2.8.6) on Python 3.5 and want to train a classification decision tree. My code snippet looks like this:

import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
from rpy2.robjects import DataFrame, Formula
rpart = importr('rpart')
numpy2ri.activate()
pandas2ri.activate()

dataf = DataFrame({'responsev': owner_train_label,
               'predictorv': owner_train_data})
formula = Formula('responsev ~.')
clf = rpart.rpart(formula = formula, data = dataf, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))

where owner_train_label is a numpy float64 array of shape (12610,) and owner_train_data is a numpy float64 array of shape (12610,88)

This is the error I get when running the last line of code to fit the data:

RRuntimeError: Error in ((xmiss %*% rep(1, ncol(xmiss))) < ncol(xmiss)) & !ymiss : 
non-conformable arrays

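I understand it is telling me the arrays are non-conformable, but I don't see why, since I can successfully train sklearn's decision tree on the same training data. Thanks for your help.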

I solved this by building the data frame with pandas instead and passing the pandas DataFrame to rpart, letting rpy2's pandas2ri convert it into an R data frame.

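import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import Formula

rpart = importr('rpart')
# Let rpy2 convert pandas DataFrames to R data.frames automatically
pandas2ri.activate()

# One column per feature (88 columns), plus the label in column 'l'
df = pd.DataFrame(data=owner_train_data)
df['l'] = owner_train_label

# Classify 'l' using every other column as a predictor
formula = Formula('l ~ .')
clf = rpart.rpart(formula=formula, data=df, method="class",
                  control=rpart.rpart_control(minsplit=10, xval=10))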
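The practical difference from the question's code is on the Python side: the question stores the whole (12610, 88) array under the single key 'predictorv', while the fix gives each of the 88 features its own column. Below is a minimal sketch of just that data-preparation step, runnable without R. The helper make_training_frame, the "x0", "x1", ... column names, and the random stand-in data are hypothetical choices for illustration, not part of the original answer or of rpart's API.

```python
import numpy as np
import pandas as pd


def make_training_frame(features, labels, label_name="l"):
    """Return a DataFrame with one column per feature plus a label column.

    `features` must be 2-D (n_samples, n_features) and `labels` 1-D with
    one entry per row. The "x0", "x1", ... names are a hypothetical naming
    choice for readability, not something rpart requires.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    if features.ndim != 2:
        raise ValueError("features must be a 2-D array")
    if labels.shape != (features.shape[0],):
        raise ValueError("labels must be 1-D with one entry per feature row")
    # Split the 2-D array into one named column per feature
    columns = ["x{}".format(i) for i in range(features.shape[1])]
    df = pd.DataFrame(features, columns=columns)
    # Append the response as its own column, as in the answer's df['l']
    df[label_name] = labels
    return df


# Synthetic stand-ins with the shapes described in the question
rng = np.random.default_rng(0)
owner_train_data = rng.normal(size=(12610, 88))
owner_train_label = rng.integers(0, 2, size=12610).astype(np.float64)

df = make_training_frame(owner_train_data, owner_train_label)
print(df.shape)  # (12610, 89): 88 feature columns + 1 label column
```

The resulting df could then be passed as data to rpart.rpart with the formula 'l ~ .', exactly as in the answer's code.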