Cubist regression under rpy2: "subscript out of bounds" error

I am running a Cubist regression through rpy2 and get this error:

Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds

I tried using as.matrix to change the data format, but it still fails.

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import FloatVector
from rpy2.robjects import pandas2ri
Cubist = importr('Cubist')
lattice = importr('lattice')
r = robjects.r
# Prepare the sample data (rpy2 column indices are 0-based: 3 = hp, 5 = wt, 6 = qsec)
dt = r('mtcars')
Z = FloatVector(dt[3])
X = FloatVector(dt[5])
X1 = FloatVector(dt[6])
T = r['cbind'](X,X1)

regr = r['cubist'](x=T,y=Z,committees=10)

When x is a matrix, cubist() appears to require that it carry a dimnames attribute, i.e. column names.

Setup in R:

library(Cubist)

dt = mtcars
Z = dt[, 4]
X = dt[, 6]
X1 = dt[, 7]

Now compare this (which reproduces your error):

> T = cbind(dt[, 6], dt[, 7])
> str(T)
 num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
> cubist(x=T, y=Z, committees=10)
cubist code called exit with value 1
Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds

with this:

> T = cbind(X, X1)
> str(T)
 num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "X" "X1"
> cubist(x=T, y=Z, committees=10)

Call:
cubist.default(x = T, y = Z, committees = 10)

Number of samples: 32
Number of predictors: 2

Number of committees: 10
Number of rules per committee: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

There are several ways to make sure dimnames get attached when going through rpy2. A simple one that fits your code is to name the variables explicitly:

In [15]: T = r['cbind'](X=X,X1=X1)

In [16]: print(r['str'](T))
 num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "X" "X1"
<rpy2.rinterface.NULLType object at 0x7f0d7c0f5608> [RTYPES.NILSXP]

In [17]: print(r['cubist'](x=T,y=Z,committees=10))

Call:
cubist.default(x = structure(c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46,
 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264, 175,
 335, 109), committees = 10L)

Number of samples: 32
Number of predictors: 2

Number of committees: 10
Number of rules per committee: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1