caret::predict giving Error: $ operator is invalid for atomic vectors
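This is driving me crazy. I have spent the whole day reading similar posts but cannot seem to solve my problem. I have a naive Bayes model that has been trained and stored as model. I am trying to predict with a newdata data frame, but I keep getting the error Error: $ operator is invalid for atomic vectors. This is what I run:
stats::predict(model, newdata = newdata)
where newdata is the first row of another data frame:
newdata <- pbp[1, c("balls", "strikes", "outs_when_up", "stand", "pitcher", "p_throws", "inning")]
class(newdata) gives [1] "tbl_df" "tbl" "data.frame".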
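The problem is in the data being used: the factor columns in newdata should match the levels used in training. For example, if we use one of the rows from trainingData to predict, it does work: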
predict(model, head(model$trainingData, 1))
#[1] Curveball
#Levels: Changeup Curveball Fastball Sinker Slider
Checking the str of both datasets shows that some columns that are factor in the training data are character or integer class in newdata:
str(model$trainingData)
'data.frame': 1277525 obs. of 7 variables:
$ pitcher : Factor w/ 1390 levels "112526","115629",..: 277 277 277 277 277 277 277 277 277 277 ...
$ stand : Factor w/ 2 levels "L","R": 1 1 2 2 2 2 2 1 1 1 ...
$ p_throws : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
$ balls : num 0 1 0 1 2 2 2 0 0 0 ...
$ strikes : num 0 0 0 0 0 1 2 0 1 2 ...
$ outs_when_up: num 1 1 1 1 1 1 1 2 2 2 ...
$ .outcome : Factor w/ 5 levels "Changeup","Curveball",..: 3 4 1 4 1 5 5 1 1 5 ...
str(newdata)
tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
$ balls : int 3
$ strikes : int 2
$ outs_when_up: int 1
$ stand : chr "R"
$ pitcher : int 605200
$ p_throws : chr "R"
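The comparison above can also be done programmatically. The following is a minimal sketch on toy data, not the asker's actual objects: train and new_row are hypothetical stand-ins for model$trainingData and newdata, and needs_fix is an illustrative name. It lists the shared columns that are factor in training but not in the new data.

```r
# Toy stand-ins for model$trainingData and newdata (hypothetical values).
train <- data.frame(
  stand   = factor(c("L", "R", "R")),
  pitcher = factor(c("112526", "605200", "605200")),
  balls   = c(0, 1, 2)
)
new_row <- data.frame(
  stand   = "R",
  pitcher = 605200L,
  balls   = 3L,
  stringsAsFactors = FALSE
)

# Columns present in both data frames.
shared <- intersect(names(train), names(new_row))

# TRUE where the training column is a factor / the new column is a factor.
train_is_factor <- vapply(train[shared], is.factor, logical(1))
new_is_factor   <- vapply(new_row[shared], is.factor, logical(1))

# Columns that are factor in training but some other class in the new data.
needs_fix <- shared[train_is_factor & !new_is_factor]
print(needs_fix)
```

On the toy data this flags stand and pitcher, the same kind of mismatch visible in the two str outputs.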
One option is to convert those columns to factor class with the same levels as in training:
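# columns present in both the training data and newdata
nm1 <- intersect(names(model$trainingData), names(newdata))
# of those, the ones stored as factor in the training data
nm2 <- names(which(sapply(model$trainingData[nm1], is.factor)))
# rebuild each such column in newdata as a factor with the training levels
newdata[nm2] <- Map(function(x, y) factor(x, levels = levels(y)), newdata[nm2], model$trainingData[nm2])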
Now do the prediction:
predict(model, newdata)
#[1] Sinker
#Levels: Changeup Curveball Fastball Sinker Slider
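One caveat with factor(x, levels = ...): a value that is not among the given levels becomes NA rather than raising an error. The sketch below illustrates this base R behavior on made-up values; train_levels, seen and unseen are hypothetical names, and 999999 is an invented id that does not appear in the level set.

```r
# Hypothetical subset of training levels for a factor column such as pitcher.
train_levels <- c("112526", "115629", "605200")

# A value that exists in the training levels is matched (after being
# converted to character), so it becomes a valid factor value.
seen <- factor(605200L, levels = train_levels)

# A value absent from the training levels silently becomes NA.
unseen <- factor(999999L, levels = train_levels)

print(is.na(seen))
print(is.na(unseen))
```

Checking anyNA() on the converted columns before calling predict may therefore help catch rows whose values were never seen in training.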