How can the lengths of the actual and predicted values differ in table()? Getting error: all arguments must have the same length
Here is my logistic regression code:
data_raw = data.frame(
Var1= c(11, 5, 1, 0, 5, 1, 0, 0, 1, 0),
Var2= c(11, 5, 0, 0, 2, 1, 0, 2, 0, 2),
Var3= c(10, 7, 15, 9, 16, 9, 13, 15, 11, 17),
Var4= c(6, 10, 36, 10, 9, 12, 17, 5, 12, 14),
Var5= c(7, 26, 24, 16, 23, 25, 15, 10, 15, 22),
Var6= c(0, 0, 1, 0, 0, 2, 1, 0, 0, 0),
Var7= c(17, 21, 23, 16, 26, 22, 11, 9, 9, 9),
Var8= c(1, 0, 1, 0, 3, 5, 2, 0, 0, 0),
Var9= c(3, 0, 3, 3, 2, 0, 1, 3, 3, 2),
Var10= c(3, 0, 3, 3, 2, 0, 1, 3, 3, 2),
Var11= c(7, 2, 6, 7, 7, 5, 3, 5, 5, 4),
Var12= c(4, 3, 3, 4, 2, 3, 4, 8, 7, 5),
Var13= c("Summer", "Summer", "Summer", "Summer", "Autumn", "Autumn", "Summer", "Summer", "Summer", "Summer"),
Var14= c("Both host", "Both host", "Both host", "Host Visitor", "Both host", "Both host", "Both host", "Both host", "Both host", "Both host"),
Var15= c("Home", "Similar", "Similar", "Similar", "Similar", "Similar", "Home", "Home", "Home", "Similar"),
Winner = c("Win", "Loss", "Win", "Loss", "Win", "Win", "Loss", "Loss", "Win", "Loss"),
stringsAsFactors = TRUE
)
set.seed(123) # <-- edit
data_shuffled = sample(1:nrow(data_raw))
data_new = data_raw[data_shuffled, ]
create_train_test <- function(data_new, size = 0.8, train = TRUE) {
  n_row = nrow(data_new)
  total_row = size * n_row
  train_sample = 1:total_row
  if (train == TRUE) {
    return(data_new[train_sample, ])
  } else {
    return(data_new[-train_sample, ])
  }
}
data_train <- create_train_test(data_new, size= 0.8, train = TRUE)
data_test <- create_train_test(data_new, size= 0.8, train = FALSE)
mymodel = glm(Winner~., data= data_train, family= binomial)
res2 = predict(mymodel, data= data_test, type="response")
pred2= ifelse(res2>0.5, 1, 0)
tab2= table(data_test$Winner, pred2)
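On the last line of the code I get an error message saying that all arguments must have the same length. When I checked, the two vectors do indeed have different lengths. Why does this happen? I used a different data set and it worked fine.
Edit: I have included a sample data set.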
When you use predict, you must pass newdata rather than data to get predictions for a new data set.
res2 <- predict(mymodel, newdata=data_test, type="response")
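To see why the lengths differ, here is a minimal sketch on the built-in mtcars data (an assumption chosen for illustration, not the data set from the question; the names train, test, fit, p_wrong and p_right are made up). As far as I can tell, data is not a named argument of predict for a glm, so it is absorbed by ... and ignored, and predict falls back to the fitted values of the training rows.

```r
# Illustration on the built-in mtcars data (not the asker's data set).
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

fit <- glm(am ~ wt, data = train, family = binomial)

# `data` is not a predict() argument here; it is swallowed by `...`,
# so this returns the fitted values for the 24 training rows.
p_wrong <- predict(fit, data = test, type = "response")

# `newdata` is the argument that supplies the rows to predict on.
p_right <- predict(fit, newdata = test, type = "response")

length(p_wrong)  # 24, same as nrow(train)
length(p_right)  # 8, same as nrow(test)

# With matching lengths, table() works as intended.
pred <- ifelse(p_right > 0.5, 1, 0)
tab <- table(actual = test$am, predicted = pred)
```

In the question's code the same thing happens: res2 has one value per row of data_train (8 rows), while data_test$Winner has only 2 values, so table() complains.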