Caret rfe() error "there should be the same number of samples in x and y"
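I'm having trouble resolving the error "there should be the same number of samples in x and y". I've seen that others have posted about this error on this site, but their solutions didn't work for me. I've attached an abbreviated version of my dataset below.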
Here is x_train:
x_train <- structure(list(laterality = c("Left", "Right", "Right", "Right",
"Left", "Left", "Left", "Left", "Left", "Right"), age = c(66L,
56L, 69L, 49L, 60L, 70L, 58L, 53L, 59L, 64L), insurance = c("MEDICARE",
"UNITED", "MEDICARE", "UNITED", "COMMERCIAL", "MEDICARE", "AETNA",
"AETNA", "OXFORD", "MEDICARE_MANAGED"), employment = c("Retired",
"FullTime", "Retired", "FullTime", "Disabled", "SelfEmployed",
"Retired", "FullTime", "FullTime", "Disabled"), sex = c("Female",
"Male", "Female", "Female", "Female", "Female", "Male", "Male",
"Female", "Male"), race = c("WhiteorCaucasian", "WhiteorCaucasian",
"WhiteorCaucasian", "WhiteorCaucasian", "WhiteorCaucasian", "WhiteorCaucasian",
"Other", "BlackorAfricanAmerican", "WhiteorCaucasian", "WhiteorCaucasian"
), ethnicity = c("NotHispanicorLatino", "NotHispanicorLatino",
"NotHispanicorLatino", "NotHispanicorLatino", "NotHispanicorLatino",
"NotHispanicorLatino", "NotHispanicorLatino", "NotHispanicorLatino",
"NotHispanicorLatino", "NotHispanicorLatino"), bmi = c(22.3,
33, 34.3, 36, 30, 20, 29.5, 33.4, 26.5, 34.2), PreferredLanguage = c("English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English"), married = c("Married", "Married",
"Married", "Married", "Married", "Married", "Divorced", "Single",
"Married", "Married"), RadiographSevere = c("No", "No", "No",
"No", "No", "No", "No", "No", "No", "No"), HxAnxietyDepression = c("No",
"No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"), SurgeryYear = c(2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), operativetime = c(82L, 79L, 85L, 76L, 84L, 86L, 67L, 75L,
72L, 100L), HipApproach = c("Anterior", "Posterior", "Posterior",
"Posterior", "Posterior", "Anterior", "Posterior", "Posterior",
"Posterior", "Posterior")), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
Here is y_train:
y_train <- structure(list(POD1AverageNrsScoreCut = c("[0,5)", "[0,5)", "[0,5)",
"[0,5)", "[5,10)", "[0,5)", "[0,5)", "[5,10)", "[0,5)", "[0,5)"
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
))
Here is the code I'm using for rfe:
library(caret)
control <- rfeControl(functions = rfFuncs, # random forest
method = "repeatedcv", # repeated cv
repeats = 3, # number of repeats
number = 10) # number of folds
result_rfe <- rfe(x = x_train, y = y_train, sizes = c(1:30), rfeControl = control)
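A likely reason for the error, offered as a sketch and assuming caret's rfe() compares nrow(x) with length(y) before fitting (which is what the message suggests): y_train is a one-column data.table, and length() of a data.frame or data.table returns its number of columns, so it reports 1 instead of 10. The snippet below uses small hypothetical stand-ins, x and y, built from the data above in base R only, to show the mismatch and how unlist() plus as.factor() removes it.

```r
# Hypothetical minimal stand-ins for x_train and y_train (plain data.frames, 10 rows each).
# The real objects are data.tables, which are also lists, so length() behaves the same way.
x <- data.frame(age = c(66, 56, 69, 49, 60, 70, 58, 53, 59, 64))
y <- data.frame(POD1AverageNrsScoreCut = c("[0,5)", "[0,5)", "[0,5)", "[0,5)", "[5,10)",
                                           "[0,5)", "[0,5)", "[5,10)", "[0,5)", "[0,5)"))

nrow(x)    # 10 samples
length(y)  # 1: length() of a data frame counts columns, not rows

# Flatten the one-column data frame into a vector, then turn it into a factor
y_vec <- as.factor(unlist(y))
length(y_vec)  # 10, now matches nrow(x)
levels(y_vec)  # two classes: "[0,5)" "[5,10)"
```

Making y a factor also tells rfe() to treat the problem as classification rather than regression.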
I see that your outcome consists of two classes expressed as interval bounds. Maybe try passing them as a factor, y = as.factor(unlist(y_train))? It worked for me:
control <- rfeControl(functions = rfFuncs, # random forest
method = "repeatedcv", # repeated cv
repeats = 3, # number of repeats
number = 10) # number of folds
result_rfe <- rfe(x = x_train, y = as.factor(unlist(y_train)), sizes = c(1:30), rfeControl = control)
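An equivalent way to build the same outcome factor, sketched here rather than taken from the answer: pull the single column out by name with [[ ]] instead of unlist(), and check its length before calling rfe(). y_train_df and n_samples are hypothetical stand-ins for y_train and nrow(x_train).

```r
# Hypothetical one-column outcome table standing in for y_train
y_train_df <- data.frame(POD1AverageNrsScoreCut = c("[0,5)", "[0,5)", "[0,5)", "[0,5)", "[5,10)",
                                                    "[0,5)", "[0,5)", "[5,10)", "[0,5)", "[0,5)"))
n_samples <- 10  # stands in for nrow(x_train)

# Extract the column by name; unlike unlist(), this leaves no names attribute
y_factor <- factor(y_train_df[["POD1AverageNrsScoreCut"]])

# Fail early with a clearer message than the one rfe() gives
stopifnot(is.factor(y_factor), length(y_factor) == n_samples)
table(y_factor)  # class counts: 8 in "[0,5)", 2 in "[5,10)"
```

The result can then be passed as y = y_factor in the rfe() call above; on the real data.table, y_train[["POD1AverageNrsScoreCut"]] extracts the column the same way.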
Output:
> result_rfe
Recursive feature selection
Outer resampling method: Cross-Validated (10 fold, repeated 3 times)
Resampling performance over subset size:
Variables Accuracy Kappa AccuracySD KappaSD Selected
1 0.06667 0 0.2537 0
2 0.06667 0 0.2537 0
3 0.30000 0 0.4661 0
4 0.20000 0 0.4068 0
5 0.36667 0 0.4901 0
6 0.40000 0 0.4983 0
7 0.43333 0 0.5040 0
8 0.53333 0 0.5074 0 *
9 0.30000 0 0.4661 0
10 0.33333 0 0.4795 0
11 0.20000 0 0.4068 0
12 0.26667 0 0.4498 0
13 0.06667 0 0.2537 0
14 0.13333 0 0.3457 0
15 0.20000 0 0.4068 0
The top 5 variables (out of 8):
insurance, laterality, HipApproach, employment, ethnicity
Note: I don't know whether this is what you expected, since I don't know the context of the data or your approach.
Original answer: Subscript out of bounds error in caret's rfe function