Error in knn: 'train' and 'class' have different lengths - R code
I am trying to use knn on my dataset, which has 65499 rows and 6 columns.
My dataset:
> dput(head(sampleknn))
structure(list(RequestorSeniority = c(1L, 2L, 2L, 4L, 1L, 4L),
ITOwner = c(50L, 15L, 15L, 22L, 22L, 38L), Severity = c(2L,
1L, 2L, 2L, 2L, 2L), Priority = c(0L, 1L, 0L, 0L, 1L, 3L),
daysOpen = c(3L, 5L, 0L, 20L, 1L, 0L), Satisfaction = structure(c(4L,
4L, 3L, 3L, 4L, 3L), .Label = c("Amazing", "Satisfied", "Unknown",
"Unsatisfied"), class = "factor")), .Names = c("RequestorSeniority",
"ITOwner", "Severity", "Priority", "daysOpen", "Satisfaction"
), row.names = c(NA, 6L), class = "data.frame")
> str(sampleknn)
'data.frame': 65499 obs. of 6 variables:
$ RequestorSeniority: int 1 2 2 4 1 4 3 4 2 3 ...
$ ITOwner : int 50 15 15 22 22 38 10 1 14 46 ...
$ Severity : int 2 1 2 2 2 2 2 2 2 2 ...
$ Priority : int 0 1 0 0 1 3 3 0 2 1 ...
$ daysOpen : int 3 5 0 20 1 0 9 15 6 1 ...
$ Satisfaction : Factor w/ 4 levels "Amazing","Satisfied",..: 4 4 3 3 4 3 3 3 4 4 ...
When I run knn on this dataset (code below), I get the following error:
Error in knn(train = sampleknn_train, test = sampleknn_test, cl =
sampleknn_test_target, : 'train' and 'class' have different
lengths
Code:
library(class)  # provides knn()
sampleknn <- read.csv(file = "HelpDesk.csv", header = TRUE, sep = ",")
str(sampleknn)
#---scaling: min-max normalize each predictor to [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
sampleknn_n <- as.data.frame(lapply(sampleknn[, c(1, 2, 3, 4, 5)], normalize))
str(sampleknn_n)
# training set: first 65000 rows of sampleknn_n
sampleknn_train <- sampleknn_n[1:65000, ]
# test set: remaining 499 rows
sampleknn_test <- sampleknn_n[65001:65499, ]
# isolate test and train satisfaction levels (column 6)
sampleknn_train_target <- sampleknn[1:65000, 6]
sampleknn_test_target <- sampleknn[65001:65499, 6]
#-----knn model
sqrt(65499)  # rule of thumb used to pick k
# this is the line that raises the error
m1 <- knn(train = sampleknn_train, test = sampleknn_test, cl = sampleknn_test_target, k = 255)
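When I run the last line (m1 <- ...), it gives me the error 'train' and 'class' have different lengths. I looked for answers that discuss the same problem, but none of them helped. How can I fix this? Let me know if you need more information.
Edit:
Before normalization: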
RequestorSeniority ITOwner Severity Priority daysOpen Satisfaction
1 50 2 0 3 Unsatisfied
2 15 1 1 5 Unsatisfied
2 15 2 0 0 Unknown
4 22 2 0 20 Unknown
1 22 2 1 1 Unsatisfied
4 38 2 3 0 Unknown
After normalization:
RequestorSeniority ITOwner Severity Priority daysOpen
0.0000000000 1.0000000000 0.50 0.0000000000 0.05555555556
0.3333333333 0.2857142857 0.25 0.3333333333 0.09259259259
0.3333333333 0.2857142857 0.50 0.0000000000 0.00000000000
1.0000000000 0.4285714286 0.50 0.0000000000 0.37037037037
0.0000000000 0.4285714286 0.50 0.3333333333 0.01851851852
1.0000000000 0.7551020408 0.50 1.0000000000 0.00000000000
> dput(head(sampleknn_n))
structure(list(RequestorSeniority = c(0, 0.333333333333333, 0.333333333333333,
1, 0, 1), ITOwner = c(1, 0.285714285714286, 0.285714285714286,
0.428571428571429, 0.428571428571429, 0.755102040816326), Severity = c(0.5,
0.25, 0.5, 0.5, 0.5, 0.5), Priority = c(0, 0.333333333333333,
0, 0, 0.333333333333333, 1), daysOpen = c(0.0555555555555556,
0.0925925925925926, 0, 0.37037037037037, 0.0185185185185185,
0)), .Names = c("RequestorSeniority", "ITOwner", "Severity",
"Priority", "daysOpen"), row.names = c(NA, 6L), class = "data.frame")
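From ?knn:
cl factor of true classifications of training set
You passed the test labels (499 values) as cl, while train has 65000 rows, hence the length mismatch. cl must hold the labels of the training rows, so the call should be: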
m1 <- knn(train=sampleknn_train, test=sampleknn_test, cl=sampleknn_train_target,k=255)
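To make the role of each argument concrete, here is a minimal sketch on made-up data rather than the HelpDesk file. The names toy, toy_label and the toy_* splits are hypothetical, the sizes (100 training rows, 20 test rows) and k = 5 are arbitrary, and the labels are random, so the resulting accuracy means nothing. It only shows that cl has to line up with train, and that the test labels come in afterwards to score the predictions.

```r
library(class)  # provides knn()

set.seed(1)
# Hypothetical stand-in data: 120 rows, 2 numeric predictors, 1 factor label
toy <- data.frame(x1 = runif(120), x2 = runif(120))
toy_label <- factor(sample(c("Unknown", "Unsatisfied"), 120, replace = TRUE))

toy_train <- toy[1:100, ]
toy_test  <- toy[101:120, ]
toy_train_target <- toy_label[1:100]
toy_test_target  <- toy_label[101:120]

# Passing the test labels as cl reproduces the error from the question:
# 20 labels cannot describe 100 training rows
err_msg <- tryCatch(
  knn(train = toy_train, test = toy_test, cl = toy_test_target, k = 5),
  error = function(e) conditionMessage(e)
)
print(err_msg)

# cl needs exactly one label per training row
stopifnot(nrow(toy_train) == length(toy_train_target))
pred <- knn(train = toy_train, test = toy_test, cl = toy_train_target, k = 5)

# The test labels are only used after prediction, to evaluate it
print(table(predicted = pred, actual = toy_test_target))
print(mean(pred == toy_test_target))  # share of correct predictions
```

A quick check such as nrow(sampleknn_train) == length(sampleknn_train_target) before calling knn would catch this kind of mix-up early.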