Is it possible to use a numeric attribute as the class for K-means clustering?

Question

@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        NUMERIC 
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC

@data
00141833,F,F,130207,4710105011011,2,44,52
01376753,E,E,110217,4710265849066,1,150,129
01603071,E,G,100201,4712019100607,1,35,39
01738667,E,F,530105,4710168702901,1,94,119

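The above is the header and part of the data of my training dataset, training.arff. I want to use K-means clustering and the J48 classifier, and I can do that without any problem. The following is my test dataset, test.arff: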

@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        INTEGER
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC

@data
1754698,H,A,560402,?,1,676,849
1027365,F,C,530404,?,1,170,219
956710,E,E,500303,?,1,36,59

In both cases, I made sure to select ProductID as the class.
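For reference, the two headers above declare the same eight attributes in the same order. The only textual difference is ProductID declared as NUMERIC versus INTEGER, and the ARFF format treats INTEGER (and REAL) as NUMERIC. A minimal Python sketch of that header comparison follows. It assumes each declaration has the form `@attribute <name> <type>` with no spaces in the name, and it only illustrates the idea; it is not Weka's own compatibility check.

```python
# Minimal sketch: compare the @attribute declarations of two ARFF headers.
# Assumption: each declaration is "@attribute <name> <type>" with no spaces
# inside the name. ARFF treats INTEGER and REAL as NUMERIC.
NUMERIC_ALIASES = {"numeric", "integer", "real"}


def parse_attributes(header: str) -> list[tuple[str, str]]:
    """Return (name, normalized type) pairs in declaration order."""
    attrs = []
    for line in header.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].lower() == "@attribute":
            name, kind = parts[1], parts[2].strip()
            if kind.lower() in NUMERIC_ALIASES:
                kind = "numeric"
            attrs.append((name, kind))
    return attrs


def compatible(train_header: str, test_header: str) -> bool:
    """Headers match if they declare the same attributes in the same order."""
    return parse_attributes(train_header) == parse_attributes(test_header)


TRAIN = """@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        NUMERIC
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC"""

# Same header, but with ProductID declared as INTEGER, as in test.arff.
TEST = "\n".join(
    "@attribute ProductID        INTEGER" if line.split()[1] == "ProductID" else line
    for line in TRAIN.splitlines()
)

print(len(parse_attributes(TRAIN)), compatible(TRAIN, TEST))  # 8 True
```

So the raw files agree; a mismatch has to come from something that changes the attribute list after loading.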

These are the steps I followed:

Step 1: Apply the "AddCluster" filter so that K-means clustering assigns a cluster to each instance in the dataset.
Step 2: Use the J48 classification algorithm to evaluate the performance of the clustering, with the 10-fold cross-validation option.
Step 3: Save the finalized model and close Weka (I close it to test whether I can reload the model and use it again).
Step 4: Load the model in Weka (using "Load model").
Step 5: This time, select "Supplied test set" and choose the test file to predict (it has the same format as shown in the question above).
Step 6: Try "Re-evaluate model on current test set".
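Step 1 changes the shape of the data: an add-cluster step appends one attribute holding each instance's cluster. The sketch below is a plain Lloyd's k-means in Python on the four training rows shown above, not Weka's AddCluster code. Clustering only Quantity, Cost and sales, using k = 2, and seeding with the first k rows are assumptions made for illustration.

```python
# Minimal k-means sketch of what an "add cluster" step does to the data:
# fit centroids on some numeric columns, then append each row's cluster
# index as one extra attribute. Column choice and k are assumptions.
import math


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def add_cluster(rows, columns, centroids):
    """Return copies of rows with the cluster index appended."""
    return [row + [nearest([row[c] for c in columns], centroids)] for row in rows]


# The four training rows from the question (leading zeros of CustomerID
# dropped). Only Quantity, Cost and sales (columns 5-7) are clustered here.
train = [
    [141833, "F", "F", 130207, 4710105011011, 2, 44, 52],
    [1376753, "E", "E", 110217, 4710265849066, 1, 150, 129],
    [1603071, "E", "G", 100201, 4712019100607, 1, 35, 39],
    [1738667, "E", "F", 530105, 4710168702901, 1, 94, 119],
]
FEATURES = [5, 6, 7]
centroids = kmeans([[r[c] for c in FEATURES] for r in train], k=2)
augmented = add_cluster(train, FEATURES, centroids)
print(len(train[0]), "->", len(augmented[0]))  # 8 -> 9
```

A model trained on the augmented rows expects one more attribute than the original file provides.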

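But I get the notification "Data used to train model and test set are not compatible. Would you like to automatically wrap the classifier in an "InputMappedClassifier" before proceeding?" If I click "No", it shows "Train and test set are not compatible ... 5 != 6". If I click "Yes", it gives the following output in plain text: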

=== Predictions on user test set ===

    inst#     actual  predicted error prediction
        1          ?      0              ? 
        2          ?      0              ? 
        3          ?      0              ? 
        4          ?      0              ? 
        5          ?      0              ? 
        6          ?      0              ? 
        7          ?      0              ? 
        8          ?      0              ? 
        9          ?      0              ? 
       10          ?      0              ? 
       11          ?      0              ? 
       12          ?      0              ? 
       13          ?      0              ? 
       14          ?      0              ? 
       15          ?      0              ? 
       16          ?      1              ? 
       17          ?      0              ? 
       18          ?      0              ? 
       19          ?      0              ? 
       20          ?      0              ? 
       21          ?      0              ?
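Wrapping the model in InputMappedClassifier lets evaluation proceed by matching test attributes to the model's attributes by name; as I understand Weka's documentation, attributes the model expects but the test data lacks are given missing values. Below is a small Python sketch of that name-based mapping idea, not Weka's implementation. The extra `cluster` attribute name is hypothetical.

```python
# Sketch of name-based input mapping (the idea behind wrapping a model so it
# accepts a differently shaped test set). Not Weka's implementation.
MISSING = None  # stands in for ARFF's "?"


def map_instance(model_attrs, test_attrs, test_row):
    """Reorder test_row to the model's attribute order; absent names -> MISSING."""
    position = {name: i for i, name in enumerate(test_attrs)}
    return [test_row[position[name]] if name in position else MISSING
            for name in model_attrs]


# Hypothetical: the model was trained after a cluster attribute was added,
# but the test file still has only the original eight attributes.
test_attrs = ["CustomerID", "Age", "Region", "ProductSubClass",
              "ProductID", "Quantity", "Cost", "sales"]
model_attrs = test_attrs + ["cluster"]
row = [1754698, "H", "A", 560402, MISSING, 1, 676, 849]
mapped = map_instance(model_attrs, test_attrs, row)
print(len(model_attrs), "!=", len(test_attrs))  # 9 != 8
print(mapped[-1])  # None
```

The mapping makes the shapes agree, but any attribute the test set never had reaches the model only as a missing value.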

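Now: 1. Is it possible to use the numeric field ProductID as the class? I need to predict the customer's choice of product (as a ProductID) while taking the other attributes into account.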

2. If so, I ran into another problem: the training and test sets are not compatible. Is this error related to choosing a numeric attribute?

Note: I am using the Weka 3.8.1 GUI.

Answer 1

Possibly your test dataset lacks the cluster ID that the K-Means clustering step may have added to the training set (did you tell Weka to do so?) but not to the test set.
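To make the test set carry the same cluster attribute, the clustering fitted on the training data has to be applied to the test data as well, rather than skipped. A minimal Python sketch of that idea follows; the centroid values and the choice of columns 5-7 (Quantity, Cost, sales) are hypothetical, and this is not Weka's AddCluster API.

```python
# Sketch: keep the centroids learned on the training data and push every
# later dataset (test set included) through the same transform, so both
# carry the extra cluster attribute. Centroid values are hypothetical.
import math


class AddClusterStep:
    def __init__(self, columns, centroids):
        self.columns = columns      # indices of the clustered columns
        self.centroids = centroids  # fitted on training data only

    def transform(self, rows):
        """Append the index of the nearest fitted centroid to each row."""
        out = []
        for row in rows:
            point = [row[c] for c in self.columns]
            label = min(range(len(self.centroids)),
                        key=lambda i: math.dist(point, self.centroids[i]))
            out.append(row + [label])
        return out


step = AddClusterStep(columns=[5, 6, 7],
                      centroids=[[1.5, 39.5, 45.5], [1.0, 122.0, 124.0]])
test = [[1754698, "H", "A", 560402, None, 1, 676, 849],
        [1027365, "F", "C", 530404, None, 1, 170, 219],
        [956710, "E", "E", 500303, None, 1, 36, 59]]
for row in step.transform(test):
    print(len(row), row[-1])
```

Each test row comes out with nine values, matching a model trained on the augmented training data.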

Apart from that, the whole point of K-Means is to use it for clustering, not classification.

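So, frankly, you are applying it incorrectly, you have not given us readers enough information (J48?), and you are asking (at least) two questions here.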

classification

cluster-analysis

weka

data-mining