Parsing array into CSV with headers using StringBuilder() -- issue with header row

I have a vector of labeled data elements that looks like this:

[label1: 1.1, label2: 2.43, label3: 0.5]

[label1: 0.1, label2: 2.0, label3: 1.0]

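There can be any number of elements, and each element essentially corresponds to one row of data. I am trying to parse this into a CSV with a header row, like so: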

label1 label2 label3
1.1    2.43   0.5
0.1    2.0    1.0

I have been using the StringBuilder() constructor and would prefer to stick with it, but I can use something else if necessary.

I have almost got this working, except for separating the headers from the first row of numeric results.

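I have an outer loop that iterates over the array elements ("rows") and an inner loop that iterates over each piece of each array element ("columns"). In the example above there are 2 "rows" (elements) and 3 "columns" (member indices).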

My code looks like this (the block below builds the CSV and prints to the screen):

StringBuilder builder  = new StringBuilder();

// Write predictions to file
for (int i = 0; i < labeled.size(); i++)      
{
    // Discrete prediction
    double predictionIndex = 
        clf.classifyInstance(newTest.instance(i)); 

    // Get the predicted class label from the predictionIndex.
    String predictedClassLabel =
        newTest.classAttribute().value((int) predictionIndex);

    // Get the prediction probability distribution.
    double[] predictionDistribution = 
        clf.distributionForInstance(newTest.instance(i)); 

    // Print out the true predicted label, and the distribution
    System.out.printf("%5d: predicted=%-10s, distribution=", 
                      i, predictedClassLabel); 

    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0; 
         predictionDistributionIndex < predictionDistribution.length; 
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel = 
            newTest.classAttribute().value(
                predictionDistributionIndex);

        // Get the probability.
        double predictionProbability = 
            predictionDistribution[predictionDistributionIndex];

        System.out.printf("[%10s : %6.3f]", 
                          predictionDistributionIndexAsClassLabel, 
                          predictionProbability );
        if(i == 0){
            builder.append(predictionDistributionIndexAsClassLabel+",");

            if(predictionDistributionIndex == predictionDistribution.length){
                builder.append("\n");
            }
        }
        // Add probabilities as rows     
        builder.append(predictionProbability+",");

        }

    System.out.printf("\n");
    builder.append("\n");

}

The current output looks like this:

setosa,1.0,versicolor,0.0,virginica,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,

setosa, versicolor, and virginica are the labels. As you can see, it works from the second line onward, but I can't figure out how to fix the first line.

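If I understand your question correctly, you are getting both the labels and the first row's values inside the inner for loop, so they are appended interleaved, in the order they occur. The newline check in that loop never fires, because inside the loop predictionDistributionIndex is always less than predictionDistribution.length. If you want the labels on their own line, you can change the inner loop part as follows: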

// Collects the header labels; it is only filled while i == 0.
StringBuilder labelRow = new StringBuilder();

    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0; 
         predictionDistributionIndex < predictionDistribution.length; 
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel = 
            newTest.classAttribute().value(
                predictionDistributionIndex);

        // Get the probability.
        double predictionProbability = 
            predictionDistribution[predictionDistributionIndex];

        System.out.printf("[%10s : %6.3f]", 
                          predictionDistributionIndexAsClassLabel, 
                          predictionProbability );

        // On the first row only, collect the labels in their own builder.
        if (i == 0) {
            labelRow.append(predictionDistributionIndexAsClassLabel + ",");
        }

        // Add probabilities as rows
        builder.append(predictionProbability + ",");
    }

    // Once the first row is complete, put the header line in front of it.
    if (i == 0) {
        builder.insert(0, labelRow.toString() + "\n");
    }

What this does is collect the labels in a separate StringBuilder, which you can later insert at the beginning of the final builder value.
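
As a variation on the same idea, here is a minimal, self-contained sketch that writes the header line once before the data loop instead of inserting it afterwards. It is only an illustration under some assumptions: the class name CsvHeaderSketch, the method toCsv, and the hard-coded labels and rows arrays are hypothetical stand-ins, and in the code above the labels would come from newTest.classAttribute().value(...) and the values from predictionDistribution. It also uses StringJoiner for each row, which is one way to avoid the trailing commas.

```java
import java.util.StringJoiner;

public class CsvHeaderSketch {

    // Builds CSV text: one header line from labels, then one line per row.
    // (Hypothetical helper, not part of the original code.)
    static String toCsv(String[] labels, double[][] rows) {
        StringBuilder builder = new StringBuilder();

        // The header row is written once, before any data rows.
        builder.append(String.join(",", labels)).append("\n");

        for (double[] row : rows) {
            // StringJoiner places commas between values only,
            // so the line does not end with a trailing comma.
            StringJoiner line = new StringJoiner(",");
            for (double value : row) {
                line.add(String.valueOf(value));
            }
            builder.append(line.toString()).append("\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Sample data taken from the example at the top of the question.
        String[] labels = {"label1", "label2", "label3"};
        double[][] rows = {
            {1.1, 2.43, 0.5},
            {0.1, 2.0, 1.0}
        };
        System.out.print(toCsv(labels, rows));
    }
}
```

Because the header is appended before the loop starts, no i == 0 check is needed inside the loops, and the final builder already has the labels on the first line.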