Parsing an array into CSV with headers using StringBuilder: issue with the header row
I have a vector of labeled data elements, like so:
[label1: 1.1, label2: 2.43, label3: 0.5]
[label1: 0.1, label2: 2.0, label3: 1.0]
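There can be any number of elements, and each element essentially corresponds to one row of data. I am trying to parse this into a CSV with a header row, like so: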
label1 label2 label3
1.1 2.43 0.5
0.1 2.0 1.0
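I have been using the StringBuilder() constructor and would like to stick with it, but I can use something else if necessary.
I almost have this working, except for separating the headers from the first row of numeric results.
I have an outer loop that iterates over the array elements ("rows") and an inner loop that iterates over each piece of each array element ("columns"); in the example above there are 2 "rows" (elements) and 3 "columns" (member indices).
My code looks like this (the block below creates the CSV and prints to the screen):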
StringBuilder builder = new StringBuilder();
// Write predictions to file
for (int i = 0; i < labeled.size(); i++)
{
    // Discrete prediction
    double predictionIndex =
        clf.classifyInstance(newTest.instance(i));
    // Get the predicted class label from the predictionIndex.
    String predictedClassLabel =
        newTest.classAttribute().value((int) predictionIndex);
    // Get the prediction probability distribution.
    double[] predictionDistribution =
        clf.distributionForInstance(newTest.instance(i));
    // Print out the true predicted label, and the distribution
    System.out.printf("%5d: predicted=%-10s, distribution=",
        i, predictedClassLabel);
    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0;
         predictionDistributionIndex < predictionDistribution.length;
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel =
            newTest.classAttribute().value(
                predictionDistributionIndex);
        // Get the probability.
        double predictionProbability =
            predictionDistribution[predictionDistributionIndex];
        System.out.printf("[%10s : %6.3f]",
            predictionDistributionIndexAsClassLabel,
            predictionProbability );
        if(i == 0){
            builder.append(predictionDistributionIndexAsClassLabel+",");
            if(predictionDistributionIndex == predictionDistribution.length){
                builder.append("\n");
            }
        }
        // Add probabilities as rows
        builder.append(predictionProbability+",");
    }
    System.out.printf("\n");
    builder.append("\n");
}
The current output looks like this:
setosa,1.0,versicolor,0.0,virginica,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
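setosa, versicolor and virginica are the labels. As you can see, it works from the second row onward, but I can't figure out how to fix the first row.
If I understand your question correctly, you are getting both the labels and the first row's values inside the inner for loop, so they are appended interleaved as they come. If you want to keep the labels separate, you can change the inner loop part as follows: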
// This fragment replaces the inner loop, inside the outer loop over i.
StringBuilder labelRow = new StringBuilder();
// Loop over all the prediction labels in the distribution.
for (int predictionDistributionIndex = 0;
     predictionDistributionIndex < predictionDistribution.length;
     predictionDistributionIndex++)
{
    // Get this distribution index's class label.
    String predictionDistributionIndexAsClassLabel =
        newTest.classAttribute().value(
            predictionDistributionIndex);
    // Get the probability.
    double predictionProbability =
        predictionDistribution[predictionDistributionIndex];
    System.out.printf("[%10s : %6.3f]",
        predictionDistributionIndexAsClassLabel,
        predictionProbability );
    if(i == 0){
        // Collect the labels separately while processing the first row only.
        labelRow.append(predictionDistributionIndexAsClassLabel+",");
    }
    // Add probabilities as rows
    builder.append(predictionProbability+",");
}
if(i == 0){
    // Put the header line in front of the first data row.
    builder.insert(0,labelRow.toString()+"\n");
}
What this does is collect the labels in a separate StringBuilder, which you can later insert at the start of the final builder value.
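As an alternative, here is a minimal sketch of the same idea that writes the header line once before the row loop, so no insert is needed, and also avoids the trailing comma on each line. It is not the code from the question: the class name CsvSketch, the method toCsv and the plain String[] / double[][] inputs are hypothetical stand-ins. In the question's setup the labels would presumably come from newTest.classAttribute().value(...) and each row from clf.distributionForInstance(...).

```java
final class CsvSketch {

    // Builds CSV text: one header line from the labels, then one line per row.
    // Hypothetical helper for illustration; inputs are plain arrays.
    static String toCsv(String[] labels, double[][] rows) {
        StringBuilder builder = new StringBuilder();

        // Header row is written exactly once, before any data rows.
        builder.append(String.join(",", labels)).append('\n');

        for (double[] row : rows) {
            for (int col = 0; col < row.length; col++) {
                // Separator goes between values, so no trailing comma.
                if (col > 0) {
                    builder.append(',');
                }
                builder.append(row[col]);
            }
            builder.append('\n');
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"setosa", "versicolor", "virginica"};
        double[][] rows = {{1.0, 0.0, 0.0}, {0.1, 0.7, 0.2}};
        System.out.print(toCsv(labels, rows));
    }
}
```

Writing the header outside the row loop removes the need for the i == 0 checks entirely, at the cost of collecting the labels before iterating over the predictions.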