Output matrix correctly in Spark Java
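I'd like to know how to get the correct output: I want the output to have the same format as the input. I'm just not sure how to map the RowMatrix to get this output.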
Input file
0,0,0.0
0,1,1.0
0,2,2.0
0,3,3.0
0,4,4.0
1,0,5.0
1,1,6.0
1,2,7.0
1,3,8.0
1,4,9.0
Code
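import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;
import org.apache.spark.rdd.RDD;

String inputPathA = "data/At.txt";
// SparkContext refuses to start unless an application name is set.
SparkConf conf = new SparkConf().setMaster("local").setAppName("AtA");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> fileA = sc.textFile(inputPathA);
// Parse each "i,j,value" line into a MatrixEntry.
JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() {
    public MatrixEntry call(String x) {
        String[] indeceValue = x.split(",");
        long i = Long.parseLong(indeceValue[0]);
        long j = Long.parseLong(indeceValue[1]);
        double value = Double.parseDouble(indeceValue[2]);
        return new MatrixEntry(i, j, value);
    }
});
CoordinateMatrix cooMatrixA = new CoordinateMatrix(matrixA.rdd());
BlockMatrix matA = cooMatrixA.toBlockMatrix();
// Compute A^T A.
BlockMatrix ata = matA.transpose().multiply(matA);
IndexedRowMatrix id = ata.toIndexedRowMatrix();
// Converting to a RowMatrix discards the row indices.
RowMatrix rm = id.toRowMatrix();
RDD<Vector> result = rm.rows();
result.saveAsTextFile("data/output1");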
Output I get
(5,[0,1,2,3,4],[45.0,58.0,71.0,84.0,97.0])
(5,[0,1,2,3,4],[25.0,30.0,35.0,40.0,45.0])
(5,[0,1,2,3,4],[30.0,37.0,44.0,51.0,58.0])
(5,[0,1,2,3,4],[40.0,51.0,62.0,73.0,84.0])
(5,[0,1,2,3,4],[35.0,44.0,53.0,62.0,71.0])
How do I map this correctly in Spark (Java) so that the output has the same format as my input?
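For reference, the five vectors above are the rows of A^T A, but in an order that does not follow the row index: the first one printed, [45.0, 58.0, 71.0, 84.0, 97.0], is row 4. The plain-Java sketch below (no Spark; the class name `LocalGram` is hypothetical, and it assumes the input file is exactly the dense 2x5 matrix shown above) computes A^T A locally and prints each row next to its index, which is the piece of information a RowMatrix does not keep.

```java
import java.util.Arrays;

// Plain-Java check (no Spark): computes A^T A for the 2x5 input matrix
// and prints each row together with its row index.
class LocalGram {
    // The input file as a dense 2x5 matrix: A[i][j] is the value of entry (i, j).
    static final double[][] A = {
        {0.0, 1.0, 2.0, 3.0, 4.0},
        {5.0, 6.0, 7.0, 8.0, 9.0}
    };

    // Returns G = A^T A, where G[j][k] = sum over i of A[i][j] * A[i][k].
    static double[][] gram(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] g = new double[cols][cols];
        for (int j = 0; j < cols; j++) {
            for (int k = 0; k < cols; k++) {
                double sum = 0.0;
                for (int i = 0; i < rows; i++) {
                    sum += a[i][j] * a[i][k];
                }
                g[j][k] = sum;
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] g = gram(A);
        for (int j = 0; j < g.length; j++) {
            // The row index j is exactly what RowMatrix.rows() drops.
            System.out.println(j + " -> " + Arrays.toString(g[j]));
        }
    }
}
```

Row 0 comes out as [25.0, 30.0, 35.0, 40.0, 45.0], which is the second vector in the Spark output, so the values are right and only the row labels are missing.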
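A RowMatrix has no meaningful row indices, so it cannot be converted back to the same shape as the input. Instead, convert the BlockMatrix back to a CoordinateMatrix and build a JavaRDD<String> that can be saved: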
JavaRDD<MatrixEntry> entries = ata.toCoordinateMatrix().entries().toJavaRDD();
// Format each entry as "i,j,value", the same layout as the input file.
JavaRDD<String> output = entries.map(new Function<MatrixEntry, String>() {
    public String call(MatrixEntry e) {
        return String.format("%d,%d,%s", e.i(), e.j(), e.value());
    }
});
output.saveAsTextFile("data/output1");
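The entries coming out of toCoordinateMatrix() are not guaranteed to be sorted by row and column, so the saved file may not list them in the same order as the input. If that order matters, one option is to sort by (i, j) before formatting (in Spark, for example, with JavaRDD.sortBy). The plain-Java sketch below illustrates just that sort plus the same "%d,%d,%s" formatting on a few A^T A entries taken from the output above. `EntryFormatter` and `Entry` are hypothetical names; `Entry` is a stand-in for Spark's MatrixEntry, not the Spark class, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch (no Spark): sorts (i, j, value) entries by row, then column,
// and formats each one as "i,j,value", the same layout as the input file.
class EntryFormatter {
    // Hypothetical stand-in for Spark's MatrixEntry (row index, column index, value).
    record Entry(long i, long j, double value) {}

    static List<String> format(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        // Order by row index first, then by column index.
        sorted.sort(Comparator.comparingLong(Entry::i).thenComparingLong(Entry::j));
        List<String> lines = new ArrayList<>();
        for (Entry e : sorted) {
            // Same format string as the Spark answer; %s prints a double like 25.0 as "25.0".
            lines.add(String.format("%d,%d,%s", e.i(), e.j(), e.value()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // A few A^T A entries from the output above, deliberately out of order.
        List<Entry> entries = List.of(
            new Entry(4, 0, 45.0),
            new Entry(0, 1, 30.0),
            new Entry(1, 0, 30.0),
            new Entry(0, 0, 25.0));
        format(entries).forEach(System.out::println);
    }
}
```

Running main prints 0,0,25.0, 0,1,30.0, 1,0,30.0 and 4,0,45.0, one per line.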