How to update a Row/column value in an Apache Spark DataFrame?
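I have an ordered Spark DataFrame.
I want to change a few rows while iterating over it with the code below, but there doesn't seem to be any way to update a Row object.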
orderedDataFrame.foreach(new Function1<Row,BoxedUnit>(){
    @Override
    public BoxedUnit apply(Row v1) {
        // How do I change Row here?
        // I want to change column no 2 using v1.get(2)
        // also what is BoxedUnit, and how do I use it
        return null;
    }
});
Also, the code above gives a compilation error:
myclassname is not abstract and it does not override abstract method apply$mcVj$sp(long) in scala Function 1
I am new to Spark. I am using version 1.4.0.
Try this:
final DataFrame withoutCurrency = sqlContext.createDataFrame(somedf.javaRDD().map(row -> {
    return RowFactory.create(row.get(0), row.get(1), someMethod(row.get(2)));
}), somedf.schema());
Dataset<Row> ds = spark.createDataFrame(Collections.singletonList(data), SellerAsinAttribute.class);
ds.map((i)-> {
    Object arrayObj = Array.newInstance(Object.class, i.length());
    for (int n = 0; n < i.length(); ++n) {
        // change 'i.get(n)' to anything you want; if you change the type, remember to update the schema
        Array.set(arrayObj, n, i.get(n));
    }
    Method create = RowFactory.class.getMethod("create", Object[].class);
    return (Row) create.invoke(null, arrayObj);
}, RowEncoder.apply(ds.schema())).show();
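Both snippets above follow the same pattern: a Row is never modified in place; instead each row is mapped to a new one that copies the old values and swaps in the changed column. foreach cannot do this, because it only runs side effects and returns nothing. The following Spark-free sketch illustrates that copy-and-replace pattern on plain Object[] rows so it can run without a cluster. The class name RowUpdateSketch, the helper names, and the sample data are hypothetical and are not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RowUpdateSketch {

    // Returns a NEW row: a copy of `row` in which the value at `index`
    // is replaced by fn(old value). The input array is left untouched.
    static Object[] withUpdatedColumn(Object[] row, int index, Function<Object, Object> fn) {
        Object[] copy = Arrays.copyOf(row, row.length);
        copy[index] = fn.apply(row[index]);
        return copy;
    }

    // Applies the same column update to every row, preserving row order,
    // and returns a new list (the analogue of map on an RDD or Dataset).
    static List<Object[]> mapColumn(List<Object[]> rows, int index, Function<Object, Object> fn) {
        List<Object[]> out = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            out.add(withUpdatedColumn(row, index, fn));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: column 2 holds an amount with a currency suffix.
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a", "10 USD"});
        rows.add(new Object[] {2, "b", "25 USD"});

        // Strip the currency suffix from column 2 in every row.
        List<Object[]> updated = mapColumn(rows, 2, v -> ((String) v).replace(" USD", ""));
        for (Object[] row : updated) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In the Spark versions of this pattern, the function passed to map plays the role of withUpdatedColumn, and the original schema is reused only as long as the column types stay the same.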