Is there a way to retain the order of variables in a Spark Dataset?
I am creating a Spark Dataset:
Dataset<myBeanClass> myDataset = myDataFrame.as(Encoders.bean(myBeanClass.class));
At this point, its schema looks like this:
root
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- gender: string (nullable = true)
After performing a map transformation,
Dataset<myBeanClass> resultDataset = myDataset.map(new MapFunction<myBeanClass, myBeanClass>() {
    @Override
    public myBeanClass call(myBeanClass v1) throws Exception {
        // some code
        return v1;
    }
}, Encoders.bean(myBeanClass.class));
the schema becomes:
root
|-- age: string (nullable = true)
|-- gender: string (nullable = true)
|-- name: string (nullable = true)
I noticed the same behavior in this example as well. Is there a way to retain the order?
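A likely explanation, stated as an assumption rather than a documented guarantee: `Encoders.bean` builds its schema from the bean's JavaBeans properties (as far as I can tell, Spark 2.x does this via `java.beans.Introspector`), and the OpenJDK `Introspector` reports properties sorted by name rather than in declaration order. `as(...)` only changes the typed view of the existing columns, while `map` re-serializes every result through the bean encoder, which would explain why the order changes only after the map. The plain-Java sketch below needs no Spark; `Person` is a hypothetical stand-in for `myBeanClass` with the same three string properties.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyOrder {
    // Hypothetical stand-in for myBeanClass; fields are declared as name, age, gender.
    public static class Person {
        private String name;
        private String age;
        private String gender;

        public Person() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getAge() { return age; }
        public void setAge(String age) { this.age = age; }
        public String getGender() { return gender; }
        public void setGender(String gender) { this.gender = gender; }
    }

    // Returns property names in the order Introspector reports them.
    // Stopping at Object.class excludes the synthetic "class" property from getClass().
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declared order is name, age, gender; the reported order is alphabetical.
        System.out.println(propertyNames(Person.class)); // prints [age, gender, name]
    }
}
```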
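I couldn't find a way to prevent the order of the variables in the schema from changing. However, I was able to convert it back to whatever order I want. Here's how I did it: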
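// Re-select the columns by name, in the order of the original DataFrame.
// DataFrame is the Spark 1.6 Java type; in Spark 2.x and later use Dataset<Row>.
DataFrame resultsDataFrame = resultDataset.toDF().selectExpr(myDataFrame.schema().fieldNames());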
The schema of resultsDataFrame is the same as that of the DataFrame I created the Dataset from:
root
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- gender: string (nullable = true)
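This works because the columns are selected by name, so whatever order the bean encoder produced no longer matters. The plain-Java model below illustrates that name-based projection without Spark; the `reorder` helper and the sample row are hypothetical, purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ReorderByName {
    // Takes values laid out in currentFields order and returns them in targetFields order,
    // looking each value up by column name (analogous to selecting columns by name).
    static Object[] reorder(String[] currentFields, Object[] values, String[] targetFields) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < currentFields.length; i++) {
            index.put(currentFields[i], i);
        }
        Object[] out = new Object[targetFields.length];
        for (int i = 0; i < targetFields.length; i++) {
            Integer pos = index.get(targetFields[i]);
            if (pos == null) {
                throw new IllegalArgumentException("No such column: " + targetFields[i]);
            }
            out[i] = values[pos];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] afterMap = {"age", "gender", "name"};   // order produced by the bean encoder
        Object[] row = {"30", "F", "Alice"};
        String[] original = {"name", "age", "gender"};   // order of myDataFrame
        System.out.println(Arrays.toString(reorder(afterMap, row, original))); // prints [Alice, 30, F]
    }
}
```

Note that `selectExpr` parses each string as a SQL expression, so column names containing spaces or other special characters would need backtick quoting; `select` with plain column names avoids that parsing.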