无法从数据帧中提取 array/list,AnalysisException:需要结构类型但得到二进制
Unable to extract array/list from dataframe, AnalysisException : need struct type but got binary
我有一个包含 String[] 的数据集,我正在努力从中提取列。这是代码
import static org.apache.spark.sql.functions.col;
//Read parquet data
Dataset<Row> readerDF = spark.readStream().format("parquet").
List<String> columns = Arrays.asList("city","country");
//Interested in only field in data for now 'fieldMap' which is Map<String,String>
Dataset<String[]> stringArrDF = readerDF.map((MapFunction<Row, String[]>) row -> {
Map<String,String> fields = row.getJavaMap(row.fieldIndex("fieldMap"));
List<String> columnList = new ArrayList<>();
columns.forEach(columnName ->
{
columnList.add(fields.getOrDefault(columnName, ""));
});
return columnList.toArray(new String[columns.size]);
}, Encoders.kryo(String[].class));
//I was expecting to extract city here:
Dataset ds = stringArrDF.select(col("value").getItem(1).as("city"));
但失败并出现以下异常。
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Can't extract value from value#22;
如何从数据集中访问字符串[] 或列表字段?
您遇到错误。
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Can't extract value from value#22: need struct type but got binary;
您正在使用 Encoders.kryo(String[].class)
创建 stringArrDF
。如果您查看 Encoders.kryo
的文档,它会显示
Creates an encoder that serializes objects of type T using Kryo. This
encoder maps T into a single byte array (binary) field.
使用 spark.implicits().newStringArrayEncoder()
编码您的 String[]。
我有一个包含 String[] 的数据集,我正在努力从中提取列。这是代码
import static org.apache.spark.sql.functions.col;
//Read parquet data
Dataset<Row> readerDF = spark.readStream().format("parquet").
List<String> columns = Arrays.asList("city","country");
//Interested in only field in data for now 'fieldMap' which is Map<String,String>
Dataset<String[]> stringArrDF = readerDF.map((MapFunction<Row, String[]>) row -> {
Map<String,String> fields = row.getJavaMap(row.fieldIndex("fieldMap"));
List<String> columnList = new ArrayList<>();
columns.forEach(columnName ->
{
columnList.add(fields.getOrDefault(columnName, ""));
});
return columnList.toArray(new String[columns.size]);
}, Encoders.kryo(String[].class));
//I was expecting to extract city here:
Dataset ds = stringArrDF.select(col("value").getItem(1).as("city"));
但失败并出现以下异常。
Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from value#22;
如何从数据集中访问字符串[] 或列表字段?
您遇到错误。
Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from value#22: need struct type but got binary;
您正在使用 Encoders.kryo(String[].class)
创建 stringArrDF
。如果您查看 Encoders.kryo
的文档,它会显示
Creates an encoder that serializes objects of type T using Kryo. This encoder maps T into a single byte array (binary) field.
使用 spark.implicits().newStringArrayEncoder()
编码您的 String[]。