无法从数据帧中提取 array/list,AnalysisException:需要结构类型但得到二进制

Unable to extract array/list from dataframe, AnalysisException : need struct type but got binary

我有一个包含 String[] 的数据集,我正在努力从中提取列。这是代码

import static org.apache.spark.sql.functions.col;

//Read parquet data
Dataset<Row> readerDF = spark.readStream().format("parquet").

List<String> columns = Arrays.asList("city","country");
//Interested in only field in data for now 'fieldMap' which is Map<String,String>

Dataset<String[]> stringArrDF = readerDF.map((MapFunction<Row, String[]>) row -> {                
    Map<String,String> fields = row.getJavaMap(row.fieldIndex("fieldMap"));
    List<String> columnList = new ArrayList<>();                
    columns.forEach(columnName ->
    {
        columnList.add(fields.getOrDefault(columnName, ""));
    });
    return columnList.toArray(new String[columns.size]);
}, Encoders.kryo(String[].class));

//I was expecting to extract city here:
Dataset ds = stringArrDF.select(col("value").getItem(1).as("city"));

但失败并出现以下异常。

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from value#22;

如何从数据集中访问字符串[] 或列表字段?

您遇到错误。

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from value#22: need struct type but got binary;

您正在使用 Encoders.kryo(String[].class) 创建 stringArrDF。如果您查看 Encoders.kryo 的文档,它会显示

Creates an encoder that serializes objects of type T using Kryo. This encoder maps T into a single byte array (binary) field.

使用 spark.implicits().newStringArrayEncoder() 编码您的 String[]。