How to read specific columns from a Parquet file in Java
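I am using a WriteSupport that knows how to write my custom object 'T' to Parquet. I only want to read 2 or 3 specific columns out of the 100 columns of the custom object written to the Parquet file.
Most examples online extend ReadSupport and read the entire record. I would like to do this without using Spark, Hive, Avro, Thrift, etc.
Is there an example in Java that reads selected columns of a custom object from Parquet?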
This post may help:
Read specific column from Parquet without using Spark
If you just want to read specific columns, you need to set a read schema (also known as a projection) on the configuration that the ParquetReader builder accepts.
In your case you should be able to call .withConf(conf) on the AvroParquetReader builder and, on the conf you pass in, call conf.set(ReadSupport.PARQUET_READ_SCHEMA, schema), where schema is an Avro schema in String form.
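Since the question asks for a solution without Avro, here is a minimal sketch of the same projection idea using the example GroupReadSupport that ships with parquet-hadoop. It assumes parquet-hadoop and the Hadoop client libraries are on the classpath. With this read support, the value stored under ReadSupport.PARQUET_READ_SCHEMA is a Parquet message-type string rather than an Avro schema. The class name ParquetProjectionSketch and the columns id and name are hypothetical; the field names, types and repetition in the projection have to match what the file actually contains.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.hadoop.example.GroupReadSupport;

// Hypothetical class name; a sketch, not the only way to project columns.
class ParquetProjectionSketch {

    // Reads every row of the file, but only the columns named in the projection.
    // `projection` is a Parquet message-type string listing the wanted fields.
    static List<Group> readProjected(Path file, String projection) throws IOException {
        Configuration conf = new Configuration();
        // GroupReadSupport looks up this key and requests only these columns.
        conf.set(ReadSupport.PARQUET_READ_SCHEMA, projection);

        List<Group> rows = new ArrayList<>();
        try (ParquetReader<Group> reader =
                 ParquetReader.builder(new GroupReadSupport(), file)
                              .withConf(conf)
                              .build()) {
            Group row;
            // read() returns null once the end of the file is reached.
            while ((row = reader.read()) != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Assumed columns: the file is taken to contain `id` and `name`
        // with exactly these types and repetitions.
        String projection =
            "message T { required int64 id; required binary name (UTF8); }";
        for (Group row : readProjected(new Path(args[0]), projection)) {
            System.out.println(row.getLong("id", 0) + " " + row.getString("name", 0));
        }
    }
}
```

For a custom ReadSupport<T>, the equivalent step would be to return the projected MessageType as the requested schema of the ReadContext created in init(); the record materializer then only receives the selected columns. If the Avro reader is used after all, parquet-avro also provides AvroReadSupport.setRequestedProjection(conf, avroSchema) for passing an Avro projection schema.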