How to read specific columns from a Parquet file in Java

apache
parquet
columnstore

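I'm using a WriteSupport that knows how to write my custom object 'T' to Parquet. I only want to read 2 or 3 specific columns out of the 100 columns of the custom object written to the Parquet file.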

Most examples online extend ReadSupport and read the entire record. I want to do this without using Spark, Hive, Avro, Thrift, etc.

Is there an example in Java that reads selected columns of a custom object from Parquet?
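For the custom-object case described in the question, column selection can happen inside a hand-written ReadSupport: the ReadContext returned from init carries the requested schema, and only the column chunks in that schema are decoded. The sketch below is a minimal illustration, assuming parquet-hadoop and hadoop-common are on the classpath and that the file schema contains `required int64 id` and `required binary name (UTF8)`; `PersonReadSupport`, `Person` and both column names are hypothetical stand-ins for your `T` and its columns.

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.io.api.Converter;
import org.apache.parquet.io.api.GroupConverter;
import org.apache.parquet.io.api.PrimitiveConverter;
import org.apache.parquet.io.api.RecordMaterializer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

/** Hypothetical ReadSupport that materializes only two columns into a small POJO. */
public class PersonReadSupport extends ReadSupport<PersonReadSupport.Person> {

    /** Hypothetical stand-in for the custom object T. */
    public static final class Person {
        public long id;
        public String name;
    }

    // The projection: names, types and repetitions must match the file schema.
    // Field order here defines the fieldIndex values seen by the converter below.
    private static final MessageType PROJECTION = MessageTypeParser.parseMessageType(
            "message record { required int64 id; required binary name (UTF8); }");

    @Override
    public ReadContext init(InitContext context) {
        // getSchemaForRead throws if PROJECTION is not contained in the file schema.
        return new ReadContext(getSchemaForRead(context.getFileSchema(), PROJECTION));
    }

    @Override
    public RecordMaterializer<Person> prepareForRead(Configuration conf,
                                                     Map<String, String> keyValueMetaData,
                                                     MessageType fileSchema,
                                                     ReadContext readContext) {
        return new PersonMaterializer();
    }

    private static final class PersonMaterializer extends RecordMaterializer<Person> {
        private Person current;

        private final Converter idConverter = new PrimitiveConverter() {
            @Override
            public void addLong(long value) { current.id = value; }
        };

        private final Converter nameConverter = new PrimitiveConverter() {
            @Override
            public void addBinary(Binary value) { current.name = value.toStringUsingUTF8(); }
        };

        private final GroupConverter root = new GroupConverter() {
            @Override
            public Converter getConverter(int fieldIndex) {
                // Indexes refer to the requested (projected) schema: 0 = id, 1 = name.
                return fieldIndex == 0 ? idConverter : nameConverter;
            }

            @Override
            public void start() { current = new Person(); }   // fresh object per record

            @Override
            public void end() { }
        };

        @Override
        public Person getCurrentRecord() { return current; }

        @Override
        public GroupConverter getRootConverter() { return root; }
    }
}
```

It plugs into the generic reader builder, e.g. `ParquetReader.builder(new PersonReadSupport(), new Path(file)).build()`, and each `read()` call returns one `Person` until it returns `null`. The other 98 columns are never materialized because they are absent from the requested schema.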

This post may help:

Read specific column from Parquet without using Spark

If you just want to read specific columns, you need to set a read schema on the configuration that the ParquetReader builder accepts. This is also known as a projection.
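Without Avro, one way to apply such a projection is to pair `ParquetReader.builder` with `GroupReadSupport`, the example read support shipped in parquet-hadoop, whose `init` reads `ReadSupport.PARQUET_READ_SCHEMA` and validates it against the file schema. This is a minimal sketch, assuming parquet-hadoop and hadoop-common are on the classpath; the class name `ProjectedGroupRead` and the columns `id` and `name` are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ProjectedGroupRead {

    /**
     * Reads every row of {@code file}, materializing only the columns listed in
     * {@code projection}, a Parquet message type such as
     * "message record { required int64 id; required binary name (UTF8); }".
     * Field names and repetitions must match the file schema.
     */
    public static List<Group> readColumns(String file, String projection) throws IOException {
        Configuration conf = new Configuration();
        // GroupReadSupport.init() parses this key and checks it against the file schema.
        conf.set(ReadSupport.PARQUET_READ_SCHEMA, projection);

        List<Group> rows = new ArrayList<>();
        try (ParquetReader<Group> reader =
                 ParquetReader.builder(new GroupReadSupport(), new Path(file))
                              .withConf(conf)
                              .build()) {
            Group row;
            while ((row = reader.read()) != null) {   // read() returns null at end of input
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical columns "id" and "name"; use names from your own file schema.
        String projection = "message record { required int64 id; required binary name (UTF8); }";
        for (Group row : readColumns(args[0], projection)) {
            System.out.println(row.getLong("id", 0) + "\t" + row.getString("name", 0));
        }
    }
}
```

The value of `PARQUET_READ_SCHEMA` is a Parquet message type string, not an Avro schema; a column declared `optional` in the file must also be `optional` in the projection, otherwise the containment check fails.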

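In your case you should be able to call .withConf(conf) on the AvroParquetReader builder and, on the conf you pass in, call AvroReadSupport.setRequestedProjection(conf, schema), where schema is an Avro schema containing only the fields you want. (ReadSupport.PARQUET_READ_SCHEMA is the generic key for a projection, but it expects a Parquet message type in string form rather than an Avro schema, and AvroReadSupport does not read it; GroupReadSupport does.)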
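With parquet-avro, the projection is an ordinary Avro schema that lists only the wanted fields. The sketch below is a minimal illustration, assuming parquet-avro, avro and hadoop-common are on the classpath; `ProjectedAvroRead` and the field names are hypothetical. It also sets the same schema as the Avro read schema, so the returned `GenericRecord`s carry only the projected fields instead of the full file schema.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.hadoop.ParquetReader;

public class ProjectedAvroRead {

    /** Reads {@code file}, materializing only the fields present in {@code projection}. */
    public static List<GenericRecord> readColumns(String file, Schema projection) throws IOException {
        Configuration conf = new Configuration();
        // Stored under AvroReadSupport.AVRO_REQUESTED_PROJECTION; decides which columns are read.
        AvroReadSupport.setRequestedProjection(conf, projection);
        // Use the projection as the Avro read schema too, so records contain only these fields.
        AvroReadSupport.setAvroReadSchema(conf, projection);

        List<GenericRecord> rows = new ArrayList<>();
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(new Path(file))
                                  .withConf(conf)
                                  .build()) {
            GenericRecord row;
            while ((row = reader.read()) != null) {   // null signals end of input
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical projection: two fields out of a wider record.
        Schema projection = SchemaBuilder.record("record").fields()
                .requiredLong("id")
                .requiredString("name")
                .endRecord();
        for (GenericRecord row : readColumns(args[0], projection)) {
            System.out.println(row.get("id") + "\t" + row.get("name"));
        }
    }
}
```

This route still depends on Avro, which the question wanted to avoid; the GroupReadSupport or custom ReadSupport approaches need only parquet-hadoop.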
