Hive query not reading partition field

Question

I created a partitioned Hive table using the following query:

CREATE EXTERNAL TABLE `customer`(
  `cid` string COMMENT '',
  `member` string COMMENT '',
  `account` string COMMENT '')
PARTITIONED BY (update_period string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://nameservice1/user/customer'
TBLPROPERTIES (
  'avro.schema.url'='/user/schema/Customer.avsc')

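I am writing to the partition location with a MapReduce program. When I read the output file with avro-tools, it shows the correct data in JSON format. But when I query the data in Hive, nothing is displayed. If I create the table without the partition field, the values do show up in Hive. What could be the reason? I set the output location of the MapReduce program to "/user/customer/update_period=201811".

Do I need to add anything to the MapReduce program configuration to resolve this?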

Answer 1

After loading a new partition into the HDFS location, you need to run msck repair table.
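
For the table in the question, this could look like the sketch below. It assumes the `customer` table lives in the current database and that the MapReduce job writes to directories named `update_period=<value>` directly under the table location, which is the layout the repair command looks for.

```sql
-- Register any partition directories found under the table location
-- that are not yet recorded in the metastore.
MSCK REPAIR TABLE customer;

-- Check which partitions the metastore now lists for the table.
SHOW PARTITIONS customer;
```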

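Why do we need to run the msck repair table statement after each ingestion?

Hive stores a list of partitions for each table in its metastore. However, when new partitions are added directly to HDFS, the metastore (and therefore Hive) is not aware of them unless the user registers them in one of the following ways.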

1. Add each partition to the table

hive> alter table <db_name>.<table_name> add partition(`date`='<date_value>')
 location '<hdfs_location_of the specific partition>';

(or)

2. Run the metastore check with the repair table option

hive> Msck repair table <db_name>.<table_name>;

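This adds partition metadata to the Hive metastore for partitions that do not yet have it. In other words, any partition that exists on HDFS but not in the metastore is added to the metastore.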
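
Applied to the table in the question, option 1 could be sketched as follows. The partition location is an assumption: it combines the table's LOCATION (`hdfs://nameservice1/user/customer`) with the output path given in the question (`/user/customer/update_period=201811`). `IF NOT EXISTS` keeps the statement safe to re-run.

```sql
-- Register the single partition written by the MapReduce job.
-- The location below is assumed from the table LOCATION and the job's output path.
ALTER TABLE customer ADD IF NOT EXISTS
  PARTITION (update_period='201811')
  LOCATION 'hdfs://nameservice1/user/customer/update_period=201811';

-- Query only the newly registered partition to confirm rows are visible.
SELECT * FROM customer WHERE update_period='201811' LIMIT 10;
```

Either option would need to be repeated whenever the job writes a new `update_period` directory, since each new directory is unknown to the metastore until it is registered.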

hadoop

hive

mapreduce

avro

hadoop-partitioning