Creating a partitioned Hive table from a non-partitioned table

Question

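I have a Hive table that was created by joining data from multiple tables. The data resides in a folder containing multiple files ("0001_1", "0001_2", ... and so on). I need to create a partitioned table based on a date field called pt_dt in this table (either by altering this table or by creating a new one). Is there a way to do this?

I tried creating a new table and inserting into it (below), but that did not work: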

create external table table2 (acct_id bigint, eval_dt string)
partitioned by (pt_dt string);
insert into table2
partition (pt_dt) 
select acct_id, eval_dt, pt_dt
from jmx948_variable_summary;

This throws the error:

"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 189 Cumulative CPU: 401.68 sec HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 6 minutes 41 seconds 680 msec"

Answer 1

I was able to figure it out after some trial and error.

Enable dynamic partitioning in Hive:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
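
Dynamic-partition inserts are also bounded by limits on how many partitions a single statement may create. If the source table holds many distinct dates, these limits may need raising. The values below are illustrative assumptions, not recommendations:

```sql
-- Upper bound on dynamic partitions created by one statement (Hive default: 1000).
SET hive.exec.max.dynamic.partitions = 5000;
-- Upper bound per mapper/reducer node (Hive default: 100).
SET hive.exec.max.dynamic.partitions.pernode = 1000;
```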

Create the schema for the partitioned table:

CREATE TABLE table1 (id STRING, info STRING)
PARTITIONED BY ( tdate STRING);

Insert into the partitioned table:

FROM table2 t2
INSERT OVERWRITE TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
DISTRIBUTE BY tdate;
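
Applied to the tables from the question, the same steps might look like the sketch below. The target table name variable_summary_part is hypothetical; the column names and the source table jmx948_variable_summary are taken from the question. Note that pt_dt is declared only in PARTITIONED BY, not in the regular column list.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical target table; pt_dt is declared only as the partition column.
CREATE TABLE variable_summary_part (acct_id BIGINT, eval_dt STRING)
PARTITIONED BY (pt_dt STRING);

-- The partition column must come last in the SELECT list.
-- DISTRIBUTE BY sends all rows with the same pt_dt to the same reducer,
-- which typically reduces the number of files each task has to open.
INSERT OVERWRITE TABLE variable_summary_part PARTITION (pt_dt)
SELECT acct_id, eval_dt, pt_dt
FROM jmx948_variable_summary
DISTRIBUTE BY pt_dt;

-- List the partitions that were created.
SHOW PARTITIONS variable_summary_part;
```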

Answer 2

In the version I am using (Hive 0.14.0.2.2.4.2-2), the following works:

INSERT INTO TABLE table1 PARTITION(tdate) SELECT t2.id, t2.info, t2.tdate FROM table2 t2;

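Select the partition column last from the source table. In the example above, the date column (tdate) is the last column in the SELECT. Similarly, if the table needs to be partitioned by the column "info", then:

INSERT INTO TABLE table1 PARTITION(info) SELECT t2.id, t2.tdate, t2.info FROM table2 t2;

If you want to create a table with multiple partitions, the SELECT query needs to list the partition columns in that same order. To partition the table above by "date" and then "info":

INSERT INTO TABLE table1 PARTITION(tdate, info) SELECT t2.id, t2.tdate, t2.info FROM table2 t2;

And by "info" first, then "date":

INSERT INTO TABLE table1 PARTITION(info, tdate) SELECT t2.id, t2.info, t2.tdate FROM table2 t2;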
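
For a multi-partition insert like the ones above, the target table has to declare the same partition columns. As far as I know, Hive also expects the names and order in PARTITION(...) to match the table's PARTITIONED BY declaration. A minimal sketch, assuming a hypothetical target table table1_by_date_info, the source table2 from Answer 1, and dynamic partitioning already enabled in nonstrict mode:

```sql
-- Hypothetical target table, partitioned first by tdate and then by info.
-- Partition columns are not repeated in the regular column list.
CREATE TABLE table1_by_date_info (id STRING)
PARTITIONED BY (tdate STRING, info STRING);

-- Dynamic partition values come from the trailing SELECT columns,
-- matched by position to the columns named in PARTITION(...).
INSERT INTO TABLE table1_by_date_info PARTITION (tdate, info)
SELECT t2.id, t2.tdate, t2.info
FROM table2 t2;
```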
