Hive ADD COLUMNS on a partitioned table does not work
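I'm sharing my experience with adding columns to a partitioned Hive table.
As you can see, despite the CASCADE option, the ALTER breaks my table :(
Adding columns to a partitioned table
Table description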
CREATE TABLE test (
a string,
b string,
c string
)
PARTITIONED BY (
x string,
y string,
z string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
'orc.compress'='SNAPPY'
);
Copy the table
CREATE TABLE test_tmp...
hadoop distcp hdfs://.../test/* hdfs://.../test_tmp
MSCK REPAIR TABLE test_tmp;
SELECT * FROM test_tmp
LIMIT 100
check : OK (I get results)
Alter the table
ALTER TABLE test_tmp
ADD COLUMNS(
aa timestamp,
bb string,
cc int,
dd string
) CASCADE;
SELECT * FROM test_tmp
LIMIT 100
...
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:19, Vertex vertex_1502459312997_187854_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
... 1 statement(s) executed, 0 rows affected, exec/fetch time: 21.655/0.000 sec [0 successful, 1 errors]
check : KO (I get this error)
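One way to narrow this down, as a rough sketch only: check whether ALTER ... CASCADE actually reached the metadata of the existing partitions. The partition values 'x1', 'y1', 'z1' are hypothetical placeholders; substitute a real partition listed by SHOW PARTITIONS.

```sql
-- List the partitions that MSCK REPAIR registered for the copied table.
SHOW PARTITIONS test_tmp;

-- Inspect the column list stored for one partition.
-- If CASCADE propagated, the new columns aa, bb, cc, dd should appear here as well.
-- The partition values below are placeholders, not taken from the original post.
DESCRIBE FORMATTED test_tmp PARTITION (x='x1', y='y1', z='z1');
```

If the partition already shows the new columns, the metadata side is probably fine and the failure is more likely on the read path.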
If you are using Hive 0.x or 1.x, then you are probably a victim of
HIVE-10598 Vectorization borks when column is added to table.
...which is specific to the ORC format, even though that is not apparent from the JIRA tags.
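If vectorization is the suspect, a quick test is to turn it off for the current session and rerun the failing query. hive.vectorized.execution.enabled is a standard Hive setting; whether disabling it avoids the failure on this particular cluster is an assumption, not something confirmed in this thread.

```sql
-- Disable vectorized execution for this session only.
SET hive.vectorized.execution.enabled=false;

-- Rerun the query that failed after the ALTER TABLE ... ADD COLUMNS.
SELECT * FROM test_tmp
LIMIT 100;
```

If the query succeeds with vectorization off, that points towards HIVE-10598 rather than a problem with the copied data or the CASCADE itself.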
Starting with Hive 2.0 there is a partial fix (i.e. ADD is fixed, but DROP / RENAME / CHANGE are still crippled) thanks to
HIVE-11981 ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
Another related fix, for CHANGE, landed in Hive 2.1.1:
HIVE-14355 Schema evolution for ORC in llap is broken for Int to String conversion
To be continued...