无法使用配置单元查询结构字段(CDH 5.9.0)
Can not query struct field with hive (CDH 5.9.0)
我刚切换到 CDH 5.9.0(全新安装,不是升级,在新集群上)。
我有一个像这样的 table(有点复杂,但我也用这个例子重现):
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>)
PARTITIONED BY (`IMPORT_DATE' string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')
如果我这样做:
SELECT header FROM products;
==>查询成功,return所有商品headers(格式为JSON)
但如果我这样做:
SELECT header.PCODE FROM products;
==> 它失败并显示以下堆栈跟踪:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
... 22 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more
在我的旧集群 (CDH 5.8.2) 上,它运行良好。
有什么想法吗?
[编辑:我已将所有 CDH 5.9.0 jar (/opt/cloudera/parcels/CDH/jars) 降级为 CDH 5.8.2 并且查询成功。 CDH 5.9.0 中可能存在回归...]
[编辑 2:如果 table 存储为文本文件('org.apache.hadoop.mapred.TextInputFormat'),则查询运行成功。
我们可以认为问题与parquet有关。]
[也发布在 Cloudera 论坛上:https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Can-not-query-struct-field-with-hive-CDH-5-9-0/m-p/48672#U48672]
我将通过降低查询元素的大小写来解决这个问题。 P.ex:
SELECT header.pcode FROM products;
所以我尝试了很多东西,结果如下:
-- Struct fieldnames in lowercase
CREATE TABLE `products`(`header` struct<pcode:string, pname:string>) STORED AS PARQUET;
Select 结果:
SELECT header.pcode FROM products
==> 确定
SELECT HEADER.pcode FROM products
==> 确定
SELECT header.PCODE FROM products
==> KO
SELECT HEADER.PCODE FROM products
==> KO
-- Struct fieldnames in UPPERCASE
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>) STORED AS PARQUET;
Select 结果:
SELECT header.pcode FROM products
==> KO
SELECT HEADER.pcode FROM products
==> KO
SELECT header.PCODE FROM products
==> KO
SELECT HEADER.PCODE FROM products
==> KO
==> 在 CDH 5.9.0 中将表存储为 PARQUET(它在 CDH 5.8.2 中有效)避免结构字段名中的大写...
我刚切换到 CDH 5.9.0(全新安装,不是升级,在新集群上)。 我有一个像这样的 table(有点复杂,但我也用这个例子重现):
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>)
PARTITIONED BY (`IMPORT_DATE' string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')
如果我这样做:
SELECT header FROM products;
==>查询成功,return所有商品headers(格式为JSON)
但如果我这样做:
SELECT header.PCODE FROM products;
==> 它失败并显示以下堆栈跟踪:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
... 22 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more
在我的旧集群 (CDH 5.8.2) 上,它运行良好。 有什么想法吗?
[编辑:我已将所有 CDH 5.9.0 jar (/opt/cloudera/parcels/CDH/jars) 降级为 CDH 5.8.2 并且查询成功。 CDH 5.9.0 中可能存在回归...]
[编辑 2:如果 table 存储为文本文件('org.apache.hadoop.mapred.TextInputFormat'),则查询运行成功。 我们可以认为问题与parquet有关。]
[也发布在 Cloudera 论坛上:https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Can-not-query-struct-field-with-hive-CDH-5-9-0/m-p/48672#U48672]
我将通过降低查询元素的大小写来解决这个问题。 P.ex:
SELECT header.pcode FROM products;
所以我尝试了很多东西,结果如下:
-- Struct fieldnames in lowercase
CREATE TABLE `products`(`header` struct<pcode:string, pname:string>) STORED AS PARQUET;
Select 结果:
SELECT header.pcode FROM products
==> 确定SELECT HEADER.pcode FROM products
==> 确定SELECT header.PCODE FROM products
==> KOSELECT HEADER.PCODE FROM products
==> KO
-- Struct fieldnames in UPPERCASE
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>) STORED AS PARQUET;
Select 结果:
SELECT header.pcode FROM products
==> KOSELECT HEADER.pcode FROM products
==> KOSELECT header.PCODE FROM products
==> KOSELECT HEADER.PCODE FROM products
==> KO
==> 在 CDH 5.9.0 中将表存储为 PARQUET(它在 CDH 5.8.2 中有效)避免结构字段名中的大写...