Remove/map duplicate keys on a Hive table?
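I have JSON files to load into a Hive table, but they contain duplicate keys (keys that differ only in letter case, such as "timeSeries" and "timeseries"). Because of them, all the data comes back null or cannot be selected at all when I query the table in Hive.
The JSON files have content like this: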
{"timeSeries":"17051233123","id":"123","timeseries":"17051233123","name":"sample"}
I tried creating this Hive table:
CREATE EXTERNAL TABLE table_hive (
  `id` STRING,
  `name` STRING,
  `timeseries` STRING,
  `timeseries2` STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.timeseries2" = "timeSeries")
LOCATION 'app/jsonfile.json';
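How can I make this a queryable Hive table?
It works fine with the JSON SerDe that ships with the Hive distribution (org.apache.hive.hcatalog.data.JsonSerDe):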
create external table table_hive
(
id string
,name string
,timeseries string
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from table_hive
;
+-----+--------+-------------+
| id | name | timeseries |
+-----+--------+-------------+
| 123 | sample | 17051233123 |
+-----+--------+-------------+
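If you would rather keep the OpenX SerDe from the question, a sketch along the following lines may work. It assumes your OpenX JsonSerDe build supports the "case.insensitive" SerDe property (described in the OpenX README and in the Amazon Athena documentation for this SerDe). By default the SerDe lowercases every key, so "timeSeries" and "timeseries" collapse into the same name and the record is rejected as having a duplicate key. Turning that off keeps the original case, and any key that is not already all lowercase then needs an explicit "mapping.<column>" entry. The directory '/app/json_data/' is a hypothetical placeholder: Hive generally expects LOCATION to name a directory rather than a single file such as 'app/jsonfile.json'.

```sql
-- Sketch only: assumes an OpenX JsonSerDe version that honors "case.insensitive".
-- "case.insensitive" = "false" keeps JSON key case instead of lowercasing it,
-- so the lowercase key "timeseries" fills column timeseries directly, while
-- the camel-case key "timeSeries" is routed to timeseries2 by the mapping.
-- '/app/json_data/' is a hypothetical directory containing the JSON files.
CREATE EXTERNAL TABLE table_hive (
  `id` STRING,
  `name` STRING,
  `timeseries` STRING,
  `timeseries2` STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "case.insensitive" = "false",
  "mapping.timeseries2" = "timeSeries"
)
LOCATION '/app/json_data/';

SELECT * FROM table_hive;
```

If the SerDe version in use honors these properties, both values should come back in separate columns instead of colliding. If it does not recognize "case.insensitive", the Hive-bundled SerDe shown above remains the simpler route.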