Hive Create Table error

Question

I am trying to create a Hive table using the JSONSerde, with the following structure:

CREATE TABLE events (
device_uuid string,
uuid string,
custom struct<
    "Vendor ID":int,
    "Customer ID":int>,
platform string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

When I try to create the table, I get the following error:

Error occurred executing hive query: OK converting to local hdfs://dpcl01:820/user/hive/aux_jars/json-serde-1.3-jar-with-dependencies.jar Added /tmp/523576-5d62-4fff-b737-813aca807eee_resources/json-serde-1.3-jar-with-dependencies.jar to class path Added resource: /tmp/52356576-5d62-4fff-b737-813aca807eee_resources/json-serde-1.3-jar-with-dependencies.jar FAILED: ParseException line 8:2 cannot recognize input near '"Vendor ID"' ':' 'int' in column specification

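The error is clearly caused by the spaces in the column names, but the raw data arrives in this form and I would rather not add a preprocessing step to remove the spaces.

Any suggestions?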

Answer 1

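If you don't want to maintain a preprocessing stage (I assume that loading the raw text into a staging table and then transforming it counts as preprocessing in your context), the most straightforward option, to me, is to extend the SerDe so that it replaces spaces with something like underscores during deserialization, matching your Hive column/struct field name definitions. From what I can see in the source code (I assume you are using this SerDe), the put method of the JSONObject class could be overridden so that all whitespace in the key argument is converted before the key is inserted into the underlying map object.

If you are open to the staging table approach, you can always load the raw text and use get_json_object to extract what you need, since spaces in Hive's JSON paths are perfectly fine. For example: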

get_json_object(raw_text, "$.Vendor ID")
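
To make the staging approach concrete, here is a minimal sketch. The table names events_raw and events_clean and the column names vendor_id and customer_id are hypothetical, and the paths assume that "Vendor ID" and "Customer ID" are nested under custom, as in the struct from the question. If they sit at the top level of your JSON, drop the custom. prefix.

```sql
-- Staging table: each row holds one raw JSON document as plain text.
-- (events_raw and raw_text are hypothetical names.)
CREATE TABLE events_raw (
  raw_text string
)
STORED AS TEXTFILE;

-- Target table with space-free column names (hypothetical names).
CREATE TABLE events_clean (
  device_uuid string,
  uuid string,
  vendor_id int,
  customer_id int,
  platform string
);

-- get_json_object returns a string, so numeric fields are cast explicitly.
INSERT OVERWRITE TABLE events_clean
SELECT
  get_json_object(raw_text, '$.device_uuid'),
  get_json_object(raw_text, '$.uuid'),
  CAST(get_json_object(raw_text, '$.custom.Vendor ID') AS int),
  CAST(get_json_object(raw_text, '$.custom.Customer ID') AS int),
  get_json_object(raw_text, '$.platform')
FROM events_raw;
```

This keeps the JSON SerDe out of the picture entirely, at the cost of parsing each document once per extracted field.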

hadoop

hive