Defining a Hive external table on top of an existing HBase table
I have an empty HBase table with two column families:
create 'emp', 'personal_data', 'professional_data'
Now I am trying to map a Hive external table onto it, which will naturally have some columns:
CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":id,
personal_data:city,
personal_data:name,
professional_data:occupation,
professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");
The error I get is:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 5 elements while
hbase.columns.mapping has 6 elements (counting the key if implicit))
Can you help me? Am I doing something wrong?
In your mapping you reference the id field, but you should reference the HBase :key keyword instead. As stated in the documentation:
a mapping entry must be either :key or of the form column-family-name:[column-name][#(binary|string)]
Simply replace :id with :key:
CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
personal_data:city,
personal_data:name,
professional_data:occupation,
professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");
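To see where the "6 elements" in the error message comes from, here is a rough Python model of the count check. It is only a sketch, not Hive's actual parser: the function name `count_mapping_entries` is hypothetical, and the rule that a missing `:key` entry adds one implicit entry is an assumption inferred from the wording "counting the key if implicit" in the error.

```python
HBASE_KEY = ":key"  # the special entry that refers to the HBase row key


def count_mapping_entries(mapping: str) -> int:
    """Count entries in a mapping string, adding the row key if it is implicit.

    Simplified model (an assumption, not Hive's real parser): entries are
    comma-separated, and when none of them is ":key" the row key is counted
    as one extra, implicit entry.
    """
    entries = mapping.split(",")
    if HBASE_KEY not in entries:
        return len(entries) + 1
    return len(entries)


hive_columns = ["id", "city", "name", "occupation", "salary"]
data_entries = [
    "personal_data:city",
    "personal_data:name",
    "professional_data:occupation",
    "professional_data:salary",
]

# The question's mapping starts with ":id", the corrected one with ":key".
broken_mapping = ",".join([":id"] + data_entries)
fixed_mapping = ",".join([":key"] + data_entries)

broken_count = count_mapping_entries(broken_mapping)
fixed_count = count_mapping_entries(fixed_mapping)

print(len(hive_columns), broken_count, fixed_count)  # prints: 5 6 5
```

Under this model, `:id` is treated as just another entry, so the row key is added on top of the five listed entries and the total no longer matches the five Hive columns.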
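The column mapping is positional: it follows the order of the columns, not their names. The Multiple Columns and Families section of the documentation makes it clear that the names do not matter: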
CREATE TABLE hbase_table_1(key int, value1 string, value2 int, value3 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,a:b,a:c,d:e"
)
The mapping is then:
- :key -> key
- a:b -> value1
- a:c -> value2
- d:e -> value3
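The positional pairing can be sketched in a few lines of Python. This is an illustration only: `pair_by_position` is a hypothetical helper, not part of Hive, and it assumes the mapping string is a plain comma-separated list with no whitespace between entries.

```python
def pair_by_position(hive_columns, mapping):
    """Pair each mapping entry with the Hive column at the same position."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"columns has {len(hive_columns)} elements while "
            f"the mapping has {len(entries)} elements"
        )
    return list(zip(entries, hive_columns))


# The documentation example: Hive column names are arbitrary.
doc_pairs = pair_by_position(
    ["key", "value1", "value2", "value3"],
    ":key,a:b,a:c,d:e",
)

# The table from the question: the first Hive column can still be called id.
emp_pairs = pair_by_position(
    ["id", "city", "name", "occupation", "salary"],
    ":key,personal_data:city,personal_data:name,"
    "professional_data:occupation,professional_data:salary",
)

for hbase_entry, hive_column in doc_pairs + emp_pairs:
    print(f"{hbase_entry} -> {hive_column}")
```

In the second call the Hive column keeps the name id; only the mapping entry has to be :key.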