Insert data into HBase using Hive (JSON file)
I have created a table in HBase using Hive:
hive> CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
and created another table to load the data into:
hive> create table testemp(id int, name string, role string) row format delimited fields terminated by '\t';
hive> load data local inpath '/home/user/sample.txt' into table testemp;
Finally, I inserted the data into the HBase table:
hive> insert overwrite table hbase_table_emp select * from testemp;
hive> select * from hbase_table_emp;
OK
123 Ram TeamLead
456 Silva Member
789 Krishna Member
Time taken: 0.160 seconds, Fetched: 3 row(s)
The table looks like this in HBase:
hbase(main):002:0> scan 'emp'
ROW COLUMN+CELL
123 column=cf1:name, timestamp=1422540225254, value=Ram
123 column=cf1:role, timestamp=1422540225254, value=TeamLead
456 column=cf1:name, timestamp=1422540225254, value=Silva
456 column=cf1:role, timestamp=1422540225254, value=Member
789 column=cf1:name, timestamp=1422540225254, value=Krishna
789 column=cf1:role, timestamp=1422540225254, value=Member
3 row(s) in 2.1230 seconds
Can I do the same with a JSON file like this:
{"id": 123, "name": "Ram", "role":"TeamLead"}
{"id": 456, "name": "Silva", "role":"Member"}
{"id": 789, "name": "Krishna", "role":"Member"}
and then do:
hive> load data local inpath '/home/user/sample.json' into table testemp;
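Please help! :)
You can use the get_json_object function to parse the data as a JSON object. Loading the JSON file straight into testemp will not work, because that table expects tab-separated fields. Instead, create a staging table that holds each JSON line as a single string: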
DROP TABLE IF EXISTS staging;
CREATE TABLE staging (json STRING);
LOAD DATA LOCAL INPATH '/local/path/to/jsonfile' INTO TABLE staging;
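Before inserting into HBase, it may be worth confirming that the staging table holds one JSON document per row and that every row parses. The queries below are a minimal sketch against the staging table defined above; the `LIMIT 3` and the choice of `$.id` as the field to check are illustrative assumptions.

```sql
-- Each row of staging should hold one complete JSON document.
SELECT json FROM staging LIMIT 3;

-- get_json_object returns a STRING, or NULL when the JSON is invalid
-- or the path does not exist. Rows listed here would end up with a NULL id.
SELECT json
FROM staging
WHERE get_json_object(json, '$.id') IS NULL;
```

If the second query returns rows, those lines are probably malformed or missing the id field, and they should be fixed or filtered out before the insert.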
Then use get_json_object to extract the attributes you want to load into the table:
INSERT OVERWRITE TABLE hbase_table_emp SELECT
get_json_object(json, "$.id") AS id,
get_json_object(json, "$.name") AS name,
get_json_object(json, "$.role") AS role
FROM staging;
There is a more comprehensive discussion of this function here.
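When several fields come from the same JSON string, Hive's built-in json_tuple function is a possible alternative: it parses each line once instead of once per get_json_object call. The following is a sketch under the same assumptions as above (the staging and hbase_table_emp tables from this thread); the aliases `s` and `t` are made up for the example, and the explicit CAST is optional but makes the conversion to the INT key visible.

```sql
-- json_tuple takes the JSON string and a list of top-level keys,
-- and returns one STRING column per key.
INSERT OVERWRITE TABLE hbase_table_emp
SELECT
  CAST(t.id AS INT) AS id,  -- json_tuple returns STRING; cast to match the INT row key
  t.name,
  t.role
FROM staging s
LATERAL VIEW json_tuple(s.json, 'id', 'name', 'role') t AS id, name, role;
```

Note that json_tuple only reads top-level keys, so nested attributes would still need get_json_object with a path such as `$.a.b`.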