Beeline query output coming in JSON format instead of csv table
I am running the beeline query below; the underlying data in HDFS comes from a mainframe server. I simply want to run a query and dump the output to csv (or any tabular format):
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;" > tables1.csv
My problems are:
The format is not clean; there are extra rows at the top and bottom.
It appears as JSON and not as a table.
Some numbers appear to be in hexadecimal format.
+-----------------------------------------------------------------------------------------------------------------------------+
| col1:{"col1_a":"00000" col1_b:"0" col1_c:{"col11_a":"00000" col11_tb:{"mo_acct_tp":"0" col11_c:"0"}} col1_d:"0"}|
+-----------------------------------------------------------------------------------------------------------------------------+
I want a regular csv with the column names at the top and no nesting.
You have to set --showHeader=true, and you will get the desired result:
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=true --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;" > tables1.csv
You can also try the table format with --outputformat=table. This will not produce csv output, but it gives you a clean tabular structure like the one below:
+-----+---------+-----------------+
| id | value | comment |
+-----+---------+-----------------+
| 1 | Value1 | Test comment 1 |
| 2 | Value2 | Test comment 2 |
| 3 | Value3 | Test comment 3 |
+-----+---------+-----------------+
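For example, the same command from the question switched to table output would look roughly like this (untested sketch; the connection string is simply copied from your command):
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=true --outputformat=table -e "SELECT * FROM tbl LIMIT 2;" > tables1.txt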
Please help us understand your data better.
When you run a select query in beeline or hive, does your table contain data like the following?:
> select * from test;
+------------------------------------------------------------------------------------------------------------------------+--+
| test.col |
+------------------------------------------------------------------------------------------------------------------------+--+
| {"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000","col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"} |
+------------------------------------------------------------------------------------------------------------------------+--+
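One quick way to confirm whether that column is a plain string holding JSON or a genuinely nested struct is to look at the table definition first; a minimal sketch, using the example table name from above:
describe test;
-- for storage and SerDe details as well:
describe formatted test;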
If yes, you may need to parse the data out of the JSON object, like below:
select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from test tbl
INFO : Completed executing command(queryId=hive_20180918182457_a2d6230d-28bc-4839-a1b5-0ac63c7779a5); Time taken: 1.007 seconds
INFO : OK
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| col1_a | col1_b | col1_c_col11_a | col1_c_col11_tb_col11_c | col1_c_col11_tb_mo_acct_tp | col1_d |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| 00000 | 0 | 00000 | 0 | 0 | 0 |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
1 row selected (2.058 seconds)
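As a side note on how the JSON path argument works, get_json_object can also be tested directly on a string literal; a minimal sketch (assumes a Hive version that allows SELECT without a FROM clause; the literal is just the sample row shown above):
select
  get_json_object('{"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000","col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"}',
                  '$.col1_c.col11_tb.mo_acct_tp') as mo_acct_tp;
-- expected result: 0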
You can then use this query on the command line to export the results into a file:
>beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from corpde_commops.test tbl;" > test.csv
If you need the column names in the file, turn on --showHeader=true.
The final output is:
>cat test.csv
col1_a,col1_b,col1_c_col11_a,col1_c_col11_tb_col11_c,col1_c_col11_tb_mo_acct_tp,col1_d
00000,0,00000,0,0,0
I don't see anything obviously wrong with your beeline statement.
If your data differs from the example above, the solution may be different.
All the best.