配置单元查询未返回任何数据
hive query is returning no data
CREATE EXTERNAL TABLE invoiceitems (
InvoiceNo INT,
StockCode INT,
Description STRING,
Quantity INT,
InvoiceDate BIGINT,
UnitPrice DOUBLE,
CustomerID INT,
Country STRING,
LineNo INT,
InvoiceTime STRING,
StoreID INT,
TransactionID STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3a://streamingdata/data/*';
数据文件是由 spark 结构化流作业创建的:
...
data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json 7.1 KB 29/08/2018 10:27:32 PM
data/part-00000-0075634b-8513-47b3-b5f8-19df8269cf9d-c000.json 1.3 KB 30/08/2018 10:47:32 AM
data/part-00000-00b6b230-8bb3-49d1-a42e-ad768c1f9a94-c000.json 2.3 KB 30/08/2018 1:25:02 AM
...
这是第一个文件的前几行:
{"InvoiceNo":5421462,"StockCode":22426,"Description":"ENAMEL WASH BOWL CREAM","Quantity":8,"InvoiceDate":1535578020000,"UnitPrice":3.75,"CustomerID":13405,"Country":"United Kingdom","LineNo":6,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"542146260180829"}
{"InvoiceNo":5501932,"StockCode":22170,"Description":"PICTURE FRAME WOOD TRIPLE PORTRAIT","Quantity":4,"InvoiceDate":1535578020000,"UnitPrice":6.75,"CustomerID":13952,"Country":"United Kingdom","LineNo":26,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"5501932260180829"}
但是,如果我运行查询,没有返回数据:
hive> select * from invoiceitems limit 5;
OK
Time taken: 24.127 seconds
配置单元的日志文件为空:
$ ls /var/log/hive*
/var/log/hive:
/var/log/hive-hcatalog:
/var/log/hive2:
如何进一步调试?
当我 运行:
时,我收到了更多关于错误的提示
select count(*) from invoiceitems;
这返回了以下错误
...
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1
killedVertices:1 FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1535521291031_0011_1_00,
diagnostics=[Vertex vertex_1535521291031_0011_1_00 [Map 1]
killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input:
invoiceitems initializer failed, vertex=vertex_1535521291031_0011_1_00
[Map 1], java.io.IOException: cannot find dir =
s3a://streamingdata/data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json
in pathToPartitionInfo: [s3a://streamingdata/data/*]
我决定更改创建 table 的定义:
LOCATION 's3a://streamingdata/data/*';
到
LOCATION 's3a://streamingdata/data/';
这解决了问题。
CREATE EXTERNAL TABLE invoiceitems (
InvoiceNo INT,
StockCode INT,
Description STRING,
Quantity INT,
InvoiceDate BIGINT,
UnitPrice DOUBLE,
CustomerID INT,
Country STRING,
LineNo INT,
InvoiceTime STRING,
StoreID INT,
TransactionID STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3a://streamingdata/data/*';
数据文件是由 spark 结构化流作业创建的:
...
data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json 7.1 KB 29/08/2018 10:27:32 PM
data/part-00000-0075634b-8513-47b3-b5f8-19df8269cf9d-c000.json 1.3 KB 30/08/2018 10:47:32 AM
data/part-00000-00b6b230-8bb3-49d1-a42e-ad768c1f9a94-c000.json 2.3 KB 30/08/2018 1:25:02 AM
...
这是第一个文件的前几行:
{"InvoiceNo":5421462,"StockCode":22426,"Description":"ENAMEL WASH BOWL CREAM","Quantity":8,"InvoiceDate":1535578020000,"UnitPrice":3.75,"CustomerID":13405,"Country":"United Kingdom","LineNo":6,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"542146260180829"}
{"InvoiceNo":5501932,"StockCode":22170,"Description":"PICTURE FRAME WOOD TRIPLE PORTRAIT","Quantity":4,"InvoiceDate":1535578020000,"UnitPrice":6.75,"CustomerID":13952,"Country":"United Kingdom","LineNo":26,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"5501932260180829"}
但是,如果我运行查询,没有返回数据:
hive> select * from invoiceitems limit 5;
OK
Time taken: 24.127 seconds
配置单元的日志文件为空:
$ ls /var/log/hive*
/var/log/hive:
/var/log/hive-hcatalog:
/var/log/hive2:
如何进一步调试?
当我 运行:
时,我收到了更多关于错误的提示select count(*) from invoiceitems;
这返回了以下错误
...
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1535521291031_0011_1_00, diagnostics=[Vertex vertex_1535521291031_0011_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: invoiceitems initializer failed, vertex=vertex_1535521291031_0011_1_00 [Map 1], java.io.IOException: cannot find dir = s3a://streamingdata/data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json in pathToPartitionInfo: [s3a://streamingdata/data/*]
我决定更改创建 table 的定义:
LOCATION 's3a://streamingdata/data/*';
到
LOCATION 's3a://streamingdata/data/';
这解决了问题。