Creating table with Regex Serde in Hive returns error
I created a table in Hive using the Regex SerDe. In Hue it reports that the table was created successfully. However, when I try to query the table with SELECT * FROM pricefile_edited, or view it in Hue, it does not work and I get an error.
The data is 130 characters per line, with no delimiters.
Does anyone know what the problem is and can help? Thank you.
CREATE EXTERNAL TABLE pricefile_edited(
field1 STRING,
field2 STRING,
field3 STRING,
field4 STRING,
field5 STRING,
field6 STRING,
field7 STRING,
field8 STRING,
field9 STRING,
field10 STRING,
field11 STRING,
field12 STRING,
field13 STRING,
field14 STRING,
field15 STRING,
field16 STRING,
field17 STRING,
field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" =
"(\.{12})(\.{1})(\.{1})(\.{24})(\.{6})(\.{6})(\.{13})(\.{6})(\.{1})(\.{4})(\.{1})(\.{3})(\.{17})(\.{9})(\.{12})(\.{1})(\.{1})(\.
{12})")
LOCATION '/user/hive/warehouse';
I get this error:
Bad status for request TFetchResultsReq(fetchType=0,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='\xc3\xd7\x97\xd3coB\xa1\x90P\x9e\xab\x82\xa4\xf4A',
guid='\x80\xa1\x93\xe2\x10\xefJ\xd9\xa3\xa3\xdb\x1f\x95\x85\x88\xb3')),
orientation=4, maxRows=100):
TFetchResultsResp(status=TStatus(errorCode=0,
errorMessage='java.io.IOException: java.io.IOException: Not a file:
hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories',
sqlState=None,
infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException:
java.io.IOException: Not a file:
hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:25:24',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463',
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769',
'sun.reflect.GeneratedMethodAccessor20:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:498',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:422',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy21:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
'java.lang.Thread:run:Thread.java:748',
'*java.io.IOException:java.io.IOException: Not a file:
hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:29:4',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:508',
'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415',
'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140',
'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458',
'*java.io.IOException:Not a file:
hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3',
'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat.java:322',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'],
statusCode=3), results=None, hasMoreRows=None)
The table location looks wrong: /user/hive/warehouse is the default warehouse directory, and it contains other table directories inside it. The query fails on /user/hive/warehouse/categories, saying it is not a file; that looks like the directory of a categories table.
Create a folder inside the /user/hive/warehouse directory and put your file into it, like this:
/user/hive/warehouse/pricefiles/pricefile_edited.txt
Change the table location in the DDL accordingly:
LOCATION '/user/hive/warehouse/pricefiles'
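For example, a minimal sketch of that layout change, assuming the data file currently sits directly under /user/hive/warehouse (the hdfs dfs steps and the exact file name are my assumptions; the full HDFS URI is taken from the error message above):

-- Move the file into its own directory first, e.g.:
--   hdfs dfs -mkdir -p /user/hive/warehouse/pricefiles
--   hdfs dfs -mv /user/hive/warehouse/pricefile_edited.txt /user/hive/warehouse/pricefiles/
-- Then repoint the existing external table at the new directory (or recreate it with the new LOCATION):
ALTER TABLE pricefile_edited SET LOCATION 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/pricefiles';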
The regex is also not correct. Each column should have a corresponding group (in parentheses) in the regex. For example, the regex for your first column says it must be 12 literal dots, because \. means the literal dot character. If you want any 12 characters, it should be (.{12}), without the backslash. Also add delimiters between groups if your data has them (a space, a tab, or whatever): (.{12})(.{1}) will extract 12 characters from 140219078921B0 (140219078921) and then B as the second column. Fix your regex accordingly, adding spaces (delimiters) between groups where necessary. Also remove the extra line break from the regex and write it as a single line.
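Putting both fixes together, a corrected DDL might look like the following sketch. The group widths are copied from your original statement and add up to 130, which matches the fixed-width lines you describe, so no extra delimiter groups are included; adjust the widths if the real layout differs:

CREATE EXTERNAL TABLE pricefile_edited(
  field1 STRING, field2 STRING, field3 STRING, field4 STRING, field5 STRING, field6 STRING,
  field7 STRING, field8 STRING, field9 STRING, field10 STRING, field11 STRING, field12 STRING,
  field13 STRING, field14 STRING, field15 STRING, field16 STRING, field17 STRING, field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" =
  "(.{12})(.{1})(.{1})(.{24})(.{6})(.{6})(.{13})(.{6})(.{1})(.{4})(.{1})(.{3})(.{17})(.{9})(.{12})(.{1})(.{1})(.{12})")
LOCATION '/user/hive/warehouse/pricefiles';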
You can test regular expressions easily using regexp_extract(string, regexp, group_number):
hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})',1); --extract group number 1 (group 0 is the whole regexp)
OK
140219078921
Time taken: 1.057 seconds, Fetched: 1 row(s)
hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})',2); --extract group number 2
OK
B
Time taken: 0.441 seconds, Fetched: 1 row(s)
And so on. Add more groups and test carefully.
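For instance, still using the sample string from above, the third group of the pattern prefix should return the single character 0 (the 14th character of the string):

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})(.{1})',3); --extract group number 3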