Error when selecting a specific field in a Hive query

I have an Orion Context Broker connected to Cosmos through Cygnus.

It works fine, in the sense that when I send new entities to the Context Broker, Cygnus forwards them to Cosmos, where they are saved in files.

The problem appears when I try to run some searches.

I start Hive and see that some tables related to the files Cosmos created are there, so I launch some queries.

Simple ones work fine:

select * from Table_name;

Hive does not launch any MapReduce job for these.

But when I want to filter, join, count, or select only certain fields, this is what happens:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = JOB_NAME, Tracking URL = JOB_DETAILS_URL
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job  -kill JOB_NAME
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-07-08 14:35:12,723 Stage-1 map = 0%,  reduce = 0%
2015-07-08 14:35:38,943 Stage-1 map = 100%,  reduce = 100%
Ended Job = JOB_NAME with errors
Error during job, obtaining debugging information...
Examining task ID: TASK_NAME (and more) from job JOB_NAME

Task with the most failures(4): 
-----
Task ID:
  task_201409031055_6337_m_000000

URL: TASK_DETAIL_URL
-----

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL

I found out that the files created by Cygnus are different from other files, because in the Cygnus case they have to be deserialized with a jar.

So I wonder whether in those cases I have to apply some MapReduce method myself, or whether there is already a general way to do this.
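For reference, the queries that fail this way can be as simple as the following (the table and column names here are placeholders, not my actual schema):

```sql
-- A full scan works; no MapReduce stage is needed:
select * from Table_name;

-- These all compile to at least one MapReduce job, and all fail as above:
select count(*) from Table_name;
select attrName, attrValue from Table_name where attrName = 'temperature';
```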

Before executing any Hive statement, run the following:

hive> add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
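Once the jar is registered in the session, queries that launch MapReduce jobs should be able to deserialize the Cygnus-written files. For instance (the table name is a placeholder), a previously failing aggregation should now complete:

```sql
hive> add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
hive> select count(*) from Table_name;
```

Note that `add jar` is per session, so it has to be repeated each time you open a new Hive CLI or JDBC connection.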

If you are using Hive through JDBC, execute it like any other statement:

Connection con = ...  // connection to the Hive server
Statement stmt = con.createStatement();
// Register the serde jar in this Hive session
stmt.executeQuery("add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
stmt.close();
// Now run the actual query on a fresh statement
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("select ...");