Cloudera Twitter Hive query failure

Question

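Team,

Has anyone managed to run queries against the Cloudera Twitter example successfully?

I added the SerDe JAR mentioned in the example as a Jar under the Beeswax File Resources, but every query I run still fails with an error.

Query: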

SELECT
    t.retweeted_screen_name,
    sum(retweets) AS total_retweets,
    count(*) AS tweet_count
FROM (SELECT
          retweeted_status.user.screen_name AS retweeted_screen_name,
          retweeted_status.text,
          max(retweet_count) AS retweets
      FROM tweets
      GROUP BY retweeted_status.user.screen_name,
               retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;

Your query has the following error(s):

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:1
INFO : Submitting tokens for job: job_1432914212475_0002
INFO : The url to track the job: http://quickstart.cloudera:8088/proxy/application_1432914212475_0002/
INFO : Starting Job = job_1432914212475_0002, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1432914212475_0002/
INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1432914212475_0002
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2015-05-29 10:20:59,400 Stage-1 map = 0%, reduce = 0%
INFO : 2015-05-29 10:21:35,687 Stage-1 map = 100%, reduce = 100%
ERROR : Ended Job = job_1432914212475_0002 with errors

Answer 1

Solved!

Don't use the prebuilt SerDe JAR download. It may be out of date.

Build it yourself!
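For anyone following along, building the JAR yourself might look roughly like the sketch below. It is not taken from the original post: the repository URL, the hive-serdes directory, and the JAR file name are assumptions based on the public cloudera/cdh-twitter-example project, and the function names are made up for illustration. It assumes git and Maven are installed.

```shell
#!/bin/sh
# Sketch only. REPO_URL, SERDE_DIR and the jar name mentioned below are
# assumptions, not details from the original post; adjust them to your setup.
REPO_URL="https://github.com/cloudera/cdh-twitter-example.git"
SERDE_DIR="cdh-twitter-example/hive-serdes"

# Clone the example sources and build the SerDe jar locally with Maven.
# (Hypothetical helper name.)
build_serde() {
    git clone "$REPO_URL" &&
    (cd "$SERDE_DIR" && mvn clean package)
}

# Print the HiveQL statement that registers a jar for the current session.
# (Hypothetical helper name; ADD JAR itself is standard HiveQL.)
add_jar_statement() {
    printf 'ADD JAR %s;\n' "$1"
}

# Typical use, left commented out so sourcing this file has no side effects:
#   build_serde
#   add_jar_statement "$PWD/$SERDE_DIR/target/hive-serdes-1.0-SNAPSHOT.jar"
```

The freshly built JAR would then replace the downloaded one, either under the Beeswax File Resources as in the question or through the printed ADD JAR statement. Check the target directory for the actual file name, since the one shown in the comment is a guess.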

twitter

cloudera

hadoop-streaming