nutch 生成期间的运行时异常
runtime exception during nutch generate
我正在尝试 运行 第一次和执行
/bin/nutch 生成 -topN 5
我得到以下异常:
GeneratorJob: starting at 2016-02-13 21:01:42
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: java.lang.RuntimeException: job failed: name=apache-nutch- 2.3.1.jar, jobid=job_local1061440919_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)
这是来自 hadoop.log 的堆栈跟踪:
2016-02-13 21:01:44,541 ERROR mapreduce.GoraRecordReader - Error reading Gora records: null
2016-02-13 21:01:44,557 WARN mapred.LocalJobRunner - job_local1061440919_0001
java.lang.Exception: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at org.apache.gora.memory.store.MemStore.execute(MemStore.java:128)
at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:67)
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:109)
... 12 more
我一直在关注这里的教程:https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup 设置 nutch。
我在 Whosebug 和 nutch archives 上看过一些有类似例外的帖子,他们建议我可能 运行我的 /tmp 目录中的磁盘 space 不足但是 /tmp 目录只有大约 8MB 的数据。
除此之外,我对导致此异常的原因一无所知
导致此异常的原因可能是什么?
我正在使用 Nutch 2.3.1 和 HBase 1.1.3 作为数据存储,我运行将其安装在 Ubuntu15.10
谢谢
查看hadoop 的日志,我认为您使用的是MemStore,而不是HBaseStore。你配置了吗gora.properties?
从我的评论中复制:)
1) 您必须配置 gora.properties,
2) Gora 背后的任何东西(Mongo、HBase、Cassandra 等)都没有响应,所以 nutch 需要 "waitForCompletion",所以请确保它已启动并且运行.
确保使用 kill -9 杀死旧的已停用进程和旧的 java nutch 进程,如果找不到它们则重新启动(希望它不会到那个地步。
我正在尝试 运行 第一次和执行
/bin/nutch 生成 -topN 5
我得到以下异常:
GeneratorJob: starting at 2016-02-13 21:01:42
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: java.lang.RuntimeException: job failed: name=apache-nutch- 2.3.1.jar, jobid=job_local1061440919_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)
这是来自 hadoop.log 的堆栈跟踪:
2016-02-13 21:01:44,541 ERROR mapreduce.GoraRecordReader - Error reading Gora records: null
2016-02-13 21:01:44,557 WARN mapred.LocalJobRunner - job_local1061440919_0001
java.lang.Exception: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at org.apache.gora.memory.store.MemStore.execute(MemStore.java:128)
at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:67)
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:109)
... 12 more
我一直在关注这里的教程:https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup 设置 nutch。
我在 Whosebug 和 nutch archives 上看过一些有类似例外的帖子,他们建议我可能 运行我的 /tmp 目录中的磁盘 space 不足但是 /tmp 目录只有大约 8MB 的数据。 除此之外,我对导致此异常的原因一无所知
导致此异常的原因可能是什么?
我正在使用 Nutch 2.3.1 和 HBase 1.1.3 作为数据存储,我运行将其安装在 Ubuntu15.10
谢谢
查看hadoop 的日志,我认为您使用的是MemStore,而不是HBaseStore。你配置了吗gora.properties?
从我的评论中复制:)
1) 您必须配置 gora.properties,
2) Gora 背后的任何东西(Mongo、HBase、Cassandra 等)都没有响应,所以 nutch 需要 "waitForCompletion",所以请确保它已启动并且运行.
确保使用 kill -9 杀死旧的已停用进程和旧的 java nutch 进程,如果找不到它们则重新启动(希望它不会到那个地步。