Apache Nutch inject urls
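I am new to Apache Nutch (2.3.1) and MongoDB (3.4.7). After following the installation steps, I wanted to inject URLs and crawl the Wikipedia site. When I run "./nutch inject urls" in the terminal, I get this error: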
~/apache-nutch-2.3.1/runtime/local/bin$ ./nutch inject urls
InjectorJob: starting at 2017-11-26 19:07:35
InjectorJob: Injecting urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.NullPointerException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
at org.apache.gora.mongodb.store.MongoStore.getDB(MongoStore.java:192)
at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:122)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 7 more
It turned out that I had set the wrong Mongo database name in the $NUTCH_HOME/conf/gora.properties file. After fixing it, Apache Nutch works correctly.
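For anyone hitting the same trace: java.util.concurrent.ConcurrentHashMap rejects null keys, so the NullPointerException thrown from containsKey inside MongoStore.getDB suggests that a MongoDB setting read from gora.properties came back null. Below is a minimal sketch of a sanity check for that file. GoraMongoCheck is a hypothetical helper, not part of Nutch or Gora, and the key names gora.mongodb.servers and gora.mongodb.db are an assumption based on the commented MongoDB section of the gora.properties template shipped with Nutch 2.x, so compare them against your own copy.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Nutch or Gora): reports MongoDB keys
 * that are missing or blank in a gora.properties file.
 */
public class GoraMongoCheck {

    // ASSUMPTION: key names taken from the commented MongoDB section of the
    // Nutch 2.x gora.properties template; verify them against your copy.
    static final String[] REQUIRED_KEYS = {
        "gora.mongodb.servers",
        "gora.mongodb.db"
    };

    /** Returns the required keys that are absent or empty in props. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Shows that ConcurrentHashMap.containsKey(null) throws, as in the trace. */
    static boolean nullKeyThrows() {
        ConcurrentHashMap<String, Object> clients = new ConcurrentHashMap<>();
        try {
            clients.containsKey(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Path to the file to check; defaults to a relative conf/gora.properties.
        String path = args.length > 0 ? args[0] : "conf/gora.properties";
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        List<String> missing = missingKeys(props);
        if (missing.isEmpty()) {
            System.out.println("MongoDB keys set; db = " + props.getProperty("gora.mongodb.db"));
        } else {
            System.out.println("Missing or empty: " + missing);
        }
    }
}
```

Usage (assuming a JDK on the PATH): javac GoraMongoCheck.java && java GoraMongoCheck $NUTCH_HOME/conf/gora.properties. Note that this only checks that the values are present; it cannot tell whether the database name is the one you intended, which was the actual mistake here.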