Apache Nutch Indexer Plugin to Manticore Search Exception: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException
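I created an Apache Nutch indexer plugin that pushes data to Manticore Search using the Manticore Search Java API.
The build succeeds, and every crawl step before indexing succeeds (inject, generate, fetch, parse, updatedb).
I am running Nutch inside a Docker container.
The Nutch version in the image is 1.19.
When I run the index command bin/nutch index /root/nutch_source/crawl/crawldb/ -linkdb /root/nutch_source/crawl/linkdb/ -dir /root/nutch_source/crawl/segments/ -filter -normalize -deleteGone
it fails, and logs/hadoop.log contains the following stack trace.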
2021-09-07 10:15:46,040 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:16:23,666 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:17:36,020 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:17:36,378 INFO segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906001900.
2021-09-07 10:17:36,383 INFO segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906001655.
2021-09-07 10:17:36,387 INFO segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906002358.
2021-09-07 10:17:36,391 INFO indexer.IndexingJob - Indexer: starting at 2021-09-07 10:17:36
2021-09-07 10:17:36,401 INFO indexer.IndexingJob - Indexer: deleting gone documents: true
2021-09-07 10:17:36,402 INFO indexer.IndexingJob - Indexer: URL filtering: true
2021-09-07 10:17:36,402 INFO indexer.IndexingJob - Indexer: URL normalizing: true
2021-09-07 10:17:36,403 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: /root/nutch_source/crawl/crawldb
2021-09-07 10:17:36,407 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906001900
2021-09-07 10:17:36,408 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906001655
2021-09-07 10:17:36,410 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906002358
2021-09-07 10:17:36,411 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: /root/nutch_source/crawl/linkdb
2021-09-07 10:17:36,528 WARN impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
2021-09-07 10:17:37,708 INFO mapreduce.Job - The url to track the job: http://localhost:8080/
2021-09-07 10:17:37,711 INFO mapreduce.Job - Running job: job_local250243852_0001
2021-09-07 10:17:38,724 INFO mapreduce.Job - Job job_local250243852_0001 running in uber mode : false
2021-09-07 10:17:38,725 INFO mapreduce.Job - map 0% reduce 0%
2021-09-07 10:17:39,731 INFO mapreduce.Job - map 100% reduce 0%
2021-09-07 10:17:47,677 WARN impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-09-07 10:17:47,992 INFO indexer.IndexWriters - Index writer org.apache.nutch.indexwriter.manticore.ManticoreIndexWriter identified.
2021-09-07 10:17:48,013 WARN mapred.LocalJobRunner - job_local250243852_0001
java.lang.Exception: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException
at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
at java.base/java.lang.Class.getConstructor0(Class.java:3342)
at java.base/java.lang.Class.getConstructor(Class.java:2151)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170)
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:97)
at org.apache.nutch.indexer.IndexWriters.lambda$get$0(IndexWriters.java:60)
at java.base/java.util.Map.computeIfAbsent(Map.java:1003)
at org.apache.nutch.indexer.IndexWriters.get(IndexWriters.java:60)
at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: com.manticoresearch.client.ApiException
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.nutch.plugin.PluginClassLoader.loadClassFromSystem(PluginClassLoader.java:105)
at org.apache.nutch.plugin.PluginClassLoader.loadClassFromParent(PluginClassLoader.java:93)
at org.apache.nutch.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:73)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 19 more
2021-09-07 10:17:48,742 INFO mapreduce.Job - Job job_local250243852_0001 failed with state FAILED due to: NA
2021-09-07 10:17:48,773 INFO mapreduce.Job - Counters: 30
File System Counters
FILE: Number of bytes read=157397439
FILE: Number of bytes written=332518016
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=51223
Map output records=51223
Map output bytes=24049558
Map output materialized bytes=24158915
Input split bytes=2010
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=24158915
Reduce input records=0
Reduce output records=0
Spilled Records=51223
Shuffled Maps =14
Failed Shuffles=0
Merged Map outputs=14
GC time elapsed (ms)=125
Total committed heap usage (bytes)=5221908480
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=11426452
File Output Format Counters
Bytes Written=0
2021-09-07 10:17:48,774 ERROR indexer.IndexingJob - Indexing job did not succeed, job status:FAILED, reason: NA
2021-09-07 10:17:48,776 ERROR indexer.IndexingJob - Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:152)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:293)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:302)
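I was able to fix this by adding all of the Manticore Search dependency libraries to the plugin manifest, plugin.xml, inside the plugin folder.
I took every dependency JAR listed in the folder runtime/local/plugins/<plugin-name>/ and included each one by name under the <runtime> tag of plugin.xml.
After rebuilding, the indexer worked fine!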
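Listing every JAR by hand is easy to get wrong, so here is a minimal sketch of a helper that prints one <library> entry per JAR found in a plugin folder, ready to paste under <runtime> in plugin.xml. The class name PluginRuntimeEntries and the method libraryEntries are made up for this illustration and are not part of Nutch. As far as I understand, Nutch builds the plugin class loader from the libraries declared in plugin.xml, which would explain why a JAR that merely sits in the plugin folder is not found at index time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch): prints <library> entries
// for every JAR found directly inside a plugin folder.
public class PluginRuntimeEntries {

    // Returns one <library name="..."/> line per *.jar file, sorted by file name.
    static List<String> libraryEntries(Path pluginDir) throws IOException {
        try (Stream<Path> files = Files.list(pluginDir)) {
            return files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith(".jar"))
                .sorted()
                .map(name -> "<library name=\"" + name + "\"/>")
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the plugin folder,
        // e.g. runtime/local/plugins/<plugin-name>/
        Path pluginDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("<runtime>");
        for (String entry : libraryEntries(pluginDir)) {
            System.out.println("  " + entry);
        }
        System.out.println("</runtime>");
    }
}
```

Run it with the plugin folder as the only argument and compare the output with the <runtime> section you already have. In the plugin.xml files bundled with Nutch, the entry for the plugin's own JAR usually carries an <export name="*"/> child, so keep that entry as it is and only add the dependency lines.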