Issue when indexing to Elasticsearch from Apache Nutch

I am trying to index from Apache Nutch into a single-node ES cluster, but I get the following error.

org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: Unsupported version: 1
    at org.elasticsearch.common.io.ThrowableObjectInputStream.readStreamHeader(ThrowableObjectInputStream.java:46)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
    at org.elasticsearch.common.io.ThrowableObjectInputStream.<init>(ThrowableObjectInputStream.java:38)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
    ... 23 more

After some further research, I learned that the client and the ES server should run the same JVM version. Reference: http://jontai.me/blog/2013/06/elasticsearch-remotetransportexception-failed-to-deserialize-exception-response-from-stream/

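I am using ES 2.3.2 and my JVM version is "1.8.0_91". When I checked /plugins/indexer-elastic/plugin.xml, the version specified there is 1.4.1. I would like to know whether this could be the problem, and what solutions exist other than downgrading the ES cluster. I want to keep using ES 2.3.2. Please help me resolve this.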

PS: The command I use for indexing is bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20160801174223/

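After further research, I found the solution. The error was caused by a version mismatch: the Nutch indexer plugin is built against ES 1.4.1, while the server runs ES 2.3.2.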
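A quick way to spot this kind of mismatch before indexing is to compare the Elasticsearch version the plugin declares with the version the server runs. The following is only a minimal sketch: it assumes the plugin's ivy.xml declares Elasticsearch as a `dependency` element with `name="elasticsearch"` and a `rev` attribute, and the helper names `plugin_es_version` and `same_major` are hypothetical. Comparing major versions is a rough proxy; building against the exact server version is the safer target.

```python
import xml.etree.ElementTree as ET


def plugin_es_version(ivy_xml_text):
    """Return the rev of the elasticsearch dependency in an ivy.xml string, or None.

    Assumes a <dependency name="elasticsearch" rev="..."/> element (an assumption
    about the file layout, not something confirmed here).
    """
    root = ET.fromstring(ivy_xml_text)
    for dep in root.iter("dependency"):
        if dep.get("name") == "elasticsearch":
            return dep.get("rev")
    return None


def same_major(client_version, server_version):
    """True when both version strings share the same major number."""
    return client_version.split(".")[0] == server_version.split(".")[0]


# Illustrative ivy.xml fragment, not copied from the Nutch source tree.
SAMPLE_IVY = """<ivy-module version="1.0">
  <dependencies>
    <dependency org="org.elasticsearch" name="elasticsearch" rev="1.4.1"/>
  </dependencies>
</ivy-module>"""

client = plugin_es_version(SAMPLE_IVY)
# The server version (2.3.2) is the one from the question.
print(client, same_major(client, "2.3.2"))  # prints: 1.4.1 False
```

In a real checkout you would read the text from src/plugin/indexer-elastic/ivy.xml instead of the inline sample.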

One solution is to download the source code from https://github.com/apache/nutch/blob/master/ and rebuild the plugin against your Elasticsearch server version, following the instructions given in src/plugin/indexer-elastic/howto_upgrade_es.txt:

  1. Upgrade elasticsearch dependency in src/plugin/indexer-elastic/ivy.xml

  2. Upgrade the Elasticsearch specific dependencies in src/plugin/indexer-elastic/plugin.xml
     To get the list of dependencies and their versions execute:
       $ ant -f ./build-ivy.xml
       $ ls lib/

  3. Build from nutch source folder using ant or any other build tool.
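For step 2, the jar names printed by `ls lib/` have to be copied into plugin.xml by hand. As a rough sketch of how that could be scripted, the snippet below turns a list of file names into `<library name="..."/>` lines. It assumes plugin.xml lists each dependency jar in that form; the helper name `library_entries` and the sample file names are made up for illustration, so check the generated lines against the existing plugin.xml before pasting them in.

```python
def library_entries(jar_names):
    """Build <library name="..."/> lines from a list of file names.

    Only names ending in .jar are kept, sorted alphabetically.
    The element format is an assumption about plugin.xml.
    """
    jars = sorted(name for name in jar_names if name.endswith(".jar"))
    return ['<library name="%s"/>' % name for name in jars]


# Illustrative listing; in practice this would come from os.listdir("lib").
listing = ["lucene-core-5.5.0.jar", "elasticsearch-2.3.2.jar", "README.txt"]
for line in library_entries(listing):
    print(line)
```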

After that, indexing into Elasticsearch works without this error. Cheers :)