Problem running example topology with storm-crawler 2.3-SNAPSHOT

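I am building SC 2.3-SNAPSHOT from source and generating a project from the archetype. I then try to run the example Flux topology. The seeds are injected properly: I can see all of them in the ES index with the status DISCOVERED. My problem is that no fetching seems to happen after the injection, so I am looking for ideas on what to investigate. All the Storm components look fine, and so does ES. In the logs, I can see this kind of error for my single worker: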

2022-02-28 08:41:48.852 c.d.s.e.p.AggregationSpout I/O dispatcher 13 [ERROR] [spout #2]  Exception with ES query
java.io.IOException: Unable to parse response body for Response{requestLine=POST /status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A2%7C_local HTTP/1.1, host=http://node-1:9200, response=HTTP/1.1 200 OK}
        at org.elasticsearch.client.RestHighLevelClient.onSuccess(RestHighLevelClient.java:2351) [stormjar.jar:?]
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:660) [stormjar.jar:?]
        at org.elasticsearch.client.RestClient.completed(RestClient.java:394) [stormjar.jar:?]
        at org.elasticsearch.client.RestClient.completed(RestClient.java:388) [stormjar.jar:?]
        at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122) [stormjar.jar:?]
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) [stormjar.jar:?]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448) [stormjar.jar:?]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338) [stormjar.jar:?]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) [stormjar.jar:?]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) [stormjar.jar:?]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [stormjar.jar:?]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) [stormjar.jar:?]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: java.lang.NullPointerException
        at com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout$InProcessMap.containsKey(AbstractQueryingSpout.java:158) ~[stormjar.jar:?]
        at com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout.onResponse(AggregationSpout.java:252) ~[stormjar.jar:?]
        at com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout.onResponse(AggregationSpout.java:63) [stormjar.jar:?]
        at org.elasticsearch.client.RestHighLevelClient.onSuccess(RestHighLevelClient.java:2349) [stormjar.jar:?]
        ... 18 more

This was fixed recently in https://github.com/DigitalPebble/storm-crawler/commit/88784c1af9a35fd45df3b68ace279a0b73e1e856

Run git pull and mvn clean install on StormCrawler, then rebuild the topology.
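As a rough sketch, the update could look like the following. The two directory paths are hypothetical placeholders, and using `mvn clean package` to rebuild the archetype-generated project is an assumption about how the topology jar is built; adjust both to your setup. Setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
# Hypothetical locations: a local clone of storm-crawler and the
# project generated from the archetype. Override them as needed.
STORMCRAWLER_DIR="${STORMCRAWLER_DIR:-$HOME/storm-crawler}"
PROJECT_DIR="${PROJECT_DIR:-$HOME/crawler}"

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

rebuild() {
  # Pull the fix and reinstall the 2.3-SNAPSHOT artifacts locally.
  run cd "$STORMCRAWLER_DIR" && run git pull && run mvn clean install &&
  # Rebuild the topology jar against the refreshed snapshot (assumed command).
  run cd "$PROJECT_DIR" && run mvn clean package
}
```

Call `rebuild` to run the steps, then resubmit the Flux topology the same way as before.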

Regarding

"WARN o.a.s.u.Utils - Topology crawler contains unreachable components "__system" What does it refer to"

No idea, but it shouldn't be a big deal.