Why does Hibernate MassIndexer say indexing is complete when it actually isn't?
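I am trying to index a large data set in Elasticsearch using the MassIndexer (13.5 million records, associated with 7-8 tables). The log reports that all records were reindexed right after a progress line showing only 39.08%. I hit the same problem both locally and in production, and the percentage is different on every run.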
fullTextEntityManager
.createIndexer(XYZ.class)
.batchSizeToLoadObjects(500).cacheMode(CacheMode.IGNORE).threadsToLoadObjects(2).idFetchSize(Integer.MIN_VALUE)
.startAndWait();
Logs:
23:05:25,338 (Hibernate Search: Elasticsearch transport thread-2) INFO SimpleIndexingProgressMonitor:90 - HSEARCH000031: Indexing speed: 1085.105591 documents/second; progress: 39.08%
23:05:25,339 (Hibernate Search: Elasticsearch transport thread-2) INFO SimpleIndexingProgressMonitor:87 - HSEARCH000030: 5322450 documents indexed in 4904960 ms
23:05:25,339 (Hibernate Search: Elasticsearch transport thread-2) INFO SimpleIndexingProgressMonitor:90 - HSEARCH000031: Indexing speed: 1085.115845 documents/second; progress: 39.08%
23:05:25,339 (Hibernate Search: Elasticsearch transport thread-2) INFO SimpleIndexingProgressMonitor:87 - HSEARCH000030: 5322500 documents indexed in 4904961 ms
23:05:25,339 (Hibernate Search: Elasticsearch transport thread-2) INFO SimpleIndexingProgressMonitor:90 - HSEARCH000031: Indexing speed: 1085.125854 documents/second; progress: 39.08%
23:05:36,103 (Hibernate Search: Elasticsearch transport thread-3) DEBUG request:194 - HSEARCH400082: Executed Elasticsearch HTTP POST request to path '/xyz/_forcemerge' with query parameters {} in 16734ms. Response had status 200 'OK'.
23:05:37,666 (Hibernate Search: Elasticsearch transport thread-3) DEBUG request:194 - HSEARCH400082: Executed Elasticsearch HTTP POST request to path '/xyz/_flush' with query parameters {} in 1562ms. Response had status 200 'OK'.
23:05:37,668 (Hibernate Search: Elasticsearch transport thread-3) DEBUG request:194 - HSEARCH400082: Executed Elasticsearch HTTP POST request to path '/xyz/_refresh' with query parameters {} in 1ms. Response had status 200 'OK'.
23:05:37,668 (main) INFO SimpleIndexingProgressMonitor:78 - HSEARCH000028: Reindexed 13618954 entities
The "indexing completed" message should only be shown once all records have actually been indexed.
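This looks a lot like HSEARCH-3462, which was fixed in 6.0.0.Alpha2 but was not backported to 5.11.
Long story short: this is a logging problem, not an indexing problem. The last line, stating that everything was reindexed, is the one you should trust.
I will see whether we can easily backport the fix to 5.10/5.11, but it may take some time before we release these branches again. Backport ticket (if you need to track progress): https://hibernate.atlassian.net/browse/HSEARCH-3622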
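To see that the progress lines are simply stale rather than wrong, it can help to redo the arithmetic from the log in the question. The sketch below is only an illustration: the class and method names are made up, and the figures are copied from the last HSEARCH000030 line and the final HSEARCH000028 line. It is not how SimpleIndexingProgressMonitor is implemented.

```java
import java.util.Locale;

// Illustrative only: recomputes the figures printed in the question's log.
final class ProgressCheck {

    // Percentage of entities covered by the last progress line.
    static double progressPercent(long documentsIndexed, long totalEntities) {
        return 100.0 * documentsIndexed / totalEntities;
    }

    // Throughput in documents per second for a given elapsed time in milliseconds.
    static double documentsPerSecond(long documentsIndexed, long elapsedMillis) {
        return documentsIndexed * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        long lastReported = 5_322_500L;   // from the last HSEARCH000030 line
        long elapsedMillis = 4_904_961L;  // from the same line
        long total = 13_618_954L;         // from the final HSEARCH000028 line

        System.out.println(String.format(Locale.ROOT,
                "progress: %.2f%%, speed: %.2f documents/second",
                progressPercent(lastReported, total),
                documentsPerSecond(lastReported, elapsedMillis)));
    }
}
```

The recomputed values line up with the logged 39.08% and roughly 1085 documents/second, so the counters are internally consistent; the progress messages just stop being emitted well before the final summary line.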
Your log clearly shows that errors occurred during mass indexing, which you did not mention in your original post.
You regularly get errors like this one:
10:48:28,125 (Hibernate Search: Elasticsearch transport thread-2) ERROR LogErrorHandler:71 - HSEARCH000058: Exception occurred org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
Subsequent failures:
Entity com.example.model.XXXXXX Id 855665929073643520 Work Type org.hibernate.search.backend.AddLuceneWork
org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
at org.hibernate.search.elasticsearch.work.impl.BulkWork.lambda$execute(BulkWork.java:77)
at org.hibernate.search.util.impl.Futures.lambda$handler(Futures.java:57)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.hibernate.search.elasticsearch.client.impl.DefaultElasticsearchClient.onFailure(DefaultElasticsearchClient.java:123)
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:605)
at org.elasticsearch.client.RestClient.retryIfPossible(RestClient.java:396)
at org.elasticsearch.client.RestClient.failed(RestClient.java:375)
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134)
at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492)
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException
... 11 more
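This essentially means that some indexing requests failed because Elasticsearch took too long to answer.
There can be many reasons for that.
Your Hibernate Search configuration looks quite conservative (only two threads), so I don't think you are putting too much pressure on the Elasticsearch cluster.
I would recommend double-checking your Elasticsearch setup (the Elasticsearch documentation probably has some notes that can help).
Check that your Elasticsearch cluster is properly sized, that the servers are properly dimensioned, ...
You may also want to tune the hibernate.search configuration properties related to communication with the Elasticsearch cluster: timeouts, number of connections, ... See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#elasticsearch-integration-configuration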
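As a rough idea of what such tuning could look like, here is a minimal sketch that collects a few client settings in a map. The property names are written as I remember them from the 5.x Elasticsearch integration section linked above, so please confirm them there before relying on them. The values are arbitrary placeholders rather than recommendations, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers Elasticsearch client overrides for Hibernate Search 5.x.
// Property names should be checked against the reference documentation;
// the values below are placeholders, not tuned recommendations.
final class ElasticsearchClientSettings {

    static Map<String, String> overrides() {
        Map<String, String> settings = new LinkedHashMap<>();
        String prefix = "hibernate.search.default.elasticsearch.";

        // Timeouts, in milliseconds.
        settings.put(prefix + "request_timeout", "120000");
        settings.put(prefix + "read_timeout", "120000");
        settings.put(prefix + "connection_timeout", "6000");

        // Size of the HTTP connection pool used to talk to the cluster.
        settings.put(prefix + "max_total_connection", "20");
        settings.put(prefix + "max_total_connection_per_route", "4");
        return settings;
    }

    public static void main(String[] args) {
        overrides().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A map like this can be passed as the second argument of Persistence.createEntityManagerFactory(unitName, map), or the same keys can go into persistence.xml or hibernate.properties. Raising timeouts only hides the symptom if the cluster itself is undersized, so it is worth checking the cluster first.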