Elasticsearch: Getting analyzer used for indexing a given field from the client side

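Is there a way to programmatically get, from the client side, the analyzer used for indexing a given field of an Elasticsearch server instance (assuming, of course, that the analyzer is available on both sides)?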

For example, with a mapping such as the following:

{
    "mappings": {
        "article": {
            "properties": {
                "text": {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": "spanish"
                }
            }
        }
    }
}

How can I get an org.apache.lucene.analysis.es.SpanishAnalyzer for the field text using the Java client for Elasticsearch, as in the code below?

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class QueryAnalyzerTest {

    public static void main(final String[] args) throws UnknownHostException {
        final String docTextFieldName = "text";
        Iterable<SearchHit> hits = Collections.emptyList();

        try (final Client client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300))) {
            final QueryBuilder queryBuilder = QueryBuilders.matchQuery(docTextFieldName, "anuncio");
            final SearchRequestBuilder searchRequestBuilder = client.prepareSearch("news").setQuery(queryBuilder)
                    .setTypes("article");
            final SearchResponse response = searchRequestBuilder.get();
            hits = response.getHits();
        }

        hits.forEach(hit -> {
            final String docText = (String) hit.getSource().get(docTextFieldName);
            // TODO: Tokenize "docText" with the exact same tokenizer used when
            // indexing the field
        });

    }

}

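You can definitely use client.admin().indices().prepareGetFieldMappings("indexName") to retrieve the mapping of the text field programmatically, and from it you can read the logical name of the analyzer (i.e. "spanish"). However, you won't get the class name of the analyzer.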
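Reading the logical name out of the returned mapping could look roughly like the sketch below. It assumes the field mapping has already been obtained as a nested map shaped like {"text": {"type": "string", "analyzer": "spanish"}} (in the 2.x Java client this is, as far as I know, what FieldMappingMetaData.sourceAsMap() returns, but check that against your client version). FieldAnalyzerNames and analyzerName are hypothetical names introduced only for this illustration.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: reads the "analyzer" setting from a field-mapping source map. */
final class FieldAnalyzerNames {

    private FieldAnalyzerNames() {
    }

    /**
     * @param fieldMappingSource assumed shape: {"text": {"type": "string", "analyzer": "spanish"}}
     * @param fieldName          leaf name of the field, e.g. "text"
     * @return the logical analyzer name, or empty if the mapping declares none
     */
    static Optional<String> analyzerName(final Map<String, Object> fieldMappingSource, final String fieldName) {
        if (fieldMappingSource == null) {
            return Optional.empty();
        }
        final Object fieldDefinition = fieldMappingSource.get(fieldName);
        if (!(fieldDefinition instanceof Map)) {
            return Optional.empty();
        }
        final Object analyzer = ((Map<?, ?>) fieldDefinition).get("analyzer");
        return analyzer instanceof String ? Optional.of((String) analyzer) : Optional.empty();
    }
}
```

An empty result means the mapping does not name an analyzer explicitly, in which case the index's default analyzer applies and has to be looked up separately.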

For that, you need to call AnalysisRegistry.getAnalyzer("spanish"), which gives you the correct analyzer instance.
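If the AnalysisRegistry is not reachable from your client code, one possible fallback (not what the answer above proposes, just a sketch) is a small client-side table from logical names to Lucene class names. The class KnownAnalyzerClasses and its methods are hypothetical. The sketch covers only a few built-in analyzers, assumes the listed Lucene classes have a public no-argument constructor in the Lucene version on your classpath, and cannot resolve custom analyzers defined in the index settings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side lookup from logical analyzer names to Lucene class names. */
final class KnownAnalyzerClasses {

    private static final Map<String, String> CLASS_NAMES = new HashMap<>();

    static {
        // Only a handful of built-in analyzers; extend as needed.
        CLASS_NAMES.put("spanish", "org.apache.lucene.analysis.es.SpanishAnalyzer");
        CLASS_NAMES.put("english", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        CLASS_NAMES.put("standard", "org.apache.lucene.analysis.standard.StandardAnalyzer");
    }

    private KnownAnalyzerClasses() {
    }

    /** Returns the fully qualified class name for a logical name, if it is in the table. */
    static Optional<String> classNameFor(final String logicalName) {
        return Optional.ofNullable(CLASS_NAMES.get(logicalName));
    }

    /**
     * Tries to instantiate the analyzer reflectively. Returns empty if the name is
     * unknown, the class is not on the classpath, or it lacks a no-arg constructor.
     */
    static Optional<Object> instantiate(final String logicalName) {
        return classNameFor(logicalName).flatMap(className -> {
            try {
                return Optional.<Object>of(Class.forName(className).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                return Optional.empty();
            }
        });
    }
}
```

With Lucene on the classpath, the object returned by KnownAnalyzerClasses.instantiate("spanish") can be cast to org.apache.lucene.analysis.Analyzer and used to tokenize docText. Note that this only approximates the server-side analyzer: any stop-word or stemming configuration set on the index is not reflected here.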