Analyze terms which get indexed in ElasticSearch

Question

I have a custom analyzer that adds additional terms from an ontology. In addition, I want to stem the terms before they are indexed. Below is the index metadata, taken from the elasticsearch head plugin.

{
    "state": "open",
    "settings": {
        "index": {
            "refresh_interval": "1000s",
            "number_of_shards": "5",
            "creation_date": "1471931611750",
            "analysis": {
                "filter": {
                    "owlfilter": {
                        "type": "owl",
                        "indexName": "ontoowl",
                        "expansionType": "RDFSLABEL",
                        "owlFile": "/home/tannys/elasticsearch-2.3.0/ontologyWorkTrial/myownowl.owl"
                    }
                },
                "analyzer": {
                    "owlanalyzer": {
                        "filter": ["owlfilter","porter_stem"],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_replicas": "1",
            "uuid": "d8Ub8A0eSm65geMK_bpdvw",
            "version": {"created": "2030099"}
        }
    },
    "mappings": {
        "mytype": {
            "properties": {
                "nameortitle": {
                    "search_analyzer": "standard",
                    "analyzer": "owlanalyzer",
                    "store": true,
                    "type": "string"
                },
                "description": {
                    "search_analyzer": "standard",
                    "analyzer": "owlanalyzer",
                    "store": true,
                    "type": "string"
                }
            },
            "aliases": [ ]
        }
    }
}

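Ironically, the results were better before I added the porter_stem filter, so I am not sure what is going wrong. I would like to see the terms that are actually being indexed. How can I inspect what the analyzer produces, the way Luke does for Lucene? Any guidance would be appreciated.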

Answer 1

You can use the term vectors API. That gives you the terms for a field in a document. You can also use the multi term vectors API (`_mtermvectors`) to view the terms from multiple documents in the same way.
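As a rough illustration of the two calls, here is a minimal Python sketch using only the standard library. It assumes a node at `http://localhost:9200`, an index called `myindex`, and document ids such as `"1"`; none of these appear in the question, so treat them as placeholders. The type `mytype` and the field names come from the mapping above. The `/{index}/{type}/{id}/_termvectors` path is the Elasticsearch 2.x form and may differ in other versions.

```python
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"   # assumption: local node, not given in the question
INDEX = "myindex"                  # hypothetical index name, replace with your own
DOC_TYPE = "mytype"                # type name taken from the mapping in the question
FIELDS = ["nameortitle", "description"]


def termvectors_request(doc_id, fields=FIELDS):
    """Build the URL and body for a single-document term vectors request."""
    url = "{}/{}/{}/{}/_termvectors".format(ES_URL, INDEX, DOC_TYPE, doc_id)
    body = {"fields": list(fields), "positions": True, "term_statistics": False}
    return url, body


def mtermvectors_request(doc_ids, fields=FIELDS):
    """Build the URL and body for a multi-document term vectors request."""
    url = "{}/{}/{}/_mtermvectors".format(ES_URL, INDEX, DOC_TYPE)
    body = {"ids": list(doc_ids), "parameters": {"fields": list(fields)}}
    return url, body


def indexed_terms(response):
    """Map each field in a term vectors response to its sorted list of terms."""
    vectors = response.get("term_vectors", {})
    return {field: sorted(data.get("terms", {})) for field, data in vectors.items()}


def fetch(url, body):
    """Send the request to the cluster and decode the JSON reply."""
    req = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running cluster, `indexed_terms(fetch(*termvectors_request("1")))` should list the terms stored for each field of that document, that is, the output of `owlanalyzer` after the ontology expansion and stemming. Comparing those terms with what the `standard` search analyzer produces for a query may help show whether the stemmed index terms still match the unstemmed query terms.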

analyzer

luke

elasticsearch