Analyze terms which get indexed in ElasticSearch
I have a custom analyzer which adds additional terms from an ontology. In addition, I want to stem the terms before they are indexed. Below is the index metadata, taken from the elasticsearch head plugin.
{
  "state": "open",
  "settings": {
    "index": {
      "refresh_interval": "1000s",
      "number_of_shards": "5",
      "creation_date": "1471931611750",
      "analysis": {
        "filter": {
          "owlfilter": {
            "type": "owl",
            "indexName": "ontoowl",
            "expansionType": "RDFSLABEL",
            "owlFile": "/home/tannys/elasticsearch-2.3.0/ontologyWorkTrial/myownowl.owl"
          }
        },
        "analyzer": {
          "owlanalyzer": {
            "filter": ["owlfilter", "porter_stem"],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1",
      "uuid": "d8Ub8A0eSm65geMK_bpdvw",
      "version": {"created": "2030099"}
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "nameortitle": {
          "search_analyzer": "standard",
          "analyzer": "owlanalyzer",
          "store": true,
          "type": "string"
        },
        "description": {
          "search_analyzer": "standard",
          "analyzer": "owlanalyzer",
          "store": true,
          "type": "string"
        }
      },
      "aliases": [ ]
    }
  }
}
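Ironically, the results were better before I added the porter_stem filter, so I'm not sure what went wrong. I want to see the terms that are actually being indexed. How can I inspect what the analyzer produces, the way Luke does for Lucene?
Any guidance would be appreciated.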
You can use the term vectors API. It gives you the terms for a field in a document. You can also use the multi term vectors API to view the terms from several documents in the same way.
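As a rough illustration of that suggestion, here is a minimal Python sketch that builds requests for the `_termvectors` and `_mtermvectors` endpoints and pulls the term list out of a response. It assumes a node at `http://localhost:9200` and an index called `myindex`; both are placeholders, since the question does not give the index name. The type `mytype` and the field names come from the mapping above. The helper names (`termvectors_request`, `mtermvectors_request`, `post_json`, `indexed_terms`, `show_terms`) are made up for this example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default port
INDEX = "myindex"                 # hypothetical index name, not given in the question
DOC_TYPE = "mytype"               # mapping type taken from the metadata above


def termvectors_request(doc_id, fields):
    """Build the URL and JSON body for a term vectors request on one document."""
    url = "{}/{}/{}/{}/_termvectors".format(ES_URL, INDEX, DOC_TYPE, doc_id)
    body = {"fields": fields, "positions": True, "term_statistics": True}
    return url, body


def mtermvectors_request(doc_ids, fields):
    """Build the URL and JSON body for a multi term vectors request."""
    url = "{}/{}/{}/_mtermvectors".format(ES_URL, INDEX, DOC_TYPE)
    body = {
        "ids": doc_ids,
        "parameters": {"fields": fields, "term_statistics": True},
    }
    return url, body


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def indexed_terms(response, field):
    """Return the sorted terms reported for one field of a term vectors response."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return sorted(terms)


def show_terms(doc_id, fields):
    """Print the terms for each requested field (needs a running node)."""
    url, body = termvectors_request(doc_id, fields)
    result = post_json(url, body)
    for field in fields:
        print(field, indexed_terms(result, field))
```

Calling `show_terms("1", ["nameortitle", "description"])` against a live node should list the terms for document `1`. Comparing that list with the tokens the `standard` search analyzer produces for a query may show why adding `porter_stem` hurt the results: the index holds stemmed terms, while the search side in this mapping does not stem.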