ElasticSearch Partial Mappings with Spaces
My partial mappings and queries were working fine until spaces got involved. For example, the term Jon Doe has its term vectors broken down into ..
"terms": {
"j": {
"term_freq": 1
},
"jo": {
"term_freq": 1
},
"jon": {
"term_freq": 1
},
"d": {
"term_freq": 1
},
"do": {
"term_freq": 1
},
"doe": {
"term_freq": 1
}
}
But I want it to be ..
"terms": {
"j": {
"term_freq": 1
},
"jo": {
"term_freq": 1
},
"jon": {
"term_freq": 1
},
"jon ": {
"term_freq": 1
},
"jon d": {
"term_freq": 1
},
"jon do": {
"term_freq": 1
},
"jon doe": {
"term_freq": 1
}
}
Here are my mapping and settings:
Mapping:
name: {
type: 'string',
term_vector: 'yes',
analyzer: 'ngram_analyzer',
search_analyzer: 'standard',
include_in_all: true
}
Settings:
settings: {
index: {
analysis: {
filter: {
ngram_filter: {
type: 'edge_ngram',
min_gram: 1,
max_gram: 15
}
},
analyzer: {
'ngram_analyzer': {
filter: [
'lowercase',
'ngram_filter'
],
type: 'custom',
tokenizer: 'standard'
}
}
},
number_of_shards: 1,
number_of_replicas: 1
}
}
How can I do this?
You just need to use a different tokenizer in your custom analyzer:
"analyzer": {
"ngram_analyzer": {
"filter": [
"lowercase",
"ngram_filter"
],
"type": "custom",
"tokenizer": "keyword"
}
}
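For reference, here is a minimal sketch of how the fix could be verified with the _analyze API. The index name my_index is hypothetical, and the request-body form of _analyze shown below assumes Elasticsearch 5.x or later (older releases pass analyzer and text as query-string parameters instead):
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "ngram_filter"]
        }
      }
    }
  }
}
POST my_index/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "Jon Doe"
}
Because the keyword tokenizer emits the whole input as a single token, edge_ngram now builds prefixes of the full lowercased string, spaces included: "j", "jo", "jon", "jon ", "jon d", "jon do", "jon doe".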