How do I tell an ElasticSearch multi-match query that I want numeric fields, stored as strings, to return matches with numeric strings?
I am writing a Flask application and I am using Elasticsearch.
Here is search.py:
from flask import current_app


def query_object(index, fields, query, page, per_page, fuzziness=0):
    search = current_app.elasticsearch.search(
        index=index,
        body={'query': {'multi_match': {'query': str(query), 'fields': fields, 'fuzziness': fuzziness, 'lenient': True}},
              'from': (page - 1) * per_page, 'size': per_page}
    )
    ids = [int(hit['_id']) for hit in search['hits']['hits']]
    return ids, search['hits']['total']['value']
The following model is indexed:
class WishList(db.Model, SearchableMixin):
    __searchable__ = ['first_name', 'gender', 'wants', 'needs', 'wear',
                      'read', 'shoe_size_category', 'shoe_type', 'sheet_size', 'additional_comments', 'time_chosen',
                      'age', 'shoe_sock_size', 'program_number']
    id = db.Column(db.Integer, primary_key=True)
    program_number = db.Column(db.String(4))
    first_name = db.Column(db.String(20))
    age = db.Column(db.String(10))
    gender = db.Column(db.String(20))
    wants = db.Column(db.String(300))
    needs = db.Column(db.String(300))
    wear = db.Column(db.String(300))
    read = db.Column(db.String(300))
    pant_dress_size = db.Column(db.String(20), default='unspecified')
    shirt_blouse_size = db.Column(db.String(20), default='unspecified')
    jacket_sweater_size = db.Column(db.String(20), default='unspecified')
    shoe_sock_size = db.Column(db.String(20), default='unspecified')
    shoe_size_category = db.Column(db.String(20), default='unspecified')
    shoe_type = db.Column(db.String(50), default='unspecified')
    sheet_size = db.Column(db.String(20), default='unspecified')
    additional_comments = db.Column(db.Text(), nullable=True, default=None)
    time_chosen = db.Column(db.String(40), nullable=True, default=None)
    sponsor_id = db.Column(db.Integer, db.ForeignKey(
        'user.id'), nullable=True, default=None)
    drive_id = db.Column(db.Integer, db.ForeignKey(
        'holiday_cheer_drive.id'), nullable=False, default=None)
The model is made searchable by inheriting from the SearchableMixin class, shown here:
class SearchableMixin(object):
    @classmethod
    def search_object(cls, fields, expression, page, per_page, fuzziness=0):
        ids, total = query_object(
            cls.__tablename__, fields, expression, page, per_page, fuzziness=fuzziness)
        if total == 0:
            return cls.query.filter_by(id=0), 0
        when = []
        for i in range(len(ids)):
            when.append((ids[i], i))
        return cls.query.filter(cls.id.in_(ids)).order_by(
            db.case(when, value=cls.id)), total
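Currently, when I search, all fields are searchable and return a valid result, except when I search using a numeric value.
Here is an example of the output for a search that works, with Python printing the values to the console: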
Query: bob
Body of search:
{'from': 0,
 'query': {'multi_match': {'fields': ['first_name',
                                      'gender',
                                      'wants',
                                      'needs',
                                      'wear',
                                      'read',
                                      'shoe_size_category',
                                      'shoe_type',
                                      'sheet_size',
                                      'additional_comments',
                                      'time_chosen',
                                      'age',
                                      'shoe_sock_size',
                                      'program_number'],
                           'fuzziness': 0,
                           'lenient': True,
                           'query': 'bob'}},
 'size': 10}
Python elasticsearch object:
{'took': 27, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': 1.6916759, 'hits': [{'_index': 'wish_list', '_type': '_doc', '_id': '1', '_score': 1.6916759, '_source': {'first_name': 'bob', 'gender': 'male', 'wants': 'bike', 'needs': 'calculator', 'wear': 'hat', 'read': 'book', 'shoe_size_category': "men's", 'shoe_type': 'sneaker', 'sheet_size': 'unspecified', 'additional_comments': 'Likes cheese', 'time_chosen': None, 'age': '5', 'shoe_sock_size': '4', 'program_number': '215', 'mappings': {'properties': {'first_name': {'type': 'text'}, 'gender': {'type':
'text'}, 'wants': {'type': 'text'}, 'needs': {'type': 'text'}, 'wear': {'type': 'text'}, 'read': {'type': 'text'}, 'shoe_size_category': {'type': 'text'}, 'shoe_type': {'type': 'text'}, 'sheet_size': {'type': 'text'}, 'additional_comments': {'type': 'text'}, 'time_chosen': {'type': 'text'}, 'age': {'type': 'text'}, 'shoe_sock_size': {'type': 'text'}, 'program_number': {'type': 'text'}}}}}, {'_index': 'wish_list', '_type': '_doc', '_id': '9', '_score': 1.6916759, '_source': {'first_name': 'bob', 'gender': 'male', 'wants': 'bike', 'needs': 'calculator', 'wear': 'hat', 'read': 'book', 'shoe_size_category': "men's", 'shoe_type': 'sneaker', 'sheet_size': 'unspecified', 'additional_comments': 'Likes cheese', 'time_chosen': None, 'age': 5, 'shoe_sock_size': 4, 'program_number': 215, 'mappings': {'properties': {'first_name': {'type': 'text'}, 'gender': {'type': 'text'}, 'wants': {'type': 'text'}, 'needs': {'type': 'text'}, 'wear': {'type': 'text'}, 'read': {'type': 'text'}, 'shoe_size_category': {'type': 'text'}, 'shoe_type': {'type': 'text'}, 'sheet_size': {'type': 'text'}, 'additional_comments': {'type': 'text'}, 'time_chosen': {'type': 'text'}, 'age': {'type': 'text'}, 'shoe_sock_size': {'type': 'text'}, 'program_number': {'type': 'text'}}}}}]}}
Here is the exact same query against the same object, but using a numeric string:
Query: 215
Body of search:
{'from': 0,
 'query': {'multi_match': {'fields': ['first_name',
                                      'gender',
                                      'wants',
                                      'needs',
                                      'wear',
                                      'read',
                                      'shoe_size_category',
                                      'shoe_type',
                                      'sheet_size',
                                      'additional_comments',
                                      'time_chosen',
                                      'age',
                                      'shoe_sock_size',
                                      'program_number'],
                           'fuzziness': 0,
                           'lenient': True,
                           'query': '215'}},
 'size': 10}
Python elasticsearch object:
{'took': 18, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}
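A string is being passed into the function, and the data is all saved as strings, but there seems to be some kind of type error. Before I added lenient: True, it threw an error saying that elasticsearch could not build the query.
If I can understand how I would do this with the elasticsearch REST API, then I can probably figure out how to do it with Python.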
The problem is caused by using the fuzziness parameter on a numeric data type, and then using lenient: true to make it work. As mentioned in this link, lenient means that format-based errors, such as providing a text query value for a numeric field, are ignored.
Below is the error you get when you try to use fuzziness on a numeric data type:
"reason": "Can only use fuzzy queries on keyword and text fields - not on [age] which is of type [integer]"
When you add "lenient": true, the error above goes away, but no documents are returned.
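To make it work, simply remove the fuzziness and lenient parameters from the search query. It should then work, because Elasticsearch automatically converts a valid string to a numeric value and vice versa, as explained in the coerce article.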
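Applied to the query_object function from the question, the change could look like the sketch below. The helper name build_multi_match_body and the extra es argument are made up for illustration; the request body is the one from the question with fuzziness and lenient removed.

```python
def build_multi_match_body(fields, query, page, per_page):
    """Build a multi_match body without fuzziness or lenient.

    Hypothetical helper, not part of the original search.py.
    """
    return {
        'query': {
            'multi_match': {
                'query': str(query),  # the query is always sent as a string
                'fields': fields,
            }
        },
        'from': (page - 1) * per_page,
        'size': per_page,
    }


def query_object(es, index, fields, query, page, per_page):
    # `es` is assumed to be an elasticsearch-py client,
    # e.g. current_app.elasticsearch in the Flask app from the question.
    search = es.search(
        index=index,
        body=build_multi_match_body(fields, query, page, per_page),
    )
    ids = [int(hit['_id']) for hit in search['hits']['hits']]
    return ids, search['hits']['total']['value']
```

One thing worth testing with your own mapping: if some fields really are mapped as numeric, a non-numeric query such as bob may be rejected once lenient is gone, so it is worth trying both kinds of input.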
Working example using the REST API
Index definition
{
    "mappings": {
        "properties": {
            "age": {
                "type": "integer"
            }
        }
    }
}
Index sample documents
{
    "age" : "25" --> note the use of `""`, sending it as a string
}
{
    "age" : 28 --> note: sending a numeric value
}
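Since the question asks how this maps to Python, here is a rough equivalent of the two REST steps above. It is only a sketch: es is assumed to be an elasticsearch-py 7.x client (where body= is accepted, as in the question's code), the index name is taken from the search results in this answer, and create_sample_index is a made-up function name.

```python
INDEX_NAME = 'so_numberic'  # index name as it appears in the search results

# Same mapping as the REST index definition: age is an integer field.
INDEX_DEFINITION = {
    'mappings': {
        'properties': {
            'age': {'type': 'integer'},
        }
    }
}

SAMPLE_DOCS = [
    {'age': '25'},  # sent as a string
    {'age': 28},    # sent as a number
]


def create_sample_index(es, index=INDEX_NAME):
    """Create the index and add both sample documents.

    Hypothetical helper; `es` is assumed to be an elasticsearch-py 7.x client.
    """
    es.indices.create(index=index, body=INDEX_DEFINITION)
    for doc in SAMPLE_DOCS:
        # refresh=True makes each document searchable immediately
        es.index(index=index, body=doc, refresh=True)
```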
Search query in string format
{
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "28", --> note string format
                        "fields": [
                            "age" --> note you can add more fields
                        ]
                    }
                }
            ]
        }
    }
}
Search result
"hits": [
    {
        "_index": "so_numberic",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
            "program_number": "123456789",
            "age": "28"
        }
    }
]
Search query in numeric format
{
    "query": {
        "match" : { --> query on a single field
            "age" : {
                "query" : 28 --> note numeric format
            }
        }
    }
}
Result
"hits": [
    {
        "_index": "so_numberic",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
            "program_number": "123456789",
            "age": "28"
        }
    }
]
Showing that your fuzziness and lenient combination brings back no results, as explained earlier.
Search query
{
    "query": {
        "match": {
            "age": {
                "query": 28,
                "fuzziness": 2,
                "lenient": true
            }
        }
    }
}
Result
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": { --> note 0 results
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}
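If you still want fuzzy matching on the text fields, one possible approach (not something the steps above require) is to apply fuzziness only to text and keyword fields, which is what the error message asks for, and to query the remaining fields in a separate clause. The sketch below assumes properties is the "properties" dict of the index mapping; split_fields and build_mixed_query are made-up names.

```python
# Field types that accept fuzzy queries, per the error message above.
FUZZY_TYPES = {'text', 'keyword'}


def split_fields(properties, fields):
    """Split field names into (fuzzy-capable, other) using the mapping types."""
    fuzzy, other = [], []
    for name in fields:
        field_type = properties.get(name, {}).get('type')
        if field_type in FUZZY_TYPES:
            fuzzy.append(name)
        else:
            other.append(name)
    return fuzzy, other


def build_mixed_query(properties, fields, query, fuzziness='AUTO'):
    """Build a bool/should query: fuzzy on text fields, plain on the rest."""
    fuzzy_fields, other_fields = split_fields(properties, fields)
    should = []
    if fuzzy_fields:
        should.append({'multi_match': {
            'query': str(query),
            'fields': fuzzy_fields,
            'fuzziness': fuzziness,
        }})
    if other_fields:
        should.append({'multi_match': {
            'query': str(query),
            'fields': other_fields,
            # lenient here only skips fields the query text cannot be parsed for
            'lenient': True,
        }})
    return {'query': {'bool': {'should': should, 'minimum_should_match': 1}}}
```

For example, build_mixed_query({'first_name': {'type': 'text'}, 'age': {'type': 'integer'}}, ['first_name', 'age'], '28') puts first_name in the fuzzy clause and age in the plain one, so fuzziness never reaches the integer field.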