How to construct an Elasticsearch query to filter only URLs with a subdomain?
I store URLs as a field in Elasticsearch. However, I only want to filter the documents whose url
contains a subdomain.
For example,
I want my search results to include
http://any-subdomain.example.com
but I don't want them to include
https://www.example.com
Is this possible with an Elasticsearch query?
Have you tried the query_string
query? For example, I use it on Twitter data like this:
GET /twitter2/tweet/_search
{
  "query": {
    "query_string": {
      "default_field": "entities.media.url",
      "query": "https\\:\\/\\/t.co\\/* AND -https\\:\\/\\/t.co\\/6*"
    }
  },
  "_source": ["entities.media.url"]
}
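The doubled backslashes are there because : and / are reserved characters in the query_string syntax, and inside a JSON string each backslash must itself be written as \\. A minimal Python sketch that builds the same request body and checks the JSON-encoded form. The helper escape_qs and the RESERVED set are hypothetical names, and the character list is taken from the query_string documentation as I understand it, so treat it as an assumption.

```python
import json

# Reserved characters from the query_string docs (assumed list), minus
# '<' and '>', which the docs say cannot be escaped at all.
# '&&' and '||' are covered by escaping '&' and '|'.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')

def escape_qs(literal: str) -> str:
    """Backslash-escape query_string reserved characters in a literal string."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in literal)

# Escape only the literal parts; the trailing '*' is appended unescaped
# so that it still acts as a wildcard.
query = (escape_qs("https://t.co/") + "*"
         + " AND -" + escape_qs("https://t.co/6") + "*")

body = {
    "query": {
        "query_string": {
            "default_field": "entities.media.url",
            "query": query,
        }
    },
    "_source": ["entities.media.url"],
}

print(query)             # the query_string syntax: single backslashes
print(json.dumps(body))  # the JSON wire form: each backslash doubled
```

Building the body as a dict and letting json.dumps do the JSON escaping avoids having to count backslashes by hand.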
The mapping I use for this search:
PUT /twitter2/tweet/_mapping
{
  "properties": {
    "entities": {
      "properties": {
        "media": {
          "properties": {
            "url": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
For your case, you can use the following query:
GET /your-index/your-type/_search
{
  "query": {
    "query_string": {
      "default_field": "url",
      "query": "http\\:\\/\\/*.example.com AND -http\\:\\/\\/www.example.com"
    }
  }
}
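To check which URLs the query above would keep, here is a rough local model in Python using fnmatch, assuming url is a not_analyzed field so that each wildcard pattern is matched against the whole stored URL. This is only an approximation of Elasticsearch's wildcard matching; INCLUDE, EXCLUDE and matches_query are illustrative names.

```python
from fnmatch import fnmatchcase

# Unescaped equivalents of the two query_string clauses. On a not_analyzed
# field, '*' matches any run of characters and the pattern must cover the
# whole stored value.
INCLUDE = "http://*.example.com"
EXCLUDE = "http://www.example.com"

def matches_query(url: str) -> bool:
    """Approximate 'INCLUDE AND -EXCLUDE' for a single URL (case-sensitive)."""
    return fnmatchcase(url, INCLUDE) and not fnmatchcase(url, EXCLUDE)

for url in ["http://any-subdomain.example.com",
            "http://www.example.com",
            "https://www.example.com",
            "https://any-subdomain.example.com"]:
    print(url, matches_query(url))
```

Because the patterns must match the whole value, https subdomains and URLs with a path (for example http://any-subdomain.example.com/page) are also left out. Adjust both patterns if your data contains such URLs.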
Note: you can get results faster than with wildcard queries if you process the data at index time, for example by storing the url
and its host
as separate fields. With Elasticsearch 5.x, you can use an ingest node to transform your data this way. I will try to create a pipeline for this, but you can check the documentation for more information.
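The index-time idea can be sketched without an ingest pipeline by computing the extra fields in the client before indexing. This is a client-side stand-in, not the ingest-node approach the note refers to. The function enrich and the fields host and subdomain are hypothetical, and the domain is assumed to be example.com. Once subdomain is stored, the search becomes an exists check plus an exact term exclusion, with no wildcard scan over every URL.

```python
from urllib.parse import urlsplit

def enrich(doc: dict, domain: str = "example.com") -> dict:
    """Return a copy of doc with hypothetical 'host' and 'subdomain' fields."""
    host = urlsplit(doc["url"]).hostname or ""  # hostname is already lowercased
    out = dict(doc, host=host)
    suffix = "." + domain
    # Only hosts strictly under the domain get a 'subdomain' field,
    # e.g. "any-subdomain.example.com" -> "any-subdomain".
    if host.endswith(suffix):
        out["subdomain"] = host[: -len(suffix)]
    return out

docs = [enrich({"url": u}) for u in (
    "http://any-subdomain.example.com/page",
    "https://www.example.com",
    "https://example.com",
)]

# Keep documents that have a subdomain, but not the "www" one.
search_body = {
    "query": {
        "bool": {
            "filter": [{"exists": {"field": "subdomain"}}],
            "must_not": [{"term": {"subdomain": "www"}}],
        }
    }
}
```

Storing the subdomain itself, rather than a precomputed yes/no flag, leaves the decision about www to query time.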