Elasticsearch changing similarity does not work
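Changing the similarity algorithm of my index does not work. I want to compare BM25 and TF-IDF, but I always get the same results. I am using Elasticsearch 5.x.
I have tried almost everything: setting the similarity of the property to classic or BM25, or not setting it at all:
"properties": {
  "content": {
    "type": "text",
    "similarity": "classic"
  },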
I also tried setting the index's default similarity in settings and using it in properties:
"settings": {
  "index": {
    "number_of_shards": "5",
    "provided_name": "test",
    "similarity": {
      "default": {
        "type": "classic"
      }
    },
    "creation_date": "1493748517301",
    "number_of_replicas": "1",
    "uuid": "sNuWcT4AT82MKsfAB9JcXQ",
    "version": {
      "created": "5020299"
    }
  }
The query I am testing looks like this:
{
  "query": {
    "match": {
      "content": "some search query"
    }
  }
}
I created an example below:
DELETE test
PUT test
{
  "mappings": {
    "book": {
      "properties": {
        "content": {
          "type": "text",
          "similarity": "BM25"
        },
        "subject": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}
POST test/book/1
{
"subject": "A neutron star is the collapsed core of a large (10–29 solar masses) star. Neutron stars are the smallest and densest stars known to exist.[1] Though neutron stars typically have a radius on the order of 10 km, they can have masses of about twice that of the Sun.",
"content": "A neutron star is the collapsed core of a large (10–29 solar masses) star. Neutron stars are the smallest and densest stars known to exist.[1] Though neutron stars typically have a radius on the order of 10 km, they can have masses of about twice that of the Sun."
}
POST test/book/2
{
"subject": "A quark star is a hypothetical type of compact exotic star composed of quark matter, where extremely high temperature and pressure forces nuclear particles to dissolve into a continuous phase consisting of free quarks. These are ultra-dense phases of degenerate matter theorized to form inside neutron stars exceeding a predicted internal pressure needed for quark degeneracy.",
"content": "A quark star is a hypothetical type of compact exotic star composed of quark matter, where extremely high temperature and pressure forces nuclear particles to dissolve into a continuous phase consisting of free quarks. These are ultra-dense phases of degenerate matter theorized to form inside neutron stars exceeding a predicted internal pressure needed for quark degeneracy."
}
GET test/_search?explain
{
  "query": {
    "match": {
      "subject": "neutron"
    }
  }
}
GET test/_search?explain
{
  "query": {
    "match": {
      "content": "neutron"
    }
  }
}
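The explain output for these queries is deeply nested, so a small helper can pull out just the idf lines for comparison. This is a minimal Python sketch, assuming the search response has already been parsed into a dict and that each hit carries its explanation tree under _explanation with description and details keys. The function name and the sample tree are hypothetical, and the sample is heavily trimmed with placeholder values.

```python
from typing import Any, Dict, List


def collect_idf_descriptions(explanation: Dict[str, Any]) -> List[str]:
    """Walk a nested explanation tree and return every description starting with 'idf'."""
    found: List[str] = []
    description = explanation.get("description", "")
    if description.startswith("idf"):
        found.append(description)
    # Each node may carry child explanations under "details".
    for child in explanation.get("details", []):
        found.extend(collect_idf_descriptions(child))
    return found


# Hypothetical, trimmed explanation tree; the numeric values are placeholders.
sample_explanation = {
    "value": 1.0,
    "description": "weight(subject:neutron in 0), result of:",
    "details": [
        {
            "value": 1.0,
            "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
            "details": [],
        }
    ],
}

print(collect_idf_descriptions(sample_explanation))
```

With a real response, you would call it as collect_idf_descriptions(hit["_explanation"]) for each hit in response["hits"]["hits"].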
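The subject and content fields have different similarity definitions, but in the two documents I provided (taken from Wikipedia) they contain the same text. If you run the two queries, you will see something like the following in the explanations, and you will also get different scores in the results:
- From the first query:
"description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:"
- From the second:
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:"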