LM in Elasticsearch

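How can I improve recall in this case? Any suggestions? I want to create an index of 39 million passages, each containing at least four English sentences. My queries are short questions. I know that a language model with Dirichlet smoothing, stop-word removal, and a stemmer works best for this case. How can I build an index with these settings? (I have already built the index with the configuration below, but the results are no different from the default BM25.)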

My index:

{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "LMDirichlet",
          "mu": 2000
        }
      },
      "analysis": {
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          },
          "my_stemmer": {
            "type": "stemmer",
            "name": "english"
          }
        },
        "analyzer": {
          "my_custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "english_stop",
              "my_stemmer"
            ]
          }
        }
      }
    },
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "similarity": "my_similarity",
        "analyzer": "my_custom_analyzer",
        "type": "text"
      }
    }
  }
}
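For reference, here is a rough sketch of what the `mu` setting controls. It uses the textbook query-likelihood formula with Dirichlet smoothing, p(w|d) = (tf + mu * p(w|C)) / (|d| + mu), which is not necessarily the exact expression Lucene computes internally. The function name, its arguments, and the toy collection statistics are all made up for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of a document under Dirichlet smoothing.

    Textbook formulation only; Lucene's LMDirichlet similarity may
    normalise or clamp scores differently.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never seen in the collection: skip it
        # Smoothed document probability: mixes document and collection counts.
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


# Toy, hypothetical statistics (already lowercased and stemmed).
collection_tf = {"neolith": 10, "tool": 20}
collection_len = 1000
doc_a = ["neolith", "tool", "stone"]
doc_b = ["bronz", "age", "metal"]

gap_small_mu = (dirichlet_score(["neolith"], doc_a, collection_tf, collection_len, mu=2000)
                - dirichlet_score(["neolith"], doc_b, collection_tf, collection_len, mu=2000))
gap_large_mu = (dirichlet_score(["neolith"], doc_a, collection_tf, collection_len, mu=20000)
                - dirichlet_score(["neolith"], doc_b, collection_tf, collection_len, mu=20000))
```

With passages of only a few sentences, `mu = 2000` is much larger than the document length, so the collection statistics dominate and the score differences between documents become small. A larger `mu` shrinks them further.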

My Python code for searching is:

query = " (" + prevTurn + ")^1 (" + currentTurn + ")^2"

search_param = {
    "query": {
        "query_string": {
            "query": query,
            "analyzer": "my_stop_analyzer",
            "default_field": "doc.content"
        }
    }
}
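One thing that may be worth checking: the index above defines an analyzer called `my_custom_analyzer` and a field called `content`, while the search body refers to `my_stop_analyzer` and `doc.content`. Whether that is intentional is not clear from the question. Also, `?` is a wildcard in the `query_string` syntax, so questions ending in a question mark may not be parsed as plain text. The sketch below builds the same boosted query under the assumption that the names from the index settings are the intended ones. The helper names `escape_query_string` and `build_search_body` are made up for illustration.

```python
import re

# Characters reserved by the query_string syntax. "<" and ">" cannot be
# escaped at all, so they are replaced with spaces instead.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_query_string(text):
    """Backslash-escape query_string operators so the text is matched literally."""
    text = text.replace("<", " ").replace(">", " ")
    return _RESERVED.sub(r"\\\1", text)


def build_search_body(prev_turn, current_turn,
                      field="content",                  # assumed: field name from the mapping
                      analyzer="my_custom_analyzer"):   # assumed: analyzer from the settings
    """Build a query_string body that boosts the current turn over the previous one."""
    query = "({})^1 ({})^2".format(
        escape_query_string(prev_turn),
        escape_query_string(current_turn),
    )
    return {
        "query": {
            "query_string": {
                "query": query,
                "analyzer": analyzer,
                "default_field": field,
            }
        }
    }


body = build_search_body("What was the neolithic revolution?",
                         "When did it start and end?")
```

The resulting `body` can be passed to the search call in place of `search_param`. Adding `"explain": True` at the top level of the body makes Elasticsearch return a per-hit score breakdown, which should show which similarity was actually applied.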

A sample set of turns:

Title: The Neolithic Revolution
Description: The neolithic revolution and technology used within it and when it emerged in the british isles.  Also, the transition to the bronze age and its significance.
1   What was the neolithic revolution?
2   When did it start and end?
3   Why did it start?
4   What did the neolithic invent?
5   What tools were used?
6   When was it brought to the british isles?

You could try applying the similarity in the query.