Solr

Question

Our index analyzer is configured with solr.StopFilterFactory, so stop words are not indexed.

We also configured the query analyzer with solr.StopFilterFactory, because we want stop words to be ignored in search query terms:

<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>        
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
  ... ...

In solrconfig.xml, the /select SearchHandler is configured so that its SearchComponent uses minimum match (mm) = 100%:

<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="echoParams">explicit</str>           
        <int name="rows">10</int>
        <str name="mm">100%</str>
        <str name="q.alt">*:*</str>
    ... ...

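This causes some of our multi-word search queries to return no results, for example "rite in the rain". Because of the settings above, "in" and "the" are not indexed, yet minimum match makes them mandatory, even though the query analyzer is configured to remove stop words.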

Is there any setting that lets us keep mm=100% while still ignoring stop words in the search query?

Answer 1

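In this case you have to consider the different fieldTypes involved in the query and distinguish two categories: solr.TextField types that use the same stop filter definition, and all other fields. Which types are involved depends on the query fields requested through the qf parameter.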

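If the query contains stop words and the two categories of fields are mixed in qf, you will run into this problem, because there will always be a required clause trying to match the stop word on a "non-stop-filtered" field (for example a numeric field, or a text field without a stop filter), unless you set a lower mm.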
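A minimal sketch of the mixed case described above, assuming the edismax parser (the question does not show defType). The type names `text_stop` and `text_nostop` and the field names `title` and `brand` are hypothetical. For "rite in the rain", `title` drops "in" and "the" but `brand` keeps them, so the clauses for those two words survive with only `brand` left to match, and mm=100% makes them required. The field types and fields belong in the schema; the request handler belongs in solrconfig.xml.

```xml
<!-- Hypothetical schema fragment; type and field names are illustrative only. -->

<!-- Category 1: a TextField type whose analyzer removes stop words.
     A single <analyzer> without a type attribute is used at both index and query time. -->
<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Category 2: a TextField type with no stop filter, so "in" and "the" survive analysis. -->
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_stop" indexed="true" stored="true"/>
<field name="brand" type="text_nostop" indexed="true" stored="true"/>

<!-- solrconfig.xml: qf mixes both categories, and mm=100% makes every clause required,
     including the clauses for "in" and "the" that only brand can still match. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title brand</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```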

The mm.autoRelax parameter of the edismax parser is intended to handle this unwanted behavior and is available since Solr 6.0 (SOLR-3085):

If true, the number of clauses required (minimum should match) will automatically be relaxed if a clause is removed (by e.g. stopwords filter) from some but not all qf fields. Use this parameter as a workaround if you experience that queries return zero hits due to uneven stopword removal between the qf fields.
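A sketch of enabling the parameter as a handler default, assuming Solr 6.0 or later and the edismax parser. The qf fields `title` (stop-filtered) and `brand` (not stop-filtered) are hypothetical. The same parameter can also be passed per request, e.g. `&mm.autoRelax=true`.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Hypothetical fields: title removes stop words, brand does not -->
    <str name="qf">title brand</str>
    <str name="mm">100%</str>
    <!-- Relax the required clause count when a clause is removed
         (e.g. by a stop filter) from some but not all qf fields -->
    <str name="mm.autoRelax">true</str>
  </lst>
</requestHandler>
```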

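If you cannot use mm.autoRelax and you still need mm=100%, you will have to restrict qf to text fields that all use the same stop filter (same parameters and same dictionary), to guarantee consistent behavior when the query contains stop words.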
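A sketch of this fallback, assuming the edismax parser. The type name `text_stop` and the field names `title` and `description` are hypothetical; the stop filter settings are copied from the question. Because every qf field applies the same stop filter, a stop word is removed either from all of them (so its clause disappears) or from none, and mm=100% does not leave a required clause consisting only of a stop word.

```xml
<!-- Hypothetical schema fragment: one shared stop-filtered type for every qf field. -->
<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_stop" indexed="true" stored="true"/>
<field name="description" type="text_stop" indexed="true" stored="true"/>

<!-- solrconfig.xml: qf lists only fields of the shared type. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title description</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```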

Solr – Configure SearchHandler's SearchComponent with minimum match = 100% and still ignore stop words from search query


stop-words