How to use a phrase as the termfreq term argument

Question

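I want to use termfreq to report how often a phrase occurs in a field. After reading a number of posts, I set up the field type for the target field as follows: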

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigramsIfNoShingles="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

The field is:

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true" omitTermFreqAndPositions="false" termVectors="true" termPositions="true" termOffsets="true"/>

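As a test, I have a document whose text field contains the phrase "test document" in its body. With this setup, the function termfreq(text,"test document") correctly returns 1. However, if I instead call termfreq(text,"document test"), it returns 0, even though the query text:"document test" reports a hit on the document (which is what I want).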

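So I'm confused about how this is supposed to work. I would also like to include a proximity operator as part of the termfreq term (something like termfreq(text,"test document"~4)), but I can't find a way to make that work either.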

Answer 1

When you run a query, the string you pass in is just that: a query. When you call the termfreq function, you are passing in a term, not a query.

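A query is parsed against the query syntax and will (usually) be analyzed. Neither of those happens to a term. A term is essentially the atomic unit of indexed text, so termfreq looks in the index for exactly the term you passed in.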

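So, for your query text:"document test", you are searching for three terms: document test, document, and test. Although document test is not found in the index, the other two are, so you get a match. With the termfreq call, you are specifically asking for the frequency of the single term document test, which is 0.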
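
To make the distinction concrete, here is a minimal Python sketch of an exact term lookup against a shingled index. It is not Solr code: the helpers shingles, index_terms, and termfreq are hypothetical stand-ins, and the "analyzer" is only lowercasing plus whitespace splitting, with no stop words, synonyms, or stemming. It assumes the shingle filter emits unigrams plus word n-grams up to maxShingleSize.

```python
from collections import Counter


def shingles(tokens, max_size=4):
    """Return unigrams plus every word n-gram up to max_size, joined by spaces."""
    terms = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                terms.append(" ".join(tokens[start:start + size]))
    return terms


def index_terms(text, max_size=4):
    # Crude stand-in for the index-time analyzer (assumption, not Solr's chain):
    # lowercase, split on whitespace, then shingle.
    return Counter(shingles(text.lower().split(), max_size))


def termfreq(index, term):
    # Exact lookup of a single term; the argument is neither parsed nor analyzed.
    return index[term]


index = index_terms("test document example")

print(termfreq(index, "test document"))   # shingle exists in the index
print(termfreq(index, "document test"))   # no such shingle was ever indexed
print(termfreq(index, "document"), termfreq(index, "test"))  # unigrams exist
```

In this sketch the reversed phrase has a frequency of 0 because word order is baked into each shingle at index time, while the unigrams document and test are still present, which is why a query that falls back to them can match.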

lucene

solr