Synonyms in Solr two-ways

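Given a list of synonyms, I want Solr 4 to return matches in both directions: searching for any word in the list should find documents containing any of the others.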

Indexed content

Nice villa front of the sea
Looking for condo around 2 billions $
Superb house with 3 bedrooms
Flat for sale

synonyms.txt

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.

villa, house, home, condo, appartement, residence, flat 

schema.xml

<analyzer type="index">
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.EnglishPossessiveFilterFactory"/>
  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPossessiveFilterFactory"/>
  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>

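What actually happens:

Searching for "villa" returns all of the results, but if I search for any other word in synonyms.txt, I get nothing except the sentence that contains that exact word.

Searching for "flat" returns only: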

Flat for sale

Searching for "house" returns only:

Superb house with 3 bedrooms

I want every synonym (flat, house, condo, etc.) to return the same results as the "villa" keyword.

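There are a few problems with your analyzer configuration.

  • "The synonym filter should be after the tokenizer", as MatsLindh pointed out.
  • LowerCaseFilter should come before SynonymFilter. Otherwise, in your example, Flat will be ignored because it is not all lowercase.
  • PhoneticFilter should have inject set to true in the index analyzer, because right now you are indexing only the phonetic tokens and not the original tokens.

Try this configuration: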

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPossessiveFilterFactory"/>
  <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
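
For context, here is a sketch of how the two analyzers might sit together in one field type. It assumes synonyms are expanded only at index time, so the query analyzer from the question is reused unchanged, without a synonym filter. The field type name `text_synonyms` and the `positionIncrementGap` value are illustrative assumptions, not taken from the original schema.

```xml
<!-- Hypothetical field type name; rename to match your own schema.xml. -->
<fieldType name="text_synonyms" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: tokenize and lowercase first, then expand synonyms. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain without the synonym filter, as in the question. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the change affects the index analyzer, existing documents have to be reindexed before the new synonym expansion shows up in search results.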