Synonyms in Solr two-ways
For a given list of synonyms, I would like Solr 4 to return synonyms both ways.
Indexed content
Nice villa front of the sea
Looking for condo around 2 billions $
Superb house with 3 bedrooms
Flat for sale
synonyms.txt
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
villa, house, home, condo, appartement, residence, flat
schema.xml
<analyzer type="index">
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
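In practice:
Searching for "villa" returns all the results, but if I search for any other word from synonyms.txt, I get nothing but the sentence that contains that word.
i.e. flat returns:
Flat for sale
i.e. house returns:
Superb house with 3 bedrooms
I would like every synonym (flat, house, condo, etc.) to return the same results as the "villa" keyword.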
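There are a few problems with your analyzer configuration.
- "The synonym filter should be after the tokenizer", as MatsLindh pointed out.
- LowerCaseFilter should come before SynonymFilter. Otherwise, Flat in your example will be ignored because it is not all lowercase.
- The index-time PhoneticFilter should have inject set to true, because right now you are only emitting the phonetic tokens and not the original ones.
Try this configuration: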
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
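For context, here is a rough sketch of how the corrected index analyzer might sit next to the query analyzer from the question inside a single field type. The fieldType name "text_synonyms" and the positionIncrementGap value are assumptions made for illustration; they do not come from the original schema.

```xml
<!-- Sketch only: the name "text_synonyms" and positionIncrementGap="100"
     are illustrative assumptions, not taken from the original schema. -->
<fieldType name="text_synonyms" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: tokenize, lowercase, expand synonyms, then add phonetic codes
       alongside the original tokens (inject="true"). -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <!-- Query time: the chain from the question, unchanged. It has no synonym
       filter because the synonyms are already expanded in the index. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the synonyms are expanded at index time in this setup, existing documents would need to be reindexed before the new analyzer chain has any effect on search results.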