Searching other fields with Solr hon-lucene-synonyms plugin?
I'm trying to work out how to use this plugin:
https://github.com/healthonnet/hon-lucene-synonyms
If I run:
It works as I'd like, and the debug output tells me it is doing what I want:
<arr name="expandedSynonyms">
<str>art</str>
<str>cartoon</str>
<str>clip</str>
<str>clipart</str>
<str>graphics</str>
<str>image</str>
<str>images</str>
<str>multimedia</str>
<str>picture</str>
<str>pictures</str>
<str>royalty free</str>
</arr>
+(((_text_:royalty) (_text_:free))^1.0 ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:cartoons _text_:clip _text_:clipart _text_:comic _text_:draw _text_:drawing _text_:drawings _text_:funny _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty _text_:sketch) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images _text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0) ((+((Synonym(_text_:art _text_:cartoon _text_:clip _text_:clipart _text_:graphics _text_:image _text_:images 
_text_:multimedia _text_:picture _text_:pictures _text_:royalty) _text_:free)))^1.0))
The problem comes when I also want to narrow the results further with other filters:
http://solr.example.com/solr/graphics/select/?q=({!lucene%20sow=false%20df=title}%20royalty+free)%20AND%20(has_fla:1)&&debugQuery=on&defType=synonym_edismax&synonyms=true
All I get is this debug output:
<lst name="reasonForNotExpandingSynonyms">
<str name="name">HasComplexQueryOperators</str>
<str name="explanation">
synonyms.ignoreQueryOperators is set to false, and this query contains complex query operators (e.g. AND, OR, *, -, etc.). Complex queries aren't supported.
</str>
</lst>
Surely there must be a way to keep the synonym expansion working while also searching on other fields? I'm using Solr 6.6.0.
My queryParser in solrconfig.xml looks like:
<queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
<!-- You can define more than one synonym analyzer in the following list.
For example, you might have one set of synonyms for English, one for French,
one for Spanish, etc.
-->
<lst name="synonymAnalyzers">
<!-- Name your analyzer something useful, e.g. "analyzer_en", "analyzer_fr", "analyzer_es", etc.
If you only have one, the name doesn't matter (hence "myCoolAnalyzer").
-->
<lst name="myCoolAnalyzer">
<!-- We recommend a PatternTokenizerFactory that tokenizes based on whitespace and quotes.
This seems to work best with most people's synonym files.
For details, read the discussion here: http://github.com/healthonnet/hon-lucene-synonyms/issues/26
-->
<lst name="tokenizer">
<str name="class">solr.PatternTokenizerFactory</str>
<str name="pattern"><![CDATA[(?:\s|\")+]]></str>
</lst>
<!-- The ShingleFilterFactory outputs synonyms of multiple token lengths (e.g. unigrams, bigrams, trigrams, etc.).
The default here is to assume you don't have any synonyms longer than 4 tokens.
You can tweak this depending on what your synonyms look like. E.g. if you only have unigrams, you can remove
it entirely, and if your synonyms are up to 7 tokens in length, you should set the maxShingleSize to 7.
-->
<lst name="filter">
<str name="class">solr.ShingleFilterFactory</str>
<str name="outputUnigramsIfNoShingles">true</str>
<str name="outputUnigrams">true</str>
<str name="minShingleSize">2</str>
<str name="maxShingleSize">4</str>
</lst>
<!-- This is where you set your synonym file. For the unit tests and "Getting Started" examples, we use example_synonym_file.txt.
This plugin will work best if you keep expand set to true and have all your synonyms comma-separated (rather than =>-separated).
-->
<lst name="filter">
<str name="class">solr.SynonymFilterFactory</str>
<str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
<str name="synonyms">synonyms.txt</str>
<str name="expand">true</str>
<str name="ignoreCase">true</str>
</lst>
</lst>
</lst>
</queryParser>
For what it's worth, the reason we're using this plugin is that we want multi-word synonyms, for example:
royalty free, cartoon, images, photos
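With the standard setup, "royalty" and "free" are treated as 2 separate words in the synonyms, which is not what we want.
Thanks
A few comments:
Filtering by 'has_fla:1' will definitely work if you just add it as a filter: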
&fq=has_fla:1
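Not only will it work, it is the preferred approach. Filters don't affect scoring, and they are cached for later queries.
The plugin seems to have a release only for Solr 5.3.1, so trying to use it with 6.6 may cause problems. Are you sure you can't handle multi-word synonyms in plain Solr using 'sow' and 'SynonymGraphFilter'? Check this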
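To make the fq suggestion concrete, here is a minimal sketch in Python that builds the request with the restriction moved out of q. The host and core come from the URL in the question; the helper name `build_synonym_query_url` is made up for illustration. The idea is that q then contains no AND, so the plugin's HasComplexQueryOperators check should no longer be triggered, though this has not been verified against 6.6.

```python
from urllib.parse import urlencode

# Host and core are copied from the URL in the question; adjust for your setup.
BASE_URL = "http://solr.example.com/solr/graphics/select"


def build_synonym_query_url(user_query, filters):
    """Hypothetical helper: keep q free of operators and put restrictions in fq."""
    params = [
        ("q", user_query),               # plain user text, no AND/OR
        ("defType", "synonym_edismax"),  # parser name from the solrconfig.xml above
        ("synonyms", "true"),
        ("debugQuery", "on"),
    ]
    # Each restriction becomes its own fq parameter, so Solr can cache it separately.
    params.extend(("fq", f) for f in filters)
    return BASE_URL + "?" + urlencode(params)


url = build_synonym_query_url("royalty free", ["has_fla:1"])
print(url)
```

Further restrictions can be appended to the `filters` list; each one is sent as a separate fq parameter.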
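If you want to try the plain-Solr route, a request could look roughly like the sketch below. It assumes the `title` field (taken from the df=title in the question) has a query-time analyzer that uses solr.SynonymGraphFilterFactory with the same synonyms.txt; that schema change is not shown here, and the helper name `build_plain_edismax_url` is hypothetical. The sow=false parameter asks the stock edismax parser not to split on whitespace before analysis, which is what lets a multi-word entry such as "royalty free" reach the synonym filter as a unit.

```python
from urllib.parse import urlencode

# Host and core are copied from the URL in the question; adjust for your setup.
BASE_URL = "http://solr.example.com/solr/graphics/select"


def build_plain_edismax_url(user_query, query_field, filters):
    """Hypothetical helper: stock edismax with sow=false instead of the plugin."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),  # built-in parser, no plugin involved
        ("qf", query_field),     # assumed field; the question used df=title
        ("sow", "false"),        # do not split on whitespace before analysis
        ("debugQuery", "on"),
    ]
    params.extend(("fq", f) for f in filters)
    return BASE_URL + "?" + urlencode(params)


plain_url = build_plain_edismax_url("royalty free", "title", ["has_fla:1"])
print(plain_url)
```

With debugQuery on, the parsed query in the response should show whether "royalty free" was expanded as a phrase or as two separate terms.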