How to search a field that may contain spaces, hyphens, and a concatenated number?
Hi, I have a field with the following schema:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="0" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
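I am storing complete PDF documents.
Now suppose I have 4 documents with the following content:
1. stackoverflow is a good site.
2. stack-overflow is a good site.
3. stack overflow is a good site.
4. stackoverflow2018 is a good site.
When I search for stackoverflow, it should return document 1.
When I search for stack-overflow, it should return document 2.
When I search for stack overflow, it should return document 3.
When I search for stackoverflow2018, it should return document 4.
What should the schema be? The schema above does not work in this case.
Can I specify something in the query instead?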
A Word Delimiter Graph Filter splits on non-alphanumeric characters (such as -), on case changes, and on numerics by default.
The rules for determining delimiters are determined as follows:
A change in case within a word: "CamelCase" -> "Camel", "Case". This
can be disabled by setting splitOnCaseChange="0".
A transition from alpha to numeric characters or vice versa:
"Gonzo5000" -> "Gonzo", "5000" "4500XL" -> "4500", "XL". This can be
disabled by setting splitOnNumerics="0".
Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"
A trailing "'s" is removed: "O’Reilly’s" -> "O", "Reilly"
Any leading or trailing delimiters are discarded: "--hot-spot--" ->
"hot", "spot"
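As a rough illustration of the two switches named in the rules above, the filter line from the question could be adjusted as follows. This is only a sketch: the other attribute values are copied from the question's index analyzer, and whether they suit a given index is an assumption. Note that the rules offer no switch for non-alphanumeric delimiters, so a hyphen would still be treated as a delimiter here.

```xml
<!-- Sketch only: disables splitting on case changes and on letter/digit transitions. -->
<!-- With splitOnNumerics="0", a token such as "Gonzo5000" is no longer split into "Gonzo" and "5000". -->
<!-- A hyphen is still a delimiter: "hot-spot" is still broken at the "-". -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        preserveOriginal="1"
        generateWordParts="0"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        protected="protwords.txt"/>
```

The same change would need to be made in both the index and query analyzers, followed by a reindex, since index-time analysis only affects documents indexed after the change.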
If you don't want that behavior, remove the WordDelimiterFilter from the filter list and add other filters to support the parts of the WDF behavior you need.
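A minimal sketch of what such a field type might look like, assuming the only WDF-related behavior worth keeping is case-insensitive matching. The field type name text_ws_exact is hypothetical, and the stop word file is carried over from the question's schema.

```xml
<!-- Hypothetical field type; the name "text_ws_exact" is an assumption, not part of the original schema. -->
<fieldType name="text_ws_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <!-- Split on whitespace only, so "stack-overflow" stays one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Lowercasing keeps matching case-insensitive without any word splitting. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this analysis chain, stack-overflow is indexed as the single token stack-overflow, while the query stack overflow produces the two tokens stack and overflow, so the hyphenated document should not match the spaced query and vice versa. One caveat: the whitespace tokenizer leaves punctuation attached, so "site." is indexed with its trailing period. Existing documents need to be reindexed after the field type changes.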