How to execute a structured query containing symbols in AWS CloudSearch
I am trying to execute a structured prefix query in CloudSearch.
Here is a snippet of the query arguments (csattribute is of type text):
{
"query": "(prefix field=csattribute '12-3')",
"queryParser": "structured",
"size": 5
}
The query above results in No matches for "(prefix field=csattribute '12-3')".
However, if I change the query to
{
"query": "(prefix field=csattribute '12')",
"queryParser": "structured",
"size": 5
}
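then I get the list of results I expect.
A brief Google search did not turn up much. How can I include - in the query? Does it need to be escaped? Are there other characters that need to be escaped?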
This SO question pointed me in the right direction:
Below is an excerpt from https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html:
Text Processing in Amazon CloudSearch ... During tokenization, the
stream of text in a field is split into separate tokens on detectable
boundaries using the word break rules defined in the Unicode Text
Segmentation algorithm.
According to the word break rules, strings separated by whitespace
such as spaces and tabs are treated as separate tokens. In many cases,
punctuation is dropped and treated as whitespace. For example, strings
are split at hyphens (-) and the at symbol (@). However, periods that
are not followed by whitespace are considered part of the token.
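As I understand it, text and text-array fields are tokenized according to the analysis scheme (English, in my case). The text was tokenized, and the - symbol is a word-break character, so the value was split at the hyphen and no single indexed token starts with 12-3.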
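To make the effect concrete, here is a minimal Python sketch of what I believe is happening. It is only a rough model: simple_tokenize is a hypothetical helper that splits on whitespace, hyphens, and @ as the excerpt describes, not CloudSearch's actual analyzer, and the value 12-345 is made up for illustration.

```python
import re


def simple_tokenize(text):
    """Rough stand-in for text-field tokenization: split on whitespace,
    hyphens, and '@'. Periods not followed by whitespace stay in the token."""
    return [tok for tok in re.split(r"[\s\-@]+", text) if tok]


def prefix_matches_text_field(value, prefix):
    """A prefix query on a text field is compared against individual tokens."""
    return any(tok.startswith(prefix) for tok in simple_tokenize(value))


def prefix_matches_literal_field(value, prefix):
    """A literal field is not tokenized, so the whole value is compared."""
    return value.startswith(prefix)


if __name__ == "__main__":
    value = "12-345"  # hypothetical field value
    print(simple_tokenize(value))
    print(prefix_matches_text_field(value, "12-3"))
    print(prefix_matches_text_field(value, "12"))
    print(prefix_matches_literal_field(value, "12-3"))
```

Under this model the prefix 12 matches a token of the text field, while 12-3 can only match when the field is treated as a literal.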
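This field does not need to be tokenized. Changing the index field type to literal prevents all tokenization on the field, which allows the query in my question to return the expected results.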
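For anyone who prefers to script the change, the following is a sketch of how it might be done with boto3's cloudsearch client, assuming the define_index_field and index_documents operations. The helper names and the domain name my-domain are hypothetical, the chosen LiteralOptions are just one plausible configuration, and I made the change myself through the field's index type setting, so treat this as untested against a live domain.

```python
def literal_field_definition(field_name):
    """Build an IndexField definition for a searchable, returnable literal field."""
    return {
        "IndexFieldName": field_name,
        "IndexFieldType": "literal",
        "LiteralOptions": {"SearchEnabled": True, "ReturnEnabled": True},
    }


def switch_field_to_literal(client, domain_name, field_name):
    """Redefine the field as literal, then request a rebuild of the index.

    `client` is expected to behave like boto3.client("cloudsearch").
    """
    client.define_index_field(
        DomainName=domain_name,
        IndexField=literal_field_definition(field_name),
    )
    # Field configuration changes only take effect after re-indexing.
    client.index_documents(DomainName=domain_name)


# Example usage (requires boto3 and AWS credentials; "my-domain" is a placeholder):
#   import boto3
#   switch_field_to_literal(boto3.client("cloudsearch"), "my-domain", "csattribute")
```

Re-indexing can take a while on a large domain, and the original prefix query should only be expected to match once it completes.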