Punctuation and NEAR Query
When I turn on `punctuation-insensitive` in `cts:word-query`, the NEAR query still splits a hyphenated word into two words:
```xquery
let $xml :=
<abstracts count="1">
<abstract>
<abstract_text count="1">
<p>We assessed the impact of a pharmacotherapy follow-up programme on key safety points [adverse events (AE)
and drug administration] in outpatients treated with oral antineoplastic agents (OAA). We performed a comparative,
interventional, quasi-experimental study of outpatients treated with OAA in a Spanish hospital to compare pre-intervention
group patients (not monitored by pharmacists during 2011) with intervention group patients (prospectively monitored by
pharmacists during 2013). AE data were collected from medical records. Follow-up was 6 months, and 249 patients were
included (pre-intervention, 115; intervention, 134). After the first month, AE were detected in 86.5% of patients
in the pre-intervention group and 80.6% of patients in the intervention group, P = 0.096. During the remaining months,
79.0% patients had at least one AE in the pre-intervention group compared with 78.0% in the intervention group, P = 0.431.
AE were more prevalent with sorafenib and sunitinib. In total, 173 drug interactions were recorded (pre-intervention, 80;
intervention, 93; P = 0.045). Drug interactions were more frequent with erlotinib and gefitinib; food interactions were
more common with sorafenib and pazopanib. Our follow-up of cancer outpatients revealed a reduction in severe AE and major
drug interactions, thus helping health professionals to monitor the safety of OAA.</p>
</abstract_text>
</abstract>
</abstracts>
let $q3 :=
  cts:near-query(
    (
      cts:element-query((xs:QName("abstract_text")),
        cts:word-query(
          ("Controlled", "randomized", "randomised", "clinical", "masked", "blind*",
           "multi center", "open label*", "compar*", "cross over", "placebo",
           "post market", "meta analysis", "volunteer*", "prospective"),
          ("case-insensitive", "punctuation-insensitive", "wildcarded"))
      ),
      cts:element-query((xs:QName("abstract_text")),
        cts:word-query(("stud*", "trial*"),
          ("case-insensitive", "punctuation-insensitive", "wildcarded"))
      )
    ),
    3
  )
return
  cts:highlight($xml, $q3, <b>{$cts:text}</b>)
```
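When I set the NEAR distance to 3, it does not match `comparative` and `study`, even though the distance between them is 3 and I have `punctuation-insensitive` set. But when I change it to 4, it works.

And when I switch to `punctuation-sensitive`, it still doesn't match with a NEAR distance of 3. Why is that?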
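I also want my `word-query` to match both, say, `placebo-controlled` and `placebo controlled`. I thought that once I turned on `punctuation-insensitive` and searched for `placebo controlled` in my word query, it would find every combination of the words. But how does this affect the NEAR distance when the same word query is used inside a NEAR query?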
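This actually has nothing to do with how punctuation is handled when the search is resolved; it's about how MarkLogic tokenizes text and indexes the positions of individual words. By default, MarkLogic's tokenizer breaks hyphenated phrases into separate words. If you don't like the default behavior, you can use a custom tokenizer to tell MarkLogic how words should be indexed. A very detailed guide on using a custom tokenizer to ignore hyphens when tokenizing words is available here.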
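To see the default tokenization for yourself, MarkLogic's `cts:tokenize` function returns the tokens it produces for a string, each typed as `cts:word`, `cts:punctuation`, or `cts:space`. A minimal sketch for Query Console, assuming the database uses the default (English) tokenizer with no custom overrides:

```xquery
xquery version "1.0-ml";

(: List the non-space tokens the default tokenizer produces for the
   phrase from the abstract, labeled by token type. :)
for $token in cts:tokenize("comparative, interventional, quasi-experimental study")
return
  typeswitch ($token)
    case cts:word return fn:concat("word: ", fn:string($token))
    case cts:punctuation return fn:concat("punctuation: ", fn:string($token))
    default return ()  (: drop cts:space tokens :)
```

With default tokenization, `quasi` and `experimental` should come back as two separate `cts:word` tokens, with the hyphen as a `cts:punctuation` token between them. The punctuation option on a word query changes how punctuation is matched, not how the text is split into words.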
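In your case, I'm not sure I would recommend going down the custom tokenizer route. It can have unintended consequences, and it does not perform as well as the default tokenization. It probably makes more sense to adapt your code to the way default tokenization works.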
Take `comparative, interventional, quasi-experimental study`. It will be tokenized as:
Word | Position
--- | ---
comparative | 0
interventional | 1
quasi | 2
experimental | 3
study | 4
So the distance between `comparative` and `study` is 4. Note that `quasi-experimental` is tokenized as two words.
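To confirm the distance of 4, the sketch below keeps only the `cts:word` tokens (their order gives the positions in the table above), computes the gap between `comparative` and `study`, and then runs the same pair of wildcarded word queries through `cts:near-query` at distances 3 and 4 against an in-memory text node with `cts:contains`. It assumes default tokenization and uses only this fragment rather than the whole abstract; the `element-query` wrappers from the question are left out because a bare text node has no `abstract_text` element.

```xquery
xquery version "1.0-ml";

(: The fragment from the abstract as an in-memory text node. :)
let $text := text { "comparative, interventional, quasi-experimental study" }

(: Keep only word tokens; their order corresponds to word positions. :)
let $words :=
  for $token in cts:tokenize(fn:string($text))
  where $token instance of cts:word
  return fn:string($token)

(: fn:index-of is 1-based for both words, so the offset cancels out. :)
let $distance := fn:index-of($words, "study") - fn:index-of($words, "comparative")

(: The same two wildcarded word queries, combined at two NEAR distances. :)
let $terms := (
  cts:word-query("compar*", ("case-insensitive", "wildcarded")),
  cts:word-query("stud*", ("case-insensitive", "wildcarded"))
)
return (
  $words,
  $distance,
  cts:contains($text, cts:near-query($terms, 3)),
  cts:contains($text, cts:near-query($terms, 4))
)
```

Given positions 0 and 4, the computed distance should be 4, so the distance-3 check should return `false` and the distance-4 check `true`, which is what you observed with `cts:highlight`.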
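I'm not sure I understand the question in your last paragraph, but I hope this gives you enough information to better understand how default tokenization behaves.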
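If the concern is whether `placebo-controlled` and `placebo controlled` count differently inside a NEAR query, one quick check is to tokenize both spellings and run a single punctuation-insensitive phrase query against each. A minimal sketch, assuming default tokenization; the two sample sentences are hypothetical:

```xquery
xquery version "1.0-ml";

(: Hypothetical sample text in both spellings. :)
let $hyphenated := text { "a placebo-controlled trial" }
let $spaced := text { "a placebo controlled trial" }
let $q := cts:word-query("placebo controlled", ("case-insensitive", "punctuation-insensitive"))
return (
  (: Word tokens of each sample, joined with "|" for easy comparison. :)
  for $t in ($hyphenated, $spaced)
  return fn:string-join(
    for $token in cts:tokenize(fn:string($t))
    where $token instance of cts:word
    return fn:string($token), "|"),
  (: Does the phrase query match each spelling? :)
  cts:contains($hyphenated, $q),
  cts:contains($spaced, $q)
)
```

With default tokenization both spellings should yield the same two words at consecutive positions, so the phrase should match either form and take up the same number of positions in a NEAR calculation; the hyphen itself does not get a word position.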