Why does Retrieve and Rank ignore my indexes when querying a collection?
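We have a Solr collection in Retrieve and Rank that contains a field called document_sub_type. The field is indexed in the Solr schema, but its fieldType is not "watson_text_en" (I understand that fields the ranker is meant to use must have a fieldType of "watson_text_en"; this one does not). We want to filter results on this document_sub_type metadata field.
If I send the query power systems client reference AND (document_sub_type:"Client Reference*" OR document_sub_type:"Case Study*") to R&R's /select endpoint, I only get documents whose document_sub_type is "Client Reference Book" or "Client Reference Brief", as expected. If I send the same query to the /fcselect endpoint, however, the returned documents have a document_sub_type that can apparently be any value.
I admit that our ranker is not adequately trained, but this happens even when we omit the ranker from the query.
Why does /fcselect ignore the metadata portion of the query?
Here are the full response bodies for both queries:
From /select: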
{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "power systems client reference AND (document_sub_type:\"Client Reference*\" OR document_sub_type:\"Case Study*\")",
      "fl": "document_sub_type",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 89,
    "start": 0,
    "docs": [
      {
        "document_sub_type": "Client Reference Book"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Book"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Brief"
      }
    ]
  }
}
From /fcselect:
{
  "responseHeader": {
    "status": 0,
    "QTime": 65,
    "params": {
      "q": "power systems client reference AND (document_sub_type:\"Client Reference*\" OR document_sub_type:\"Case Study*\")",
      "ranker_id": "c852c8x19-rank-422",
      "fl": "document_sub_type",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 39428,
    "start": 0,
    "maxScore": 10,
    "docs": [
      {
        "document_sub_type": "Sales guidance"
      },
      {
        "document_sub_type": "Other sales tool or Utility"
      },
      {
        "document_sub_type": "Client Reference Book"
      },
      {
        "document_sub_type": "Client Reference Brief"
      },
      {
        "document_sub_type": "Client Reference Book"
      },
      {
        "document_sub_type": "At a Glance"
      },
      {
        "document_sub_type": "Brief or Template for Marketing"
      },
      {
        "document_sub_type": "text/plain"
      },
      {
        "document_sub_type": "Brief or Template for Marketing"
      },
      {
        "document_sub_type": "QRG"
      }
    ]
  }
}
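The /fcselect endpoint does not support combining terms with boolean operators inside the query parameter itself. For this kind of operation, you should be able to use a filter query to get the expected results. For details, see the documentation here: https://www.ibm.com/watson/developercloud/doc/retrieve-rank/plugin_query_syntax.shtml#top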
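As a rough sketch of what that could look like, the Python below moves the metadata clause out of q and into Solr's standard fq (filter query) parameter, leaving only the free-text terms in q. The helper name build_fcselect_url and the base URL are hypothetical placeholders, and the way the original query is split into q and fq is one reading of the advice above; it has not been verified against the service.

```python
from urllib.parse import urlencode


def build_fcselect_url(base_url, free_text, filter_query, ranker_id=None):
    """Build a /fcselect URL with the metadata restriction in fq.

    base_url is assumed to be the collection's Solr URL (hypothetical here).
    """
    params = [("q", free_text)]
    if ranker_id is not None:
        params.append(("ranker_id", ranker_id))
    params += [
        # fq is Solr's filter-query parameter; boolean logic goes here, not in q.
        ("fq", filter_query),
        ("fl", "document_sub_type"),
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/fcselect?" + urlencode(params)


# Placeholder base URL; substitute your own cluster and collection.
BASE_URL = "https://example.invalid/solr_clusters/CLUSTER_ID/solr/COLLECTION"

url = build_fcselect_url(
    BASE_URL,
    free_text="power systems client reference",
    filter_query='document_sub_type:"Client Reference*" OR document_sub_type:"Case Study*"',
    ranker_id="c852c8x19-rank-422",
)
print(url)
```

The free-text part stays in q so the ranker still sees a natural-language question, while the fq clause only narrows the candidate set.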