Solr Query: Exact Match and Partial Match Search

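I need to search some documents using both exact match and partial match. For example, I have a document titled "ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE". I want a search for ABC-01 to score highly when the search term exactly matches a term in the title, and I also want to find documents that merely contain ABC-01. The results should be sorted by score and date in descending order. There is also a field named driver. The search should cover the driver field too, but with a lower score than an exact or partial match on the title.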

(Note that the exact match should find only "ABC-01", not "ABC-010".) Any clues?



For this example, if I search for ABC-01, I want the following results:



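If the search term exactly matches the title, give it a high score. Otherwise, the search should match documents whose title field contains ABC-01, abc-01-xe, or anything else containing abc-01. It should also search the driver field for any drivers related to the given term.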

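The results should be sorted by score and date. In addition, among exact matches, the one with the most recent date should appear first.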

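Edited answer: As Alexandre pointed out, you can use edismax to assign weights. For fun, if you add the sample data at the bottom to a test core and then run the search below, it will give you the correct ordering of the cabs.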

http://.us-west-2.compute.amazonaws.com:8983/solr/abc123/select?defType=edismax&indent=on&q=id:ABC-01*%20OR%20Title:*ABC-01*&qf=id^1.5%20Title^0.7&wt=json

In the regular query you have a plain wildcard search with an OR:

id:ABC-01* 
OR
Title:*ABC-01*

Then enable edismax and assign the weights. I boosted id to 1.5 and lowered Title to 0.7, as in:

id^1.5 Title^0.7

The response looks like this:

{
  "responseHeader":{
    "status":0,
    "QTime":23,
    "params":{
      "q":"id:ABC-01* \nOR\nTitle:*ABC-01*",
      "defType":"edismax",
      "indent":"on",
      "qf":"id^1.5 Title^0.7",
      "wt":"json",
      "_":"1477029831405"}},
  "response":{"numFound":13,"start":0,"docs":[
      {
        "id":"ABC-01",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-10T00:00:00Z"],
        "_version_":1548778151323107328},
      {
        "id":"ABC-010",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-14T00:00:00Z"],
        "_version_":1548778151552745472},
      {
        "id":"ABC-01234",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-14T00:00:00Z"],
        "_version_":1548778803999801344},
      {
        "id":"ABC-02",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-11T00:00:00Z"],
        "_version_":1548778151538065408},
      {
        "id":"ABC-03",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-12T00:00:00Z"],
        "_version_":1548778151548551168},
      {
        "id":"ABC-04",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-13T00:00:00Z"],
        "_version_":1548778151549599744},
      {
        "id":"XYZ-04",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-13T00:00:00Z"],
        "_version_":1548778151556939776},
      {
        "id":"ABC-07",
        "Title":["ABC-07 IS AVAILABLE ABC-01-XE"],
        "joinedDate":["2015-01-12T00:00:00Z"],
        "_version_":1548778495705874432},
      {
        "id":"BBC-02",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. "],
        "joinedDate":["2016-01-11T00:00:00Z"],
        "_version_":1548778803994558464},
      {
        "id":"ABC-010101",
        "Title":["ABC-02 CAB IS BUSY RIGHT NOW. ABC01 CAB IS AVAILABLE"],
        "joinedDate":["2016-01-12T00:00:00Z"],
        "_version_":1548778803995607040}]
  }}
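The request above can also be assembled in code rather than by hand, which avoids encoding mistakes in the URL. This is only a minimal sketch: the helper name build_edismax_url and the host http://localhost:8983 are assumptions for illustration, while the core name abc123, the query, and the weights are taken from the request shown earlier.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url: str, core: str, term: str) -> str:
    """Build a Solr select URL using edismax with per-field weights.

    Hypothetical helper: base_url and core are placeholders you supply.
    """
    params = {
        "defType": "edismax",
        # Prefix wildcard on id, contains-style wildcard on Title.
        "q": f"id:{term}* OR Title:*{term}*",
        # Boost id to 1.5 and lower Title to 0.7, as in the answer.
        "qf": "id^1.5 Title^0.7",
        "indent": "on",
        "wt": "json",
    }
    # urlencode percent-encodes ':', '*' and '^' and turns spaces into '+'.
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Assumed local Solr instance; replace with your own host.
url = build_edismax_url("http://localhost:8983", "abc123", "ABC-01")
print(url)
```

The resulting string can be passed to any HTTP client. The weights are the ones used in the answer and will likely need tuning against your own data.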

Sample data to add:

 <add><doc>
<field name="id">ABC-01</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-10</field>
</doc>
<doc>
<field name="id">ABC-02</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-11</field>
</doc>
<doc>
<field name="id">ABC-03</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-12</field>
</doc>
<doc>
<field name="id">ABC-04</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">ABC-010</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-14</field>
</doc>
<doc>
<field name="id">ABC-07</field>
<field name="Title">ABC-07 IS AVAILABLE ABC-01-XE</field>
<field name="joinedDate">2015-01-12</field>
</doc>

<doc>
<field name="id">XYZ-04</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">DBC-01</field>
<field name="Title">DBC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-10</field>
</doc>
<doc>
<field name="id">BBC-02</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. </field>
<field name="joinedDate">2016-01-11</field>
</doc>
<doc>
<field name="id">ABC-010101</field>
<field name="Title">ABC-02 CAB IS BUSY RIGHT NOW. ABC01 CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-12</field>
</doc>
<doc>
<field name="id">ABC-01QWERTY</field>
<field name="Title">CAB IS BUSY RIGHT NOW. </field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">ABC-01234</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-14</field>
</doc>
<doc>
<field name="id">ABC-007</field>
<field name="Title">ABC-007 IS AVAILABLE ABC-01-XE</field>
<field name="joinedDate">2015-01-12</field>
</doc>
<doc>
<field name="id">XYZ-014</field>
<field name="Title"> ABCDE CAB IS AVAILABLE. ABC-01 CAB IS BUSY RIGHT NOW.</field>
<field name="joinedDate">2016-01-13</field>
</doc></add>

Original answer: You are probably looking for something like the following:

id:ABC-01* OR id:*ABC

In a URL, the query looks like this:

http://<server>:8983/solr/<core>/select?indent=on&q=id:ABC-01*%20OR%20id:*ABC&wt=json

You have several questions here.

You can use eDisMax to search across multiple fields and give the fields different weights for ranking.

You can sort by a function query that mixes score and date, and experiment until you get the right combination.
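As a starting point for that experimentation, here is a rough sketch of two request variants: a plain sort on score with the date as a tie-breaker, and an edismax boost that multiplies the score by a recency function. Several things here are assumptions rather than part of the thread: the helper name build_ranked_params, the weights Title^2 and driver^0.5, and that joinedDate is a single-valued date field (in the response shown earlier it comes back as an array, which may prevent sorting on it directly). The recip(ms(NOW,...),3.16e-11,1,1) expression is the recency idiom commonly shown in the Solr function query documentation.

```python
from urllib.parse import urlencode


def build_ranked_params(term: str, use_date_boost: bool = False) -> dict:
    """Return edismax parameters that rank by relevance and then by date.

    Hypothetical helper; the field weights are illustrative, not tuned.
    """
    params = {
        "defType": "edismax",
        "q": term,
        # Title matches count more than driver matches (assumed weights).
        "qf": "Title^2 driver^0.5",
        # The date only breaks ties between documents with equal scores.
        "sort": "score desc, joinedDate desc",
        "wt": "json",
    }
    if use_date_boost:
        # Multiplicative boost that decays with document age, so newer
        # documents score higher even when relevance differs slightly.
        params["boost"] = "recip(ms(NOW,joinedDate),3.16e-11,1,1)"
    return params


query_string = urlencode(build_ranked_params("ABC-01"))
print(query_string)
print(build_ranked_params("ABC-01", use_date_boost=True)["boost"])
```

With the plain sort, the date matters only when two scores are exactly equal, which is rare with real relevance scores. The boost variant folds recency into the score itself, which is usually closer to what "sort by score and date" is meant to achieve.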

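Matching ABC-01-xe against ABC-01 is a bit harder, because it is not clear what you mean. It will be some sort of index-time analyzer chain element, but which one depends on your mapping. Does ABC-01-ANYTHING map to ABC-01, or does it have to be ABC-01-xe? What about ABC-01234? You need to work out the business rules for this mapping first, and then make sure that what comes out at the end of the index-time analyzer chain is what you want. You may also want two fields that hold the same information but are processed differently, with the less-processed one (e.g. exact ABC-01) given a higher weight.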
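To pin down those business rules before touching the schema, it can help to write them as a small test harness first. The sketch below is plain Python, not a Solr analyzer. It assumes one possible rule set: a token equal to the term is an exact match, and a token that merely contains the term (ABC-010, ABC-01-XE, ABC-01234) is a partial match. The function name match_kind is hypothetical.

```python
def match_kind(text: str, term: str) -> str:
    """Classify how term occurs in text: 'exact', 'partial', or 'none'."""
    term_l = term.lower()
    # Split on whitespace only and trim sentence punctuation, so that
    # hyphenated codes such as ABC-01-XE stay together as one token.
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    if term_l in tokens:
        return "exact"
    if any(term_l in t for t in tokens):
        return "partial"
    return "none"


samples = [
    "ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE",
    "ABC-07 IS AVAILABLE ABC-01-XE",
    "ABC-010 CAB IS BUSY",
    "ABC-02 CAB IS BUSY RIGHT NOW. ABC01 CAB IS AVAILABLE",
]
results = [match_kind(s, "ABC-01") for s in samples]
print(results)
```

Under these assumed rules the four samples classify as exact, partial, partial, and none. In Solr terms this corresponds roughly to two fields: one that is tokenized lightly and weighted high for exact matches, and one that supports contains-style matching and is weighted lower. If your rules differ (for example, if ABC-01234 should not match at all), adjust the classifier first and then build the analyzer chain to agree with it.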