Getting duplicate records in a select query for Azure DocumentDB

I need to write a SELECT query for the following JSON data in Azure DocumentDB.

{
  "Result": [
    {
      "media": [
        {
          "url": "https://someurl.com",
          "thumb_url": "https://someurl.com",
          "id": "f545f874-a9b4-4573-a0b0-b2d50a7994e0",
          "removed": false,
          "size": 133454,
          "length": 0,
          "type": "IMG",
          "avail": true,
          "has_thumb": true,
          "tagged_chi": [
            {
              "chi_id": "1069b9ef-1028-45f4-b9a1-a40e0d438f4e",
              "tag_x": 262.048,
              "tag_y": 157.472,
              "tag_by": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
              "created": 1486723018,
              "last_updated": 1486723018
            },
            {
              "chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef",
              "tag_x": 231.648,
              "tag_y": 146.528,
              "tag_by": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
              "created": 1486723018,
              "last_updated": 1486723018
            }
          ],
          "created": 1486723012,
          "last_updated": 1486723017
        }
      ],
      "id": "23bcd070-0f64-4914-8bc1-d5e936552295",
      "acc_id": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
      "chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef",
      "is_note": false,
      "title": "",
      "when": -2147483648,
      "loc_id": null,
      "col_id": null,
      "comment": null,
      "removed": false,
      "created": -2147483648,
      "last_updated": -2147483648,
      "note_type": null,
      "note_value": null
    },
    {
      "media": [
        {
          "url": "https://someurl.com",
          "thumb_url": "https://someurl.com",
          "id": "7665b921-2790-496b-a70f-30afae43d8c6",
          "removed": false,
          "size": 6872977,
          "length": 0,
          "type": "IMG",
          "avail": true,
          "has_thumb": true,
          "tagged_chi": [
            {
              "chi_id": "1069b9ef-1028-45f4-b9a1-a40e0d438f4e",
              "tag_x": 2305.152,
              "tag_y": 686.5653,
              "tag_by": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
              "created": 1486976119,
              "last_updated": 1486976119
            },
            {
              "chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef",
              "tag_x": 1070.757,
              "tag_y": 1038.741,
              "tag_by": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
              "created": 1486976119,
              "last_updated": 1486976119
            }
          ],
          "created": 1486976100,
          "last_updated": 1486976118
        }
      ],
      "id": "58fa3c58-5508-4371-83f4-405332c636e1",
      "acc_id": "d481a522-6e2f-4dc6-8aeb-bc87cf27287d",
      "chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef",
      "is_note": false,
      "title": "",
      "when": -2147483648,
      "loc_id": null,
      "col_id": null,
      "comment": null,
      "removed": false,
      "created": -2147483648,
      "last_updated": -2147483648,
      "note_type": null,
      "note_value": null
    }
  ],
  "Continuation": null
}

I am trying something like the queries below, but they do not work for me. I want the data matched on media => tagged_chi => chi_id.

Query suggested by @peter-tirrell:

string.Format("select c.id, c.acc_id, c.chi_id, c.is_note, c.title, c.loc_id, c.media, t from c JOIN m IN c.media JOIN t IN m.tagged_chi where c.chi_id = '{0}' OR t.chi_id = '{0}'", childId)

A slight variation of @peter-tirrell's query:

string.Format("select c.id, c.acc_id, c.chi_id, c.is_note, c.title, c.loc_id, c.media, t from c JOIN m IN c.media JOIN t IN m.tagged_chi where c.chi_id = '{0}' OR ( t.chi_id != c.chi_id AND t.chi_id = '{0}')", childId)

I get duplicate records when c.chi_id and t.chi_id have the same value.
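
The duplication follows from how the JOINs behave: they yield one row per (document, media item, tag) combination, and the WHERE clause is checked per row. When c.chi_id matches, every tag row of that document passes the filter. A minimal Python sketch of that behaviour, using a trimmed copy of the first sample document; the function name flattened_rows is hypothetical, and this only illustrates the semantics on the client, not how DocumentDB executes the query:

```python
# Trimmed copy of the first sample document (only the fields the query touches).
doc = {
    "id": "23bcd070-0f64-4914-8bc1-d5e936552295",
    "chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef",
    "media": [
        {
            "id": "f545f874-a9b4-4573-a0b0-b2d50a7994e0",
            "tagged_chi": [
                {"chi_id": "1069b9ef-1028-45f4-b9a1-a40e0d438f4e"},
                {"chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef"},
            ],
        }
    ],
}


def flattened_rows(docs, child_id):
    """Emulate: FROM c JOIN m IN c.media JOIN t IN m.tagged_chi
    WHERE c.chi_id = child_id OR t.chi_id = child_id (hypothetical helper)."""
    rows = []
    for c in docs:
        for m in c["media"]:
            for t in m["tagged_chi"]:
                # The filter is evaluated once per (c, m, t) combination.
                if c["chi_id"] == child_id or t["chi_id"] == child_id:
                    rows.append({"id": c["id"], "t": t})
    return rows


rows = flattened_rows([doc], "7102fc10-62e8-4d0a-9fcf-35645253fcef")
print(len(rows))  # 2: the same document comes back once per tag
```

The variation with t.chi_id != c.chi_id does not change this, because the c.chi_id = '{0}' branch of the OR is still true for every tag row of a matching document.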

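In my experience, your query will return an empty result, because ARRAY_CONTAINS returns a Boolean indicating whether the array contains the specified value. That means your query is effectively equivalent to SELECT * FROM TimelineEvent t WHERE ARRAY_CONTAINS(t.media, true), which returns null in your case.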

Please try the following query instead:

SELECT * FROM TimelineEvent t WHERE ARRAY_CONTAINS(t.media[0].tagged_chi, {
    "id": "0af23202-07f9-40a0-90ba-d2e2f6679331"
})
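
When the search value is an object, ARRAY_CONTAINS by default requires an array element to equal that object in full. Passing true as the optional third argument asks for a partial match, where the element only has to contain the given properties. A rough Python emulation of the two modes, assuming the elements are keyed on chi_id as in the sample data above; array_contains is a hypothetical helper written for illustration, not a DocumentDB API:

```python
def array_contains(array, value, partial=False):
    """Rough emulation of ARRAY_CONTAINS(array, value, partial) (hypothetical helper)."""
    for element in array:
        if partial and isinstance(element, dict) and isinstance(value, dict):
            # Partial match: every property of the search object must be
            # present in the element with an equal value.
            if all(k in element and element[k] == v for k, v in value.items()):
                return True
        elif element == value:
            # Full match: the element must equal the search value exactly.
            return True
    return False


# Two tags shaped like the entries of tagged_chi, trimmed to two fields.
tags = [
    {"chi_id": "1069b9ef-1028-45f4-b9a1-a40e0d438f4e", "tag_x": 262.048},
    {"chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef", "tag_x": 231.648},
]
wanted = {"chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef"}

print(array_contains(tags, wanted))                # False: no element equals the object in full
print(array_contains(tags, wanted, partial=True))  # True: the second tag contains chi_id
```

Note also that t.media[0] only inspects the first media item of each document, so tags on later media items are not considered by that query.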

We can also implement this with a custom UDF. For more details about UDFs, please refer to the documentation.

You could use JOINs to flatten the structure, which may also help with querying. Something like:

SELECT
    c.id,
    c.acc_id,
    c.chi_id,
    c.is_note,
    c.title,
    c.loc_id,
    m,
    t
FROM c
JOIN m IN c.media
JOIN t IN m.tagged_chi
WHERE c.chi_id = '{0}' OR t.chi_id = '{0}'

You can then select whichever specific data fields you need.
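
Because the flattened result has one row per tag, the same document can appear several times. One way to get back to a single entry per document is to group the rows by c.id on the client. A minimal sketch, assuming each row is a dict with the id and t keys from the projection above; the helper name group_by_document is hypothetical:

```python
def group_by_document(rows):
    """Collapse flattened JOIN rows into one entry per document id,
    collecting that document's returned tags in a list (hypothetical helper)."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(row["id"], {"id": row["id"], "tags": []})
        entry["tags"].append(row["t"])
    return list(grouped.values())


# Rows as the flattened query might return them: two for the first
# sample document, one for the second.
rows = [
    {"id": "23bcd070-0f64-4914-8bc1-d5e936552295",
     "t": {"chi_id": "1069b9ef-1028-45f4-b9a1-a40e0d438f4e"}},
    {"id": "23bcd070-0f64-4914-8bc1-d5e936552295",
     "t": {"chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef"}},
    {"id": "58fa3c58-5508-4371-83f4-405332c636e1",
     "t": {"chi_id": "7102fc10-62e8-4d0a-9fcf-35645253fcef"}},
]

documents = group_by_document(rows)
print(len(documents))             # 2
print(len(documents[0]["tags"]))  # 2
```

If the service version in use supports it, an EXISTS subquery over the nested arrays is a server-side alternative that avoids producing the duplicate rows in the first place.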