Update a JSON array with values from another array using jq (a JOIN)

Question

Given two files, 1.json and 2.json. Both are arrays of objects. I need to update the field ping_latency in 1.json with the values from 2.json.

1.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219"
    },
    {
      "domain": "pl128.nordvpn.com",
      "name": "Poland #128",
      "ip_address": "194.99.105.100"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236"
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 334
    }
]

2.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219",
      "ping_latency": 123
    },
    {
      "domain": "pl27.nordvpn.com",
      "name": "Poland #27",
      "ip_address": "194.99.105.27",
      "ping_latency": "REMOVED"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236",
      "ping_latency": 13
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 67
    }
]

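The object marked "REMOVED" should not appear in the result, because it is not in 1.json.

PS: I do not work for NordVPN; this is just an example.

I tried merging the arrays with the + or * operators, but that always adds the "REMOVED" domain.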

jq -s 'map(INDEX(.domain)) | add | [.[]]' {1,2}.json

and

jq -s '(.[0]|INDEX(.domain)) as $x | (.[1]|INDEX(.domain)) as $y | $x *$y' {1,2}.json

Both add the "REMOVED" node from 2.json.

[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  },
  {
    "domain": "pl27.nordvpn.com",
    "name": "Poland #27",
    "ip_address": "194.99.105.27",
    "ping_latency": "REMOVED"
  }
]

How can I handle this?

Update: after some struggle, I found a way and managed to do it in jq:

jq 'INDEX(.domain) as $u | 
     reduce ($full[][] | {domain,ip_address,name}) as $i (
     []; . + [ $i | .ping_latency=( $u[$i.domain].ping_latency//98767 )]
    )' --slurpfile full 1.json <2.json

But compared with the * operator, my approach is roughly 100 times slower: it takes 2 seconds on an Intel Core i7-11xxx for an array of 5474 objects.

[
  {
    "domain": "ca944.nordvpn.com",
    "ip_address": "172.83.40.219",
    "name": "Canada #944",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "ip_address": "194.99.105.100",
    "name": "Poland #128",
    "ping_latency": 98767
  },
  {
    "domain": "dk151.nordvpn.com",
    "ip_address": "82.102.20.236",
    "name": "Denmark #151",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "ip_address": "82.102.19.137",
    "name": "Belgium #148",
    "ping_latency": 67
  }
]

Do you know a better, faster way?

Answer 1

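Assuming .domain is unique among the updating objects in 2.json (use another key if that fits better; the key can even span several fields by using an array, e.g. [.name, .ip_address]), you can use a JOIN based on INDEX, which assumes unique keys, to match the corresponding pairs of objects.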
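The composite-key idea mentioned above can be sketched as follows. This assumes a jq version that ships the INDEX and JOIN builtins (1.6 or later, as far as I know); the sample data is hypothetical. INDEX stringifies its key internally, while JOIN looks the key up as given, so the array key is passed through tostring on the JOIN side.

```shell
# Hypothetical sample data (not taken from the question).
left='[{"name":"n1","ip_address":"1.1.1.1"},{"name":"n1","ip_address":"2.2.2.2"}]'
right='[{"name":"n1","ip_address":"1.1.1.1","ping_latency":5}]'

# Index the right-hand objects by the pair [name, ip_address], then look each
# left-hand object up by the same pair. tostring turns the array into the
# string form that INDEX used as the object key.
joined=$(jq -c -n --argjson l "$left" --argjson r "$right" '
  [JOIN(INDEX($r[]; [.name, .ip_address]);
        $l[];
        [.name, .ip_address] | tostring;
        add)]')

printf '%s\n' "$joined"
```

Only the first object gains ping_latency here, because the second one differs in ip_address and therefore has no match in the index.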

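Moreover, to merge the matches you can simply add the members of each pair, because nothing in 2.json effectively overwrites anything else in 1.json, assuming your sample is representative in this respect. If that is not the case, use a more fine-grained combination instead, e.g. first + (last | {ping_latency} | select(.[]) // {}) or similar.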
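To illustrate the difference between the two merge strategies, here is a minimal sketch on a hypothetical pair of the shape JOIN emits (the object from 1.json first, its match from 2.json second). The field values are made up so that the two objects disagree on name.

```shell
# Hypothetical pair: [object from 1.json, matching object from 2.json].
pair='[{"domain":"x.example","name":"old"},{"domain":"x.example","name":"new","ping_latency":7}]'

# Plain add: every key of the second object overwrites the first.
merged_all=$(printf '%s' "$pair" | jq -c 'add')

# Fine-grained: only ping_latency is taken from the second object.
merged_field=$(printf '%s' "$pair" |
  jq -c 'first + (last | {ping_latency} | select(.[]) // {})')

# No match: JOIN pairs the object with null, so nothing is added.
unmatched=$(printf '%s' '[{"domain":"y.example","name":"solo"},null]' |
  jq -c 'first + (last | {ping_latency} | select(.[]) // {})')

printf '%s\n%s\n%s\n' "$merged_all" "$merged_field" "$unmatched"
```

With add the name becomes "new"; with the fine-grained variant it stays "old" and only ping_latency is copied. Note that select(.[]) also drops a ping_latency that is null or false.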

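Finally, based on your description

The object marked "REMOVED" should not appear in the result, because it is not in 1.json.

I further assume that 2.json generally contains no objects that should be newly added to 1.json. Since JOIN behaves exactly that way, an explicit check for "REMOVED" is not necessary.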
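Why no check is needed can be seen from the raw pairs JOIN produces. The sketch below uses hypothetical, shortened data and the three-argument form of JOIN (no merge expression), assuming a jq version with the INDEX and JOIN builtins.

```shell
# Hypothetical, shortened stand-ins for 1.json (left) and 2.json (right).
left='[{"domain":"a"},{"domain":"b"}]'
right='[{"domain":"a","ping_latency":1},{"domain":"z","ping_latency":"REMOVED"}]'

# JOIN walks the left stream only; for each object it emits
# [object, index entry or null]. Index entries that are never looked up
# (here "z") do not show up at all.
pairs=$(jq -c -n --argjson l "$left" --argjson r "$right" \
  '[JOIN(INDEX($r[]; .domain); $l[]; .domain)]')

printf '%s\n' "$pairs"
```

The result has one pair per left-hand object: "a" with its match, "b" with null, and nothing for "z". This is the behaviour of a left join.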

Provide the first file as input and read the second file into a variable with --argfile:

jq --argfile a 2.json '[JOIN(INDEX($a[]; .domain); .[]; .domain; add)]' 1.json

Or, equivalently, read both files into a single array with --slurp:

jq -s '[JOIN(INDEX(last[]; .domain); first[]; .domain; add)]' 1.json 2.json

[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  }
]
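As far as I know, the jq manual recommends --slurpfile over --argfile, and newer releases may not accept --argfile at all. A sketch of the same join with --slurpfile follows; the file contents are shortened versions of the question's samples, and the temporary directory is only there to make the example self-contained. With --slurpfile the variable holds an array of all JSON texts in the file, so the original array is at $a[0].

```shell
# Write shortened sample files into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/1.json" <<'EOF'
[
  {"domain": "ca944.nordvpn.com", "name": "Canada #944"},
  {"domain": "pl128.nordvpn.com", "name": "Poland #128"}
]
EOF
cat > "$workdir/2.json" <<'EOF'
[
  {"domain": "ca944.nordvpn.com", "name": "Canada #944", "ping_latency": 123},
  {"domain": "pl27.nordvpn.com", "name": "Poland #27", "ping_latency": "REMOVED"}
]
EOF

# $a is [<contents of 2.json>], hence $a[0][] to stream its objects.
result=$(jq -c --slurpfile a "$workdir/2.json" \
  '[JOIN(INDEX($a[0][]; .domain); .[]; .domain; add)]' "$workdir/1.json")

printf '%s\n' "$result"
rm -r "$workdir"
```

The output keeps the two objects of the shortened 1.json, adds ping_latency to the Canadian entry, and contains no "REMOVED" entry.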
