Extract a subset of deeply nested JSON and print only the key/value pairs I am interested in

I have a deeply nested JSON file. I only want to extract and parse the subset I am interested in, which in my case is everything under the 'node' key. How can I:

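  1. Extract the subset of this JSON file that contains "edges[].node" (edges is the 'parent' key of node)?

  2. Within the 'node' section, keep only the key:value pairs I am interested in:

    .url,
    .headline.default, (*this one is a 'grandchild' of the key 'node'*)
    .firstPublished

    I want to keep only the 3 items above inside the 'node' key. How can I print a slimmed-down version of the JSON file that contains just what I need?

  3. Even better: can I also keep the structure, that is, the full path leading from the JSON root key down to the embedded 'node' subset I am interested in?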

Here is jqplay-myjson (the full content of my JSON file). I have also tried to attach the full content here:

{
  "data": {
    "legacyCollection": {
      "longDescription": "The latest news, analysis and investigations from Europe.",
      "section": {
        "name": "world",
        "url": "/section/world"
      },
      "collectionsPage": {
        "stream": {
          "pageInfo": {
            "hasNextPage": true,
            "__typename": "PageInfo"
          },
          "__typename": "AssetsConnection",
          "edges": [
            {
              "node": {
                "url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
                "firstPublished": "2022-04-27T23:28:33.241Z",
                "headline": {
                  "default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
                  "__typename": "CreativeWorkHeadline"
                },
                "summary": "Karim Khan, the chief prosecutor of the International Criminal Court, said that his organization would participate in a joint effort — with Ukraine, Poland and Lithuania — to investigate war crimes committed since Russia’s invasion.",
                "promotionalMedia": {
                  "__typename": "Image",
                  "id": "SW1hZ2U6bnl0Oi8vaW1hZ2UvYTY3MTVhNDUtZDE0NS01OWZjLThkZWItNzYxMWViN2UyODhk"
                },
                "embedded": false
              },
              "__typename": "AssetsEdge"
            },
            {
              "node": {
                "__typename": "Article",
                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                "firstPublished": "2022-04-27T19:42:17.000Z",
                "typeOfMaterials": [
                  "News"
                ],
                "archiveProperties": {
                  "lede": "",
                  "__typename": "ArticleArchiveProperties"
                },
                "headline": {
                  "default": "Endgame Nears in Bidding for Chelsea F.C.",
                  "__typename": "CreativeWorkHeadline"
                },
                "summary": "The American bank selling the English soccer team on behalf of its Russian owner could name its preferred suitor by the end of the week. But the drama isn’t over.",
                "translations": []
              },
              "__typename": "AssetsEdge"
            }
          ],
          "totalCount": 52559
        }
      },
      "sourceId": "100000004047788",
      "tagline": "",
      "__typename": "LegacyCollection"
    }
  }
}

Here is my command (jqplay Demo):

.data.legacyCollection.collectionsPage.stream.edges[].node |= with_entries(select([.key]|inside(["default","url","firstPublished"])))

Here is the output I get:

{
  "data": {
    "legacyCollection": {
      "longDescription": "The latest news, analysis and investigations from Europe.",
      "section": {
        "name": "world",
        "url": "/section/world"
      },
      "collectionsPage": {
        "stream": {
          "pageInfo": {
            "hasNextPage": true,
            "__typename": "PageInfo"
          },
          "__typename": "AssetsConnection",
          "edges": [
            {
              "node": {
                "url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
                "firstPublished": "2022-04-27T23:28:33.241Z"
              },
              "__typename": "AssetsEdge"
            },
            {
              "node": {
                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                "firstPublished": "2022-04-27T19:42:17.000Z"
              },
              "__typename": "AssetsEdge"
            }
          ],
          "totalCount": 52559
        }
      },
      "sourceId": "100000004047788",
      "tagline": "",
      "__typename": "LegacyCollection"
    }
  }
}

Here is the output I expect:

{
  "data": {
    "legacyCollection": {
      "collectionsPage": {
        "stream": {
          "edges": [
            {
              "node": {
                "url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
                "firstPublished": "2022-04-27T23:28:33.241Z"
              }
            },
            {
              "node": {
                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                "firstPublished": "2022-04-27T19:42:17.000Z"
              }
            }
          ]
        }
      }
    }
  }
}

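Here is one way to make the selection while ensuring that the structure is preserved. This solution may be of interest because it can easily be adapted to jq's "--stream" option.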

def array_startswith($head): .[: $head|length] == $head;

. as $in
| ["data", "legacyCollection", "collectionsPage", "stream", "edges"] as $head
| ($head|length) as $len
| reduce (paths
          | select( array_startswith($head) and .[1+$len] == "node" )) as $p
    (null;
     if ((($p|length) == $len + 3) and ($p[-1] | IN("url", "firstPublished")))
        or ((($p|length) == $len + 4) and $p[-2:] == ["headline", "default"])
     then setpath($p; $in | getpath($p))
     else .
     end)
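
For readers who want to check the logic outside jq, the same path-filtering idea can be sketched in Python. This is only an illustration under stated assumptions, not part of the jq solution: the helpers `paths`, `getpath`, `setpath` and `prune`, the constant `HEAD`, and the shortened `sample` document are all hypothetical names written for this sketch, loosely mirroring the jq builtins of the same name.

```python
import json

# Path prefix shared by every entry we want to keep (same as $head in the jq filter).
HEAD = ["data", "legacyCollection", "collectionsPage", "stream", "edges"]


def paths(value, prefix=()):
    """Yield the path of every nested value, roughly like jq's `paths`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        path = prefix + (key,)
        yield path
        yield from paths(child, path)


def getpath(value, path):
    """Follow `path` (a sequence of keys and indices) into `value`."""
    for key in path:
        value = value[key]
    return value


def setpath(root, path, leaf):
    """Store `leaf` at `path`, creating dicts (str keys) or lists (int keys) as needed."""
    if not path:
        return leaf
    key, rest = path[0], path[1:]
    if root is None:
        root = [] if isinstance(key, int) else {}
    if isinstance(key, int):
        while len(root) <= key:
            root.append(None)
        root[key] = setpath(root[key], rest, leaf)
    else:
        root[key] = setpath(root.get(key), rest, leaf)
    return root


def prune(doc, head=HEAD):
    """Keep only url, firstPublished and headline.default under edges[].node."""
    n = len(head)
    result = None
    for p in paths(doc):
        # Only consider paths of the form head + [index, "node", ...].
        if list(p[:n]) != head or len(p) < n + 2 or p[n + 1] != "node":
            continue
        keep_top = len(p) == n + 3 and p[-1] in ("url", "firstPublished")
        keep_headline = len(p) == n + 4 and list(p[-2:]) == ["headline", "default"]
        if keep_top or keep_headline:
            result = setpath(result, p, getpath(doc, p))
    return result


# A shortened, made-up document with the same shape as the question's JSON.
sample = {"data": {"legacyCollection": {"tagline": "", "collectionsPage": {"stream": {
    "totalCount": 1,
    "edges": [{"node": {"url": "u1", "firstPublished": "d1",
                        "headline": {"default": "h1", "__typename": "X"},
                        "summary": "s"},
               "__typename": "AssetsEdge"}]}}}}}

print(json.dumps(prune(sample), indent=2))
```

Because the result is built up one path at a time, every ancestor key on the way to a kept leaf is recreated, which is why the root-to-node structure survives.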

这是一个(有点)声明性的解决方案:

(.data.legacyCollection.collectionsPage.stream.edges
 | map( {node: (.node
                | {url,
                   firstPublished,
                   headline: {default: .headline.default} })})) as $edges
| {data: {
     legacyCollection: {
       collectionsPage: {
         stream: {
           $edges
         }
       }
     }
   }
  }
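
As a cross-check of what the declarative filter builds, here is a rough Python equivalent. It is a sketch only: the function `slim` and the shortened `sample` document are hypothetical names written for this illustration, and, as in jq's `{url}` shorthand, a key that is missing from a node comes out as a null (`None`) rather than being dropped.

```python
import json


def slim(doc):
    """Rebuild the document skeleton around a trimmed-down list of edges."""
    edges = doc["data"]["legacyCollection"]["collectionsPage"]["stream"]["edges"]
    slim_edges = [
        {"node": {
            "url": edge["node"].get("url"),
            "firstPublished": edge["node"].get("firstPublished"),
            # headline may be absent; fall back to an empty dict before the lookup.
            "headline": {"default": (edge["node"].get("headline") or {}).get("default")},
        }}
        for edge in edges
    ]
    return {"data": {"legacyCollection": {"collectionsPage": {"stream": {
        "edges": slim_edges}}}}}


# A shortened, made-up document with the same shape as the question's JSON.
sample = {"data": {"legacyCollection": {"sourceId": "x", "collectionsPage": {"stream": {
    "totalCount": 2,
    "edges": [
        {"node": {"url": "u1", "firstPublished": "d1",
                  "headline": {"default": "h1", "__typename": "X"}},
         "__typename": "AssetsEdge"},
        {"node": {"url": "u2"}, "__typename": "AssetsEdge"},
    ]}}}}}

print(json.dumps(slim(sample), indent=2))
```

Compared with the path-based approach, this version spells out the target shape explicitly, so it is easier to read but has to be edited by hand whenever the set of wanted keys changes.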