Relate elements in table form from a JSON file with jq

I am new to jq, and I have the following code to get the list of values of every element named Abc:

["Abc"], ( .. | objects | select(has("Abc")) | [.["Abc"]] ) | @tsv

This is the output I currently get:

"Abc"
"4"
"2"
"1"
"9"
"3"
"2"
"4"
"9"

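I would like to add columns on the left showing the page, row, and column that each Abc value belongs to and, if possible, a counter in the first column running from 1 to the number of "Abc" elements, for 4 added columns in total.

To clarify, I show the current output above for comparison with the desired output, together with the structure of the JSON file.

The input JSON file looks like this: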

{
  "document": {
    "page": [
      {
        "@index": "0",
        "image": {
          "Abc": "4"
        }
      },
      {
        "@index": "1",
        "row": [
          {
            "column": [
              {
                "text": {
                  "Abc": "2"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "1"
                }
              },
              {
                "text": {
                  "Abc": "9"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "3"
                }
              }
            ]
          }
        ]
      },
      {
        "@index": "2",
        "row": [
          {
            "column": [
              {
                "text": {
                  "Abc": "2"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "4"
                }
              },
              {
                "text": {
                  "Abc": "9"
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

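I hope someone can help me. Thanks in advance.

The irregularity of the input data makes the requirements a bit opaque, but the following produces the desired output.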

["counter", "page", "row", "column", "Abc"],
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
  .page += 1
  | if ($page | (has("image") and (.image|has("Abc"))))
    then
      .counter +=1
      | .out = [.counter, .page, null, null, ($page|.image.Abc)]
    else foreach ($page | .row[]?) as $row (.row=-1;
      .row += 1
      | foreach ($row | .column[]) as $column (.column=-1;
          .column +=1
          | foreach ($column | .text | objects) as $x (.;
              .counter += 1
              | .out = [.counter, .page, .row, .column, $x["Abc"]]
              ; . )
           ; . )
      ; . )
    end
    ; .out )
)
| @tsv

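Output

Specifically, when jq is run with the -r command-line option, the given input produces the following output (the columns are separated by tabs):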

counter page    row column  Abc
1   0           4
2   1   0   0   2
3   1   1   0   1
4   1   1   1   9
5   1   2   0   3
6   2   0   0   2
7   2   1   0   4
8   2   1   1   9
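
To see the state-threading pattern used above in isolation, here is a minimal sketch. It assumes jq is on the PATH; the items "a", "b", "c" and the shell variable name rows are placeholders and not part of the original program.

```shell
# foreach threads a state object through the stream:
#   init    -> {counter: 0}
#   update  -> .counter += 1
#   extract -> [.counter, $x]   (one output row per item)
rows=$(jq -nc 'foreach ("a", "b", "c") as $x ({counter: 0}; .counter += 1; [.counter, $x])')
printf '%s\n' "$rows"
# prints:
# [1,"a"]
# [2,"b"]
# [3,"c"]
```

The program above applies the same idea, only with page, row, and column indices kept in the state alongside the counter.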

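The following solution uses paths and has several advantages, including brevity and simplicity; it can also easily be adapted to handle data in different formats.

For clarity, we first define a function that adds a row number: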

# add a sequential id, starting at 1
def tsvRows(s):
  foreach s as $s (0; .+1; [.] + $s)
  | @tsv;

(["counter", "page", "row", "column", "Abc"] | @tsv),
tsvRows(paths as $p
  | select($p[-1] == "Abc")
  | getpath($p) as $v
  | $p
  | .[2] as $page
  | (if .[3] == "row" then .[4] else null end) as $row
  | (if .[5] == "column" then .[6] else null end) as $column
  | [$page, $row, $column, $v] )
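
When adapting this to differently shaped data, it can help to look first at the paths that end in "Abc". The following is a minimal sketch, assuming jq is on the PATH and using a cut-down, two-page stand-in for the input; the shell variables json and abc_paths are illustrative names only.

```shell
# Hypothetical cut-down input: one page with an image, one page with a single row of two columns.
json='{"document":{"page":[{"image":{"Abc":"4"}},{"row":[{"column":[{"text":{"Abc":"2"}},{"text":{"Abc":"9"}}]}]}]}}'

# Emit every path whose last component is the key "Abc", one compact array per line.
abc_paths=$(printf '%s' "$json" | jq -c 'paths | select(.[-1] == "Abc")')
printf '%s\n' "$abc_paths"
# prints:
# ["document","page",0,"image","Abc"]
# ["document","page",1,"row",0,"column",0,"text","Abc"]
# ["document","page",1,"row",0,"column",1,"text","Abc"]
```

Positions 2, 4, and 6 of each path are the page, row, and column indices that the program above picks out; for the image entry the path is too short to have a row or column, so those two cells come out as null.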