Remove matching/non-matching elements of a nested array using jq

Question

I need to split the results of a SonarQube analysis history into individual files. Assuming the starting input below,

{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    }
  ]
}

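How can I use jq to clean up the result so that, for each element, only the history array entries for a single analysis are kept? The desired output looks like this (output-20181118123708.json for the analysis done on "2018-11-18T12:37:08+0000"):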

{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    }
  ]
}

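I am lost on how to operate only on the child elements while leaving the parent structure intact. The naming of the JSON files will be handled outside of jq. The sample data provided would be split into 3 files. Other inputs can have a variable number of entries, and some may have as many as 10000. Thanks.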

Answer 1

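Here is a solution that uses awk to write to the different files. It assumes that every measure has the same dates in the same order, but it places no limit on the number of distinct dates or the number of distinct measures.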

jq -c 'range(0; .measures[0].history|length) as $i
  | (.measures[0].history[$i].date|gsub("[^0-9]";"")),  # basis of filename
    reduce range(0; .measures|length) as $j (.;
      .measures[$j].history |= [.[$i]])' input.json |
awk -F'\t' 'fn {print >> fn; close(fn); fn=""; next}  # JSON line: write it, then release the file
  {gsub(/"/, "", $1); fn="output-" $1 ".json"}        # date line: strip the JSON quotes, build the name'

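The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) is required for each file. So if the output in each file must be neat, a case can be made for running jq once per date, which removes the need for the post-processing (awk) step.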
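The one-run-per-date alternative could look roughly like the sketch below. It is not part of the original answer: it assumes a POSIX shell, builds a trimmed-down hypothetical input.json in a scratch directory, and reuses the output-DIGITS.json naming from the awk version. It selects entries by date value, which gives the same result as selecting by position when the measures are in lock-step.

```shell
#!/bin/sh
# Sketch: one jq run per date, each writing a pretty-printed file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical, trimmed-down sample (two metrics, two dates).
cat > input.json <<'EOF'
{"paging":{"pageIndex":1,"pageSize":100,"total":2},
 "measures":[
  {"metric":"coverage","history":[
    {"date":"2018-11-18T12:37:08+0000","value":"100.0"},
    {"date":"2018-11-21T12:22:39+0000","value":"100.0"}]},
  {"metric":"bugs","history":[
    {"date":"2018-11-18T12:37:08+0000","value":"0"},
    {"date":"2018-11-21T12:22:39+0000","value":"0"}]}]}
EOF

# Take the dates from the first measure, then filter the whole document once per date.
jq -r '.measures[0].history[].date' input.json |
while IFS= read -r date; do
  # Keep only the digits of the date for the file name.
  stamp=$(printf '%s' "$date" | tr -cd '0-9')
  jq --arg date "$date" \
     '.measures |= map(.history |= map(select(.date == $date)))' \
     input.json > "output-$stamp.json"
done
```

The trade-off is that the input is re-read for every date, so for inputs with thousands of dates the single-pass jq plus awk pipeline is likely to be faster.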

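If the dates of the measures are not in lock-step, the same approach can still be used, but the gathering of the dates and of the corresponding measures has to be done differently.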

Output

The first two lines produced by the jq invocation above are as follows:

"201811181237080000"
{"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}

Answer 2

In the comments, the following addendum to the original question appeared:

is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity").

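The following produces a stream of JSON objects, one per date. This stream can be annotated with the date as in my previous answer, which shows how to use these annotations to create the various files. For ease of understanding, we use two helper functions: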

def dates:
  INDEX(.measures[].history[].date; .)
  | keys;

def gather($date): map(select(.date==$date));

dates[] as $date
| .measures |= map( .history |= gather($date) )
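Combining this filter with the date annotation from the previous answer might look like the following sketch. It assumes jq 1.6 or later (for the built-in INDEX/2) and a POSIX shell; the sample input is hypothetical and deliberately has "bugs" missing for one date and the dates out of order.

```shell
#!/bin/sh
# Sketch: split by date value (not position), one compact JSON file per date.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: "bugs" has no entry for the 2018-11-21 analysis.
cat > input.json <<'EOF'
{"paging":{"pageIndex":1,"pageSize":100,"total":2},
 "measures":[
  {"metric":"coverage","history":[
    {"date":"2018-11-21T12:22:39+0000","value":"100.0"},
    {"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},
  {"metric":"bugs","history":[
    {"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}
EOF

# -r prints the date line without JSON quotes; -c keeps each object on one line.
jq -r -c '
  def dates: INDEX(.measures[].history[].date; .) | keys;
  def gather($date): map(select(.date == $date));
  dates[] as $date
  | ($date | gsub("[^0-9]"; "")),                  # first line: basis of filename
    (.measures |= map(.history |= gather($date)))  # second line: the filtered object
' input.json |
awk 'fn {print > fn; close(fn); fn=""; next} {fn="output-" $0 ".json"}'
```

A metric with no entry for a given date keeps an empty history array in that date's file. If such metrics should be dropped instead, one option is to follow the update with `.measures |= map(select(.history | length > 0))`.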

INDEX/2

If your jq does not have INDEX/2, now would be an excellent time to upgrade, but if that is not feasible, here is its def:

def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);
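To see why INDEX/2 followed by keys yields each date exactly once, here is a small sketch that runs the fallback definition on a stream containing a repeated value. The shortened date strings are made up for illustration.

```shell
#!/bin/sh
# Sketch: the fallback INDEX/2 collapses duplicates into one key per distinct value.
result=$(jq -n -c '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({};
      .[$row|idx_expr|
        if type != "string" then tojson
        else .
        end] |= $row);
  # Three dates, one of them repeated; index each by itself, as the dates helper does.
  INDEX("2018-11-21", "2018-11-18", "2018-11-21"; .) | keys
')
echo "$result"
# prints ["2018-11-18","2018-11-21"]
```

Because keys returns the object's keys in sorted order, the dates also come out sorted regardless of their order in the input.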
