jq - group json objects by field value and output grouped values in one line

I have JSON from AWS CloudWatch that contains metrics, each with its timestamps and values.

{
    "Messages": [],
    "MetricDataResults": [
        {
            "Timestamps": [
                "2021-07-07T13:26:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.0
            ],
            "Id": "m19",
            "Label": "CPUSurplusCreditsCharged"
        },
        {
            "Timestamps": [
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                12.750425014167137,
                13.033116114731422,
                12.70812153130781,
                12.975,
                15.441924032067199,
                12.916451392476791
            ],
            "Id": "m20",
            "Label": "CPUUtilization"
        },
        {
            "Timestamps": [
                "2021-07-07T13:29:00Z",
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.7,
                0.6999533364442371,
                0.6998833527745376,
                0.6999416715273727,
                0.7,
                0.7001166861143524,
                0.6998950157476379
            ],
            "Id": "m21",
            "Label": "NetworkReceiveThroughput"
        }
    ]
}

I use the jq command to put these values into array variables and print the result as follows.

jq -r '.MetricDataResults[] | "\(.Label) \(.Timestamps) \(.Values)"' test.json | while read Label timestamp value
do

  Label=`echo $Label | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`
  timestamp=`echo $timestamp | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`
  value=`echo $value | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`

  arr_timestamp=($timestamp)
  arr_value=($value)

  echo $Label
  echo ${arr_timestamp[@]}
  echo ${arr_value[@]}
done



Evictions
2021-07-07T10:51:00Z 2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
0 0 0 0 0 0 0

CPUUtilization
2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
1.5333333333333332 1.4666666666666666 1.5833333333333333 1.5333333333333332 1.4916666666666665 1.4916666666666665

IsMaster
2021-07-07T10:51:00Z 2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
1 1 1 1 1 1 1

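The timestamp arrays have different lengths for different metrics.
I want to show, as a single string, only the values that share the same timestamp.

For example: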

"2021-07-07T10:51:00Z Evictions = 0\nIsMaster = 1"
"2021-07-07T10:50:00Z Evictions = 0\nCPUUtilization = 1.5333333333333332\n IsMaster = 1"
...

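I'm stuck and can't come up with a good way to do this.
If there is a good approach, please let me know.
I don't have much time, so any help would be appreciated.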

{
    "MetricDataResults": [
        {
            "Timestamps": "2021-07-07T13:28:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput"
            ],
            "Values" : [
               12.750425014167137,
               0.7
            ]
         },
         {
            "Timestamps": "2021-07-07T13:27:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput"
            ],
            "Values" : [
               13.033116114731422,
               0.6999533364442371
            ]
         },
         {
            "Timestamps": "2021-07-07T13:26:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput",
               "CPUSurplusCreditsCharged"
            ],
            "Values" : [
               12.70812153130781,
               0.6998833527745376,
               0.0
            ]
        }
    ]
}

You can achieve your goal with jq alone; no further processing in the shell is necessary. The following shell script gives you two options:

  • Output as text
  • Output as JSON

#!/bin/bash

INPUT='
{
    "Messages": [],
    "MetricDataResults": [
        {
            "Timestamps": [
                "2021-07-07T13:26:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.0
            ],
            "Id": "m19",
            "Label": "CPUSurplusCreditsCharged"
        },
        {
            "Timestamps": [
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                12.750425014167137,
                13.033116114731422,
                12.70812153130781,
                12.975,
                15.441924032067199,
                12.916451392476791
            ],
            "Id": "m20",
            "Label": "CPUUtilization"
        },
        {
            "Timestamps": [
                "2021-07-07T13:29:00Z",
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.7,
                0.6999533364442371,
                0.6998833527745376,
                0.6999416715273727,
                0.7,
                0.7001166861143524,
                0.6998950157476379
            ],
            "Id": "m21",
            "Label": "NetworkReceiveThroughput"
        }
    ]
}
'

# output as plain text
jq -r '
  .MetricDataResults
  | map(.Values as $values | .Timestamps as $timestamps
        | {Label} +
          foreach range(.Timestamps | length) as $idx
                  (null; {"Timestamp": $timestamps[$idx], "Value": $values[$idx]}; .))
  | group_by(.Timestamp)[]
  | [.[0].Timestamp]
    + map("\(.Label)=\(.Value)")
    | join("\n") + "\n"
' <<< "$INPUT"

# output as json
jq -r '
  .MetricDataResults
  |= (map(.Values as $values | .Timestamps as $timestamps
          | {Id, Label, StatusCode} +
            foreach range(.Timestamps | length) as $idx
                    (null; {"Timestamp": $timestamps[$idx], "Value": $values[$idx]}; .))
     | group_by(.Timestamp)
     | map({Timestamp: .[0].Timestamp,
            Events: del(.[].Timestamp)}))
' <<< "$INPUT"

The first jq command in the shell script produces:

2021-07-07T13:23:00Z
CPUUtilization=12.916451392476791
NetworkReceiveThroughput=0.6998950157476379

2021-07-07T13:24:00Z
CPUUtilization=15.441924032067199
NetworkReceiveThroughput=0.7001166861143524

2021-07-07T13:25:00Z
CPUUtilization=12.975
NetworkReceiveThroughput=0.7

2021-07-07T13:26:00Z
CPUSurplusCreditsCharged=0
CPUUtilization=12.70812153130781
NetworkReceiveThroughput=0.6999416715273727

2021-07-07T13:27:00Z
CPUUtilization=13.033116114731422
NetworkReceiveThroughput=0.6998833527745376

2021-07-07T13:28:00Z
CPUUtilization=12.750425014167137
NetworkReceiveThroughput=0.6999533364442371

2021-07-07T13:29:00Z
NetworkReceiveThroughput=0.7

The second jq command in the shell script produces:

{
  "Messages": [],
  "MetricDataResults": [
    {
      "Timestamp": "2021-07-07T13:23:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.916451392476791
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6998950157476379
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:24:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 15.441924032067199
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7001166861143524
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:25:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.975
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:26:00Z",
      "Events": [
        {
          "Id": "m19",
          "Label": "CPUSurplusCreditsCharged",
          "StatusCode": "Complete",
          "Value": 0
        },
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.70812153130781
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6999416715273727
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:27:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 13.033116114731422
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6998833527745376
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:28:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.750425014167137
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6999533364442371
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:29:00Z",
      "Events": [
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7
        }
      ]
    }
  ]
}

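Here is a simple, idiomatic solution for the text-output case; it works with any version of jq from 1.3 onward. Note in particular that it does not rely on foreach, which is needlessly complicated for this task: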

< input.json jq -r '
  .MetricDataResults
  | map(.Values as $values
        | .Timestamps as $timestamps
        | {Label} +
           (range(0; .Timestamps|length) as $idx
            | {Timestamp: $timestamps[$idx], 
               Value:     $values[$idx]} ))
  | group_by(.Timestamp)[]
  | .[0].Timestamp, (.[]|"\(.Label)=\(.Value)"), ""
'
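
Since the question asks for the grouped values on one line per timestamp, the same filter can be adjusted by changing only the last step. This is a sketch, not a definitive answer: the sample input is made up and shortened, and the `Label = Value` pairs are joined with a comma, which is my assumption about the desired separator.

```shell
#!/bin/bash
# Hypothetical, shortened sample with the same structure as the CloudWatch data.
INPUT='{
  "MetricDataResults": [
    {"Label": "CPUSurplusCreditsCharged",
     "Timestamps": ["2021-07-07T13:26:00Z"],
     "Values": [0.5]},
    {"Label": "CPUUtilization",
     "Timestamps": ["2021-07-07T13:27:00Z", "2021-07-07T13:26:00Z"],
     "Values": [13.25, 12.75]}
  ]
}'

lines=$(printf '%s' "$INPUT" | jq -r '
  .MetricDataResults
  | map(.Values as $values
        | .Timestamps as $timestamps
        | {Label} +
          (range(0; .Timestamps | length) as $idx
           | {Timestamp: $timestamps[$idx], Value: $values[$idx]}))
  | group_by(.Timestamp)[]
  # one output string per timestamp: "<timestamp> <label> = <value>, ..."
  | .[0].Timestamp + " " + (map("\(.Label) = \(.Value)") | join(", "))
')

printf '%s\n' "$lines"
```

With the sample input this prints:

2021-07-07T13:26:00Z CPUSurplusCreditsCharged = 0.5, CPUUtilization = 12.75
2021-07-07T13:27:00Z CPUUtilization = 13.25

Each line can then be consumed in the shell with a plain `while IFS= read -r line` loop, without any sed post-processing.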