jq: Turn JSON object values into arrays

I have the following array of objects (this is only an excerpt; the real objects are larger):

[{
    "DATE": "10.10.2017 01:00",
    "ID": "X",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "X",
    "VALUE_ONE": 30,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "X",
    "VALUE_ONE": 25,
    "VALUE_TWO": 2
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Y",
    "VALUE_ONE": 10,
    "VALUE_TWO": 9
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Y",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Y",
    "VALUE_ONE": 50,
    "VALUE_TWO": 5
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Z",
    "VALUE_ONE": 55,
    "VALUE_TWO": 3
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Z",
    "VALUE_ONE": 60,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Z",
    "VALUE_ONE": 15,
    "VALUE_TWO": 7
  }
]

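To make this easier for the web application to process, and to reduce the file size at the same time, I want to convert the "VALUE_ONE", "VALUE_TWO" and "DATE" values of each ID into arrays, like this: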

[{
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "X",
    "VALUE_ONE": [20, 30, 25],
    "VALUE_TWO": [5, 7, 2]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Y",
    "VALUE_ONE": [10, 20, 50],
    "VALUE_TWO": [9, 5, 5]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Z",
    "VALUE_ONE": [55, 60, 15],
    "VALUE_TWO": [3, 7, 7]
  }
]

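The important thing is that you must still be able to find the values linked to a specific time (DATE). Since the input values for "DATE" are consecutive, you most likely no longer need the DATE values to find a requested "VALUE_..." value. You can probably just use the array index (index=0 is always 10.10.2017 01:00, index=1 is ... 02:00, and so on). Is it possible to do it that way? That would make the file even smaller. Thanks!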
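To make the index idea concrete, here is a minimal Python sketch of the lookup on the consuming side. Python only stands in for the web application's code, which the question does not show. The sketch assumes hourly readings that start at 10.10.2017 01:00 and a day.month.year date format; START, STEP and value_at are hypothetical names.

```python
from datetime import datetime, timedelta

# Compact record for ID "X" with the DATE array dropped (values from the question).
record = {"ID": "X", "VALUE_ONE": [20, 30, 25], "VALUE_TWO": [5, 7, 2]}

# Assumptions: readings are hourly and the first one is at this timestamp.
FORMAT = "%d.%m.%Y %H:%M"
START = datetime.strptime("10.10.2017 01:00", FORMAT)
STEP = timedelta(hours=1)

def value_at(rec, key, when):
    """Return rec[key] for the timestamp string `when`, using only the array index."""
    offset = datetime.strptime(when, FORMAT) - START
    index, remainder = divmod(offset, STEP)
    # Reject timestamps that fall between readings or outside the array.
    if remainder or not 0 <= index < len(rec[key]):
        raise KeyError(when)
    return rec[key][index]
```

This only works while the assumption of gap-free, evenly spaced dates holds. If some IDs can skip a reading, keeping the DATE array is the safer choice.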

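The following solution avoids group_by for two reasons:

  • Efficiency: group_by sorts its input, which is unnecessary here.
  • The sort used by group_by in jq 1.5 might not be stable, which complicates matters.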

Instead, we use bucketize, defined as follows:

def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x] );
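For readers less used to jq's reduce, the same bucketing idea can be sketched in Python. This is an illustration rather than a replacement for the jq definition. The Python function takes the key extractor as an argument named key, and the rows list is made-up sample data in the shape of the question.

```python
def bucketize(items, key):
    """Group items into a dict of lists keyed by key(item).

    One pass, no sorting: items keep their input order inside each bucket.
    """
    buckets = {}
    for item in items:
        buckets.setdefault(key(item), []).append(item)
    return buckets

# Sample rows that are deliberately not grouped by ID.
rows = [
    {"ID": "X", "VALUE_ONE": 20},
    {"ID": "Y", "VALUE_ONE": 10},
    {"ID": "X", "VALUE_ONE": 30},
]
groups = bucketize(rows, lambda row: row["ID"])
```

Because nothing is sorted, the question of sort stability never arises.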

For simplicity, we also define the following helper function:

# Compact an array of objects that all share a single ID:
# each key maps to the array of its values, then ID is reset to a scalar.
def compact:
  . as $in
  | reduce (.[0]|keys_unsorted[]) as $key ({};
      . + {($key): ($in|map(.[$key]))})
    + {"ID": .[0].ID}
    ;
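The helper can be read as "take the keys of the first object, collect each key's values across the group, then put the scalar ID back". A rough Python equivalent, under the same assumption that every object in the group has the same ID, might look like this (the group list is sample data taken from the question):

```python
def compact(group):
    """Turn a list of objects sharing one ID into one object of parallel arrays."""
    # Use the first object's keys; a key missing from a later object yields None,
    # much as .[$key] yields null in jq.
    result = {key: [obj.get(key) for obj in group] for key in group[0]}
    result["ID"] = group[0]["ID"]  # ID stays a single value, not an array
    return result

group = [
    {"DATE": "10.10.2017 01:00", "ID": "X", "VALUE_ONE": 20, "VALUE_TWO": 5},
    {"DATE": "10.10.2017 02:00", "ID": "X", "VALUE_ONE": 30, "VALUE_TWO": 7},
    {"DATE": "10.10.2017 03:00", "ID": "X", "VALUE_ONE": 25, "VALUE_TWO": 2},
]
compacted = compact(group)
```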

Solution

[bucketize(.ID)[] | compact]

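This ensures that everything works correctly even if the set of dates differs from one ID to another, and even if the JSON objects are not grouped in the input.

(If you want to drop "DATE" from the final result altogether, replace the call to compact in the line above with compact | del(.DATE).)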

Output

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "X",
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Y",
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Z",
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ]
  }
]

Using a two-step reduce (it isn't pretty, but it works):

jq 'reduce group_by(.ID)[] as $a ([]; . + [ reduce $a[] as $o 
   ({"DATE":[],"VALUE_ONE":[],"VALUE_TWO":[]}; 
    .DATE |= .+ [$o.DATE] | .ID = $o.ID |.VALUE_ONE |= .+ [$o.VALUE_ONE] 
    | .VALUE_TWO |= .+ [$o.VALUE_TWO]) ] )' input.json

Output:

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ],
    "ID": "X"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ],
    "ID": "Y"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ],
    "ID": "Z"
  }
]

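Here is a solution that uses reduce, setpath, getpath, del and symbolic variable destructuring. It collects the values of every key other than ID and DATE into parallel arrays, which removes the need to hard-code VALUE_ONE and so on.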

reduce (.[] | [.ID, .DATE, del(.ID,.DATE)]) as [$id,$date,$v] ({};
    (getpath([$id, "DATE"])|length) as $idx
  | setpath([$id, "ID"]; $id)
  | setpath([$id, "DATE", $idx]; $date)
  | reduce ($v|keys[]) as $k (.; setpath([$id, $k, $idx]; $v[$k]))
)
| map(.)
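The key step is that each new value is written at the index given by the current length of that ID's DATE array, so all arrays stay aligned. When jq's setpath writes past the end of an array, it fills the gap with null. The Python sketch below imitates that behaviour to show what happens when a key only appears in later rows. The function name collect and the sample rows are illustrative assumptions, not part of the answer.

```python
def collect(rows):
    """Collect per-ID parallel arrays without hard-coding the value keys."""
    out = {}
    for row in rows:
        rec = out.setdefault(row["ID"], {"ID": row["ID"], "DATE": []})
        idx = len(rec["DATE"])  # position of this row within its ID
        rec["DATE"].append(row["DATE"])
        for key, value in row.items():
            if key in ("ID", "DATE"):
                continue
            column = rec.setdefault(key, [])
            # Pad with None up to idx, as setpath pads with null in jq.
            column.extend([None] * (idx - len(column)))
            column.append(value)
    return list(out.values())

rows = [
    {"DATE": "10.10.2017 01:00", "ID": "X", "VALUE_ONE": 20},
    {"DATE": "10.10.2017 02:00", "ID": "X", "VALUE_ONE": 30, "VALUE_TWO": 7},
]
result = collect(rows)
```

Here VALUE_TWO has no reading for the first row, so its array starts with a placeholder and index 1 still lines up with the second date.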

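If your data set is small enough, you can simply group the objects by ID and map each group to the desired result. It won't be very efficient compared to a streaming solution, but it is the simplest to implement using built-ins.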

group_by(.ID) | map({
    DATE: map(.DATE),
    ID: .[0].ID,
    VALUE_ONE: map(.VALUE_ONE),
    VALUE_TWO: map(.VALUE_TWO)
})
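The sort-then-group approach has a close analogue in Python's itertools.groupby, which may help clarify what group_by does: the rows are ordered by the key first, so the groups come out sorted by ID rather than in input order. The sketch below is only an illustration. The function name group_and_map and the sample rows are assumptions, and the value keys are hard-coded just as in the jq filter above.

```python
from itertools import groupby
from operator import itemgetter

def group_and_map(rows):
    """Sort rows by ID, then collapse each run of equal IDs into one object."""
    ordered = sorted(rows, key=itemgetter("ID"))  # sorted() is stable
    result = []
    for id_, run in groupby(ordered, key=itemgetter("ID")):
        group = list(run)
        result.append({
            "DATE": [r["DATE"] for r in group],
            "ID": id_,
            "VALUE_ONE": [r["VALUE_ONE"] for r in group],
            "VALUE_TWO": [r["VALUE_TWO"] for r in group],
        })
    return result

# Sample rows in mixed ID order.
rows = [
    {"DATE": "10.10.2017 01:00", "ID": "Y", "VALUE_ONE": 10, "VALUE_TWO": 9},
    {"DATE": "10.10.2017 01:00", "ID": "X", "VALUE_ONE": 20, "VALUE_TWO": 5},
    {"DATE": "10.10.2017 02:00", "ID": "Y", "VALUE_ONE": 20, "VALUE_TWO": 5},
]
result = group_and_map(rows)
```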