Is there a way to use jq to split a JSON file by its common keys?

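I have a set of pricing data for many stocks (about 1.1 million rows).

I'm having trouble parsing all of this data in memory, so I'd like to split it into separate files by stock symbol and import data only when I need it.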

From:

stockprices.json

To:

AAPL.json
ACN.json
...

and so on.

stockprices.json currently has this structure:

[{
    "date": "2016-03-22 00:00:00",
    "symbol": "ACN",
    "open": "121.029999",
    "close": "121.470001",
    "low": "120.720001",
    "high": "122.910004",
    "volume": "711400.0"
},
{
    "date": "2016-03-23 00:00:00",
    "symbol": "AAPL",
    "open": "121.470001",
    "close": "119.379997",
    "low": "119.099998",
    "high": "121.470001",
    "volume": "444200.0"
},
{
    "date": "2016-03-24 00:00:00",
    "symbol": "AAPL",
    "open": "118.889999",
    "close": "119.410004",
    "low": "117.639999",
    "high": "119.440002",
    "volume": "534100.0"
},
...{}....]

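I believe jq is the right tool for the job, but I can't get my head around it.

How can I take the data above and use jq to split it by the symbol field?

For example, I'd like to end up with: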

AAPL.json:

[{
    "date": "2016-03-23 00:00:00",
    "symbol": "AAPL",
    "open": "121.470001",
    "close": "119.379997",
    "low": "119.099998",
    "high": "121.470001",
    "volume": "444200.0"
},
{
    "date": "2016-03-24 00:00:00",
    "symbol": "AAPL",
    "open": "118.889999",
    "close": "119.410004",
    "low": "117.639999",
    "high": "119.440002",
    "volume": "534100.0"
}]

and ACN.json:

[{
    "date": "2016-03-22 00:00:00",
    "symbol": "ACN",
    "open": "121.029999",
    "close": "121.470001",
    "low": "120.720001",
    "high": "122.910004",
    "volume": "711400.0"
},
{
    "date": "2016-03-22 00:00:00",
    "symbol": "ACN",
    "open": "121.029999",
    "close": "121.470001",
    "low": "120.720001",
    "high": "122.910004",
    "volume": "711400.0"
}
]

You can use a small shell loop:

#!/bin/bash
# List each distinct symbol once (sort -u), then filter the file once per symbol.
jq -r '.[].symbol' stockprices.json | sort -u | while read -r symbol ; do
    jq --arg s "${symbol}" \
        'map(select(.symbol == $s))' \
    stockprices.json > "${symbol}.json"
done

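Here is a one-pass solution, assuming your RAM is large enough. It avoids group_by because that entails a sort, which is unnecessary here and potentially costly in both time and memory.

awk is used here to create the output files for efficiency, but it is not essential to the approach.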

split.jq

def aggregate_by(s; f; g):
  reduce s as $x  (null; .[$x|f] += [$x|g]);

aggregate_by(.[]; .symbol; .)
| keys_unsorted[] as $k
| $k, .[$k]
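
To make the reduce step concrete, here is a minimal sketch that runs aggregate_by on a made-up three-record sample (the "v" field and the sample values are assumptions for illustration only). It collects .v per symbol instead of the whole record, just to keep the output short:

```shell
#!/bin/bash
# Hypothetical three-record sample; "v" is a made-up field.
sample='[{"symbol":"ACN","v":1},{"symbol":"AAPL","v":2},{"symbol":"ACN","v":3}]'

# Same aggregate_by definition as in split.jq, collecting .v for each symbol.
grouped=$(printf '%s\n' "$sample" | jq -c '
  def aggregate_by(s; f; g):
    reduce s as $x (null; .[$x|f] += [$x|g]);
  aggregate_by(.[]; .symbol; .v)')

echo "$grouped"
```

For this sample it prints {"ACN":[1,3],"AAPL":[2]}: keys appear in order of first occurrence, which is why no sort is needed.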

Invocation with awk

jq -f split.jq stockprices.json | awk '
  substr($0,1,1) == "\"" {
    if (fn) {close(fn)};
    gsub(/^"|"$/,"",$0); fn=$0 ".json"; next;
  }
  {print >> fn}'
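
To see what the awk stage does on its own, here is a small sketch that feeds it a hand-written stream in the shape split.jq emits: a quoted key line followed by the lines of its array. The sample records and the temporary directory are assumptions for illustration, and the arrays are written on one line for brevity.

```shell
#!/bin/bash
# Work in a throwaway directory so the demo does not touch real files.
outdir=$(mktemp -d)

(
  cd "$outdir" || exit 1
  # Hypothetical stream: key line, then the value line(s) for that key.
  printf '%s\n' \
    '"ACN"' \
    '[{"symbol":"ACN","close":"121.470001"}]' \
    '"AAPL"' \
    '[{"symbol":"AAPL","close":"119.379997"},{"symbol":"AAPL","close":"119.410004"}]' |
  awk '
    # A line starting with a double quote is a key: switch to a new output file.
    substr($0,1,1) == "\"" {
      if (fn) {close(fn)};
      gsub(/^"|"$/,"",$0); fn=$0 ".json"; next;
    }
    # Any other line belongs to the current key.
    {print >> fn}'
)

ls "$outdir"
```

This should leave AAPL.json and ACN.json in the temporary directory, each holding only the lines that followed its key.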

You'll need a loop, but it can be done with a single invocation of jq:

jq -rc 'group_by(.symbol)[] | "\(.[0].symbol)\t\(.)"' stockprices.json |
while IFS=$'\t' read -r symbol content; do
    echo "${content}" > "${symbol}.json"
done
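
As a quick sanity check, the loop above can be tried on a tiny sample before running it on the full file. The sketch below uses a made-up three-record stockprices.json in a temporary directory (both are assumptions for illustration) and then compares record counts before and after the split.

```shell
#!/bin/bash
# Hypothetical three-record sample standing in for stockprices.json.
workdir=$(mktemp -d)
cat > "$workdir/stockprices.json" <<'EOF'
[{"symbol":"ACN","close":"121.470001"},
 {"symbol":"AAPL","close":"119.379997"},
 {"symbol":"AAPL","close":"119.410004"}]
EOF

(
  cd "$workdir" || exit 1
  # One jq call emits "symbol<TAB>compact array" per group; the loop writes each file.
  jq -rc 'group_by(.symbol)[] | "\(.[0].symbol)\t\(.)"' stockprices.json |
  while IFS=$'\t' read -r symbol content; do
      printf '%s\n' "${content}" > "${symbol}.json"
  done
)

# Record counts before and after the split should agree.
total=$(jq 'length' "$workdir/stockprices.json")
split_total=$(jq -s 'map(length) | add' "$workdir/AAPL.json" "$workdir/ACN.json")
echo "original: $total, split: $split_total"
```

The sketch uses printf rather than echo so that backslashes in the JSON text are written out unchanged regardless of shell settings.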