Using jq to combine JSON files, getting file list length too long error

Question

I'm using jq to concatenate the JSON files in a directory.

The directory contains a few hundred thousand files.

jq -s '.' *.json > output.json

fails with an error saying the argument list is too long. Is there a way to write this so that it can accept more files?
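Some background on the error: bash expands *.json without trouble, but when it then starts the external jq binary via exec(), the kernel rejects the call if the arguments plus the environment exceed the limit reported by getconf ARG_MAX, and bash prints "Argument list too long". The sketch below uses a hypothetical helper, arg_bytes, to estimate how much space a glob's expansion needs. It is only an approximation, since it ignores the environment and the argv pointer array.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many bytes a list of arguments would
# occupy when passed to exec(). Each argument costs its length plus a NUL.
# Approximation only: the environment and the argv pointer array are ignored,
# and ${#arg} counts characters, which equals bytes only for ASCII names.
arg_bytes() {
  local total=0 arg
  for arg in "$@"; do
    total=$(( total + ${#arg} + 1 ))
  done
  echo "$total"
}

# Usage: compare the estimate with the system limit.
#   arg_bytes *.json
#   getconf ARG_MAX
```

Calling arg_bytes *.json does not itself hit the limit, because a shell function, like a builtin, runs inside bash without an exec().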

Answer 1

[Edited to use find]

One obvious option is to process the files one at a time and then "slurp" them:

$ while IFS= read -r f ; do cat "$f" ; done < <(find . -maxdepth 1 -name "*.json") | jq -s .

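However, this may require a lot of memory, because jq -s has to hold all of the input at once. The following may therefore be closer to what you need: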

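#!/bin/bash
# "slurp" a bunch of files
# Requires a version of jq with 'inputs' (jq 1.5 or later).
echo "["
while IFS= read -r f
do
  # Print each value as compact JSON on one line, followed by a "," line
  jq -nr 'inputs | (tojson, ",")' "$f"
done < <(find . -maxdepth 1 -name "*.json") | sed '$d'   # drop the final ","
echo "]"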
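The script above starts one jq process per file. A possible middle ground, not part of the original answer, lets find batch the file names (-exec ... +) so that jq runs only a few times, prints each value compactly with -c, and builds the array brackets and commas with sed. The function name merge_json_stream is hypothetical; this is a sketch, not a tested production tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one JSON array without slurping everything.
# find -exec jq ... {} + runs jq on as many files per call as fit in ARG_MAX.
# jq -c prints each input value on a single line; sed then prefixes the first
# line with "[", appends "," to every line but the last, and appends "]" to
# the last line. Caveat: if no file matches, the output is empty, not [].
merge_json_stream() {
  find "${1:-.}" -maxdepth 1 -name '*.json' -exec jq -c . {} + |
    sed '1s/^/[/; $!s/$/,/; $s/$/]/'
}
```

Usage: merge_json_stream /path/to/dir > output.json. The order of the elements follows find's directory order, not alphabetical order.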

Answer 2

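The problem is that the total size of the arguments passed to an external command is limited, and *.json expands to too many arguments for a single command. One workaround is to expand the pattern in a for loop, which is not subject to the same limit, because bash iterates over the results internally instead of having to build an argument list for an external command: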

for f in *.json; do
    cat "$f"
done | jq -s '.' > output.json

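This is inefficient, though, because cat has to be run once for every file. A more efficient solution is to let find invoke cat with as many files at a time as will fit: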

find . -name '*.json' -exec cat '{}' + | jq -s '.' > output.json

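(You could also simply use

find . -name '*.json' -exec jq -s . {} + > output.json

instead; whether that works for you depends on what is in the files and on how several invocations of jq -s compare to a single one. If find has to split the file list into several batches, each jq -s call writes its own array, so output.json would contain several arrays rather than one.)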
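If find does split the list, one way to get back a single array is to pipe the per-batch arrays into a second jq that slurps them and concatenates them with add. This is a sketch with a hypothetical function name, merge_json_batches; the final jq still holds all the data in memory, just as the single jq -s call would.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge the per-batch arrays produced by
# find -exec jq -s . {} + into one array. The outer jq -s collects the batch
# arrays into an array of arrays and add concatenates them. If no file
# matches, find never runs the inner jq, add yields null, and "// []"
# turns that into an empty array.
merge_json_batches() {
  find "${1:-.}" -maxdepth 1 -name '*.json' -exec jq -s . {} + |
    jq -s 'add // []'
}
```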

Answer 3

If jq -s . *.json > output.json produces "argument list too long", you can fix it using zargs in zsh (zargs has to be loaded first):

$ autoload -U zargs
$ zargs *.json -- cat | jq -s . > output.json

You can emulate it with find:

$ find -maxdepth 1 -name \*.json -exec cat {} + | jq -s . > output.json

This works because jq accepts a plain concatenation of JSON values as input: "Data in jq is represented as streams of JSON values ... This is a cat-friendly format - you can just join two JSON streams together and get a valid JSON stream." For example:

$ echo '{"a":1}{"b":2}' | jq -s .
[
  {
    "a": 1
  },
  {
    "b": 2
  }
]
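A similar emulation without find, using xargs (a swapped-in tool that none of the answers use): printf is a bash builtin, so passing it the expanded glob never goes through exec(), and xargs -0 then runs cat on batches of names that fit, much like find -exec ... +. This is a sketch with a hypothetical function name, merge_json_xargs, and it assumes bash and an xargs that supports -0 (GNU and BSD versions both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge dir/*.json into one array via printf + xargs.
# The ( ... ) body runs in a subshell so that nullglob does not leak out.
merge_json_xargs() (
  shopt -s nullglob
  files=( "${1:-.}"/*.json )   # glob expansion inside bash: no exec() limit
  # With no matches, print an empty array instead of running cat on "".
  if (( ${#files[@]} == 0 )); then
    echo '[]'
    return
  fi
  # printf is a builtin, so the long list never hits ARG_MAX here;
  # xargs -0 splits the NUL-separated names into batches that do fit.
  printf '%s\0' "${files[@]}" |
    xargs -0 cat -- |
    jq -s .
)
```

Usage: merge_json_xargs /path/to/dir > output.json. Because the glob is sorted, the elements come out in file-name order.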


Tags: linux, json, jq