Parallel execution of piped commands with GNU Parallel?

Given a task made up of several commands joined by pipes:

cat input/file1.json | jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > output/file1.json

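Now I have thousands of input JSON files, and I would like to use GNU Parallel to run all of these pipelines in parallel. How can I do that? Is it something like this?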

parallel cat {} | jq '...' > output/{./} ::: input/*.json

Note: it gets even more complicated because the jq filter itself contains a pipe...

https://www.gnu.org/software/parallel/man.html#QUOTING says:

Conclusion: To avoid dealing with the quoting problems it may be easier just to write a small script or a function (remember to export -f the function) and have GNU parallel call that.

In your case it would look like this:

doit() {
  # $1 = input file, $2 = output file
  cat "$1" |
    jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > "$2"
}
export -f doit

parallel doit {} output/{/} ::: input/*.json

One benefit is that you can test the function on its own:

doit input/foo1.json output/foo1.json

Once that works, parallelizing it is trivial.

If you have a newer version of GNU Parallel, this should also work:

parallel --results output/{/} -q jq '.responses[0] | {labelAnnotations: .labelAnnotations}' ::: input/*.json