Xargs parallelism in Bash

Question

So I have this function in Bash that I am trying to understand; it uses parallelism:

function get_cache_files() {
    ## The maximum number of parallel processes. 16 since the cache
    ## naming scheme is hex based.
    local max_parallel=${3-16}
    ## Get the cache files running grep in parallel for each top level
    ## cache dir.
    find  -maxdepth 1 -type d | xargs -P $max_parallel -n 1 grep -Rl "KEY:.*" | sort -u
} # get_cache_files

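So my questions:

The comment: "16 since the cache naming scheme is hex based" - naming example is this: php2-mindaugasb.c9.io/5c/c6/348e9a5b0e11fb6cd5948155c02cc65c - why is it important to use 16 processes when the naming scheme is HEX based (hexadecimal system)?
The -P option of xargs is for max-procs: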

Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option with -P; otherwise chances are that only one exec will be done.

Ok, so: "xargs -P $max_parallel -n 1" is correct and 16 processes will be initiated? Or should n be equal to $max_parallel also?

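As I understand it, the conditions for parallelization are:
1. Independence of the resources being operated on (for example, similar files that the same operation is applied to);
2. The operations are performed on separate computers;
What other conditions are there, and in which situations can work be parallelized?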

Answer 1

Ok, so: "xargs -P $max_parallel -n 1" is correct and 16 processes will be initiated? Or should n be equal to $max_parallel also?

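Think of several checkout counters in a store and a large number of customers waiting to pay. By analogy, -P is the number of counters (here 16) and -n is the number of customers a single counter handles at a time (here 1). In that setup it is easy to imagine that the queue at each counter ends up roughly the same length, right?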
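
The counter analogy can be tried out with a small timing experiment. This is only a sketch: the four `sleep 1` jobs are hypothetical stand-ins for customers and have nothing to do with the cache function in the question. It runs the same four jobs once with a single counter (-P 1) and once with four (-P 4).

```shell
#!/usr/bin/env bash
# Hypothetical workload: four "customers", each taking one second (sleep 1).

# One counter: the jobs run one after another.
start=$SECONDS
printf '1\n1\n1\n1\n' | xargs -P 1 -n 1 sleep
serial=$((SECONDS - start))

# Four counters: all four jobs can run at the same time.
start=$SECONDS
printf '1\n1\n1\n1\n' | xargs -P 4 -n 1 sleep
parallel=$((SECONDS - start))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run should take about four seconds and the parallel run about one. In both runs -n 1 stays the same; only -P changes how many jobs are in flight at once.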

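In terms of the question: max_parallel=${3-16} sets the variable to 16 if the $3 argument is not passed to the function. xargs then runs up to 16 grep processes at a time (the -P argument). Each process receives exactly one item from xargs's stdin (the -n argument) as its last command-line argument; as long as the directory names contain no whitespace, one item is one line. Here xargs's standard input is the output of the find command. Overall, find lists the directories, and its output is consumed line by line by the grep processes, at most 16 of which run at once. Each grep process is invoked as: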

grep -Rl "KEY:.*" <one line from find-output/xargs-input>
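
To see the generated command lines without actually running grep, one option is to put echo in front of the command. The sketch below is illustrative only: the names dirA to dirD are made-up stand-ins for the lines find would print, and -P 2 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical input: four lines standing in for the output of find.
# With -n 1, xargs builds one command per input item; echo prints that
# command instead of executing it. sort makes the order deterministic,
# because parallel jobs may finish in any order.
invocations=$(printf '%s\n' dirA dirB dirC dirD \
    | xargs -P 2 -n 1 echo grep -Rl 'KEY:.*' \
    | sort)

printf '%s\n' "$invocations"
```

Four input lines produce four separate command lines, each ending in a different directory name. Raising -n would put several names on one command line; it would not change how many processes may run at once.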

The comment: "16 since the cache naming scheme is hex based" - naming example is this: php2-mindaugasb.c9.io/5c/c6/348e9a5b0e11fb6cd5948155c02cc65c - why is it important to use 16 processes when the naming scheme is HEX based (hexadecimal system)?

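I cannot fathom the logic behind this, but I think it has more to do with distribution and the volume of data. If the total number of lines in find's output is a multiple of 16, then it might make some sense.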
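
Whether the line count really is a multiple of 16 can be checked directly. The sketch below assumes a hypothetical cache root with 16 top-level directories named 0 to f; that layout is an assumption, and the example path in the question uses two-digit names such as 5c, which would allow up to 256 top-level directories. It also shows that find with only -maxdepth 1 lists the starting directory itself, so the count comes out one higher than the number of subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical cache layout: 16 top-level directories named 0..f.
cache_root=$(mktemp -d)
for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
    mkdir "$cache_root/$d"
done

# Same find options as in the question: the starting directory "." is listed too.
with_root=$(cd "$cache_root" && find . -maxdepth 1 -type d | wc -l | tr -d ' ')

# Adding -mindepth 1 leaves only the 16 subdirectories.
subdirs_only=$(cd "$cache_root" && find . -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')

echo "with root: $with_root, subdirectories only: $subdirs_only"
rm -rf "$cache_root"
```

With this layout the script prints "with root: 17, subdirectories only: 16". If the starting directory is passed on to grep -R as well, that one grep searches the entire tree, which may be worth keeping in mind when reasoning about how the work is split.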

linux

bash

shell

xargs

command-line-arguments