When piping in BASH, is it possible to get the PID of the left command from within the right command?

Question

Problem

Given a BASH pipeline:

./a.sh | ./b.sh

Suppose ./a.sh has PID 10.

Is there a way to find the PID of ./a.sh from within ./b.sh?

I.e. if there is, and ./b.sh looks something like this:

#!/bin/bash
...
echo $LEFT_PID
cat

Then the output of ./a.sh | ./b.sh would be:

10
... Followed by whatever else ./a.sh printed to stdout.

Background

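I'm working on a bash script named cachepoint that I can place in a pipeline to speed things up.

E.g. cat big_data | sed 's/a/b/g' | uniq -c | cachepoint | sort -n

This is a purposefully simple example.

The pipeline may run slowly the first time, but on subsequent runs it will be quicker, because cachepoint starts doing the work.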

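The way I picture cachepoint working is that it would use the first few hundred lines of input, along with the list of commands before it, to form a hash ID for previously cached data. On subsequent runs it would break the stdin pipe early and print the cached data instead. Cached data would be deleted every hour or so.

I.e. everything left of | cachepoint would normally keep running, perhaps to 1,000,000 lines, but on subsequent executions of a cachepoint pipeline, everything left of | cachepoint would exit after maybe 100 lines, and cachepoint would simply print the millions of lines it has cached. To hash the pipe sources as well as the pipe content, I need a way for cachepoint to read the PIDs of the commands that came before it in the pipeline.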
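As a minimal sketch of the hashing step described above, the key could be derived roughly as follows. The name cache_key and both of its arguments are hypothetical, and sha256sum from GNU coreutils is assumed to be available; this is not an existing cachepoint implementation.

```bash
#!/bin/bash
# Illustrative sketch; cache_key is a hypothetical helper, not an existing tool.

# Print a SHA-256 key derived from a description of the upstream commands ($2)
# and at most $1 lines read from standard input.
# Caveat: head may consume more than $1 lines from a pipe, so a real
# cachepoint would need to buffer whatever it reads and replay it on a miss.
cache_key() {
    local max_lines=$1 upstream=$2
    {
        printf '%s\n' "$upstream"
        head -n "$max_lines"
    } | sha256sum | cut -d ' ' -f 1
}

# Example: key for the first 200 lines produced by a given upstream pipeline.
seq 1 1000 | cache_key 200 "seq 1 1000"
```

The same input prefix and the same upstream description give the same key, while changing either one gives a different key, which is the property the cache lookup would rely on.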

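I use pipelines a lot for exploring data sets, and I often find myself piping to temporary files to avoid repeating the same costly pipeline more than once. This is messy, so I want cachepoint.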

Answer 1

This Shellcheck-clean code should work for your b.sh program on any Linux system:

#! /bin/bash

shopt -s extglob
shopt -s nullglob

left_pid=

# Get the identifier for the pipe connected to the standard input of this
# process (e.g. 'pipe:[10294010]')
input_pipe_id=$(readlink "/proc/self/fd/0")
if [[ $input_pipe_id != pipe:* ]]; then
    echo 'ERROR: standard input is not a pipe' >&2
    exit 1
fi

# Find the process that has standard output connected to the same pipe
for stdout_path in /proc/+([[:digit:]])/fd/1; do
    output_pipe_id=$(readlink -- "$stdout_path")
    if [[ $output_pipe_id == "$input_pipe_id" ]]; then
        procpid=${stdout_path%/fd/*}
        left_pid=${procpid#/proc/}
        break
    fi
done

if [[ -z $left_pid ]]; then
    echo "ERROR: Failed to set 'left_pid'" >&2
    exit 1
fi

echo "$left_pid"
cat

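It depends on the fact that on Linux, for a process with ID PID, the path /proc/PID/fd/0 looks like a symlink to the device connected to the standard input of the process, and /proc/PID/fd/1 looks like a symlink to the device connected to the standard output of the process. When two processes are joined by a pipe, both links resolve to the same 'pipe:[inode]' string, which is what the loop above matches on.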
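To see this behaviour in isolation, here is a small sketch in which each side of a pipeline reports what its end of the pipe points at. The helper names fd_target and same_pipe_demo are made up for illustration, and a Linux /proc filesystem is assumed.

```bash
#!/bin/bash
# Illustrative sketch; fd_target and same_pipe_demo are hypothetical names.

# Print what file descriptor $2 of process $1 is connected to (Linux /proc only).
fd_target() {
    readlink -- "/proc/$1/fd/$2"
}

# Run a two-stage pipeline in which each side reports its end of the pipe.
same_pipe_demo() {
    {
        # Left side: $BASHPID is the PID of this pipeline subshell.
        # The result is written into the pipe itself.
        fd_target "$BASHPID" 1
    } | {
        # Right side: read the left side's report, then inspect our own stdin.
        read -r left_target
        right_pid=$BASHPID
        right_target=$(fd_target "$right_pid" 0)
        echo "left stdout -> $left_target"
        echo "right stdin -> $right_target"
    }
}

same_pipe_demo
```

Both lines should show the same pipe:[...] identifier, although the inode number itself differs from run to run.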
