Why does this strace on a pipeline not finish

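I have a directory containing a single file, one.txt. If I run ls | cat, it works fine. However, if I try to strace both sides of that pipeline, I see the output of the command as well as the strace output, but the process never finishes.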

strace ls 2> >(stdbuf -o 0 sed 's/^/command1:/') | strace cat 2> >(stdbuf -o 0 sed 's/^/command2:/')

The output I get is:

command2:execve("/usr/bin/cat", ["cat"], [/* 50 vars */]) = 0
command2:brk(0)                                  = 0x1938000
command2:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87e5a93000
command2:access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
<snip>
command2:open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
command2:fstat(3, {st_mode=S_IFREG|0644, st_size=106070960, ...}) = 0
command2:mmap(NULL, 106070960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f87def8a000
command2:close(3)                                = 0
command2:fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
command2:fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:fadvise64(0, 0, 0, POSIX_FADV_SEQUENTIAL) = -1 ESPIPE (Illegal seek)
command2:read(0, "command1:execve(\"/usr/bin/ls\", ["..., 65536) = 4985
command1:execve("/usr/bin/ls", ["ls"], [/* 50 vars */]) = 0
command1:brk(0)                                  = 0x1190000
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c3000
command1:access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
<snip>
command1:close(3)                                = 0
command1:fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:write(1, "command1:close(3)               "..., 115) = 115
command2:read(0, "command1:mmap(NULL, 4096, PROT_R"..., 65536) = 160
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c2000
one.txt
command1:write(1, "one.txt\n", 8)                = 8
command2:write(1, "command1:mmap(NULL, 4096, PROT_R"..., 160) = 160
command2:read(0, "command1:close(1)               "..., 65536) = 159
command1:close(1)                                = 0
command1:munmap(0x7fae869c2000, 4096)            = 0
command1:close(2)                                = 0
command2:write(1, "command1:close(1)               "..., 159) = 159
command2:read(0, "command1:exit_group(0)          "..., 65536) = 53
command1:exit_group(0)                           = ?
command2:write(1, "command1:exit_group(0)          "..., 53) = 53
command2:read(0, "command1:+++ exited with 0 +++\n", 65536) = 31
command1:+++ exited with 0 +++
command2:write(1, "command1:+++ exited with 0 +++\n", 31) = 31

From that point on it hangs. ps shows that both commands in the pipeline (here ls and cat) are still running.

I am on RHEL7, running Bash version 4.2.46.

I put an strace on your strace:

strace bash -c 'strace true 2> >(cat > /dev/null)'

It hangs on a wait4, which indicates that it is stuck waiting for children. ps f confirms this:

24740 pts/19   Ss     0:00 /bin/bash
24752 pts/19   S+     0:00  \_ strace true
24753 pts/19   S+     0:00      \_ /bin/bash
24755 pts/19   S+     0:00          \_ cat

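Based on this, my working theory is that the effect is a deadlock because:

  1. strace waits on all of its children, even ones it did not spawn directly.
  2. Bash spawns the process substitution as a child of the process it is attached to, so here it ends up as a child of strace. Since the process substitution reads from strace's stderr, it effectively waits for its parent to exit.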
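
The two conditions above can be reproduced without strace. The following is only a sketch of the suspected mechanism, not a trace of what strace itself does: the function name deadlock_demo, the FIFO, and the 1-second limit are all assumptions made for illustration. A parent shell holds the write end of a FIFO as its stderr and waits for its child, while the child reads that FIFO until EOF.

```shell
#!/usr/bin/env bash
# Hypothetical analogue of the suspected deadlock (no strace involved).
deadlock_demo() {
    local dir status
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo"
    # timeout (GNU coreutils) kills the inner shell if it is still running after 1 second.
    timeout 1 bash -c '
        cat "$1" >/dev/null &   # child: reads until every writer has closed the FIFO
        exec 2>"$1"             # parent: keeps the write end open as its stderr
        wait                    # parent: waits for the child, which waits for the parent
    ' _ "$dir/fifo"
    status=$?
    rm -rf "$dir"
    echo "$status"              # 124 means timeout had to kill the inner shell
}

deadlock_demo
```

The child cannot see EOF until the parent exits, and the parent does not exit until the child does, so the inner shell only ends when timeout kills it.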

This suggests at least two workarounds, both of which appear to work:

strace -D ls 2> >(nl)

{ strace ls; true; } 2> >(nl)

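-D, to quote the man page, "[runs the] tracer process as a detached grandchild, not as parent of the tracee". The second forces bash to do another fork to run strace, by adding another command after it.

In both cases, the extra fork means that the process substitution does not end up as a child of strace, which avoids the problem.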
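
To compare the original command with the workarounds without leaving a hung terminal behind, each variant could be run under a time limit. This is a minimal sketch that assumes GNU timeout is available; finishes_within is a hypothetical helper name, not part of strace or bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: finishes_within SECONDS 'COMMAND STRING'
# Prints "hung" if the command had to be killed by timeout, "finished" otherwise.
# Note: any status other than 124 (including "command not found") counts as "finished".
finishes_within() {
    local secs=$1 cmd=$2
    timeout "$secs" bash -c "$cmd" >/dev/null 2>&1
    if [ $? -eq 124 ]; then
        echo "hung"
    else
        echo "finished"
    fi
}
```

For example, finishes_within 5 'strace ls 2> >(nl)' could be compared against finishes_within 5 'strace -D ls 2> >(nl)' and finishes_within 5 '{ strace ls; true; } 2> >(nl)'. On a setup like the one described here, the first would be expected to report "hung" and the other two "finished", though the result may differ with other Bash or strace versions.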