从流重定向中更正乱序打印
Correcting out of order printing from stream redirection
我有一个 python 脚本,它使用 multiprocessing.pool.map
来完成一些工作。随着它的进行,它会将内容打印到 stdout
,对于它打印到 stderr
的错误。我决定为每个流都有一个单独的日志文件会很好,经过一番思考我应该 运行 像这样:
time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
这给了我日志文件并保留了适当流上的输出。然而问题来了。如果我 运行 它没有重定向我得到这个:
$ time ./ecisSearch.py 58Ni.conf 4 1
2015-01-09 14:42:37.524333: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:42:37.524433: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:42:37.526641: Starting run #: 0 //this is stdout
2015-01-09 14:42:37.527018: Starting run #: 1 //this is stdout
2015-01-09 14:42:37.529124: Starting run #: 2 //this is stdout
2015-01-09 14:42:37.529831: Starting run #: 3 //this is stdout
2015-01-09 14:42:54.052522: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:42:54.502284: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:42:59.952433: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:43:03.259783: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:43:03.260360: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:43:03.275472: Job finished //this is stdout
real 0m26.001s
user 0m44.145s
sys 0m32.626s
但是,运行将其与重定向结合会生成以下输出。
$ time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
2015-01-09 14:55:13.188230: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:55:13.855079: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:55:19.526580: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:55:23.628807: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:54:56.534790: Starting run #: 0 //this is stdout
2015-01-09 14:54:56.535162: Starting run #: 1 //this is stdout
2015-01-09 14:54:56.538952: Starting run #: 3 //this is stdout
2015-01-09 14:54:56.563677: Starting run #: 2 //this is stdout
2015-01-09 14:54:56.531837: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:54:56.531912: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:55:23.629427: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:55:23.629742: Job finished //this is stdout
real 0m27.376s
user 0m44.661s
sys 0m33.295s
只要查看时间戳,我们就会发现这里发生了一些奇怪的事情。 stderr
和 stdout
流不仅没有像它们应该的那样相互穿插,而且 stdout
组件似乎首先有来自子流程的东西,然后是来自 'master' 进程,不管它出现的顺序如何。我知道 stderr
是无缓冲的,而 stdout
是缓冲的,但这并不能解释为什么 stdout
信息在它自己的流中是乱序的。 另外,从我的帖子中看不出,所有 stdout
等到执行结束才出现在屏幕上的事实。
我的问题如下:为什么会这样? 更重要的是 有没有办法解决它?
stdout 的输出是缓冲的:也就是说,print 语句实际上写入一个缓冲区,这个缓冲区只是偶尔刷新到终端。每个进程都有一个单独的缓冲区,这就是为什么来自不同进程的写入会出现乱序的原因(这是一个常见问题,如 Why subprocess stdout to a file is written out of order?)
在这种情况下,输出是有序的,但在重定向时出现乱序。为什么? This article 说明:
- stdin is always buffered
- stderr is always unbuffered
- if stdout is a terminal then buffering is automatically set to line buffered, else it is set to buffered
因此,当输出到终端时,它会刷新每一行,并且恰好按顺序出现。重定向时,使用长缓冲区(通常为 4096 字节)。因为你打印的少于那个,所以先完成的子进程先被刷新。
解决方案是使用 flush()
,或者完全禁用进程缓冲(参见 Disable output buffering)
我有一个 python 脚本,它使用 multiprocessing.pool.map
来完成一些工作。随着它的进行,它会将内容打印到 stdout
,对于它打印到 stderr
的错误。我决定为每个流都有一个单独的日志文件会很好,经过一番思考我应该 运行 像这样:
time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
这给了我日志文件并保留了适当流上的输出。然而问题来了。如果我 运行 它没有重定向我得到这个:
$ time ./ecisSearch.py 58Ni.conf 4 1
2015-01-09 14:42:37.524333: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:42:37.524433: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:42:37.526641: Starting run #: 0 //this is stdout
2015-01-09 14:42:37.527018: Starting run #: 1 //this is stdout
2015-01-09 14:42:37.529124: Starting run #: 2 //this is stdout
2015-01-09 14:42:37.529831: Starting run #: 3 //this is stdout
2015-01-09 14:42:54.052522: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:42:54.502284: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:42:59.952433: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:43:03.259783: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:43:03.260360: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:43:03.275472: Job finished //this is stdout
real 0m26.001s
user 0m44.145s
sys 0m32.626s
但是,运行将其与重定向结合会生成以下输出。
$ time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
2015-01-09 14:55:13.188230: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:55:13.855079: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:55:19.526580: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:55:23.628807: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:54:56.534790: Starting run #: 0 //this is stdout
2015-01-09 14:54:56.535162: Starting run #: 1 //this is stdout
2015-01-09 14:54:56.538952: Starting run #: 3 //this is stdout
2015-01-09 14:54:56.563677: Starting run #: 2 //this is stdout
2015-01-09 14:54:56.531837: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:54:56.531912: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:55:23.629427: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:55:23.629742: Job finished //this is stdout
real 0m27.376s
user 0m44.661s
sys 0m33.295s
只要查看时间戳,我们就会发现这里发生了一些奇怪的事情。 stderr
和 stdout
流不仅没有像它们应该的那样相互穿插,而且 stdout
组件似乎首先有来自子流程的东西,然后是来自 'master' 进程,不管它出现的顺序如何。我知道 stderr
是无缓冲的,而 stdout
是缓冲的,但这并不能解释为什么 stdout
信息在它自己的流中是乱序的。 另外,从我的帖子中看不出,所有 stdout
等到执行结束才出现在屏幕上的事实。
我的问题如下:为什么会这样? 更重要的是 有没有办法解决它?
stdout 的输出是缓冲的:也就是说,print 语句实际上写入一个缓冲区,这个缓冲区只是偶尔刷新到终端。每个进程都有一个单独的缓冲区,这就是为什么来自不同进程的写入会出现乱序的原因(这是一个常见问题,如 Why subprocess stdout to a file is written out of order?)
在这种情况下,输出是有序的,但在重定向时出现乱序。为什么? This article 说明:
- stdin is always buffered
- stderr is always unbuffered
- if stdout is a terminal then buffering is automatically set to line buffered, else it is set to buffered
因此,当输出到终端时,它会刷新每一行,并且恰好按顺序出现。重定向时,使用长缓冲区(通常为 4096 字节)。因为你打印的少于那个,所以先完成的子进程先被刷新。
解决方案是使用 flush()
,或者完全禁用进程缓冲(参见 Disable output buffering)