Why is UnicodeDecodeError hidden until after adding time.sleep(1)?

EDITED2:

In the EDITED code below, f_out.write(bytearray(out or "")) should be replaced (in both places) with:
f_out.write(bytearray((out or ""), 'utf8'))  # before removing universal_newlines=True
or
f_out.write(out or b"")  # after removing universal_newlines=True


msw, tdelaney and j-f-sebastian - thank you very much for your help!

EDITED - So, here is an edited version of my script, which now consistently triggers the UnicodeDecodeError:

#!python3  # Run this script with Python 3.x (in Windows, assuming pylauncher is installed).
import subprocess
import sys

sys.stderr = sys.stdout = open('std.outerr', 'w')
# Redirected stdout/stderr so that they can be seen even when script is not run from command line.
child = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"], bufsize=0,
                         stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)
# `universal_newlines=True` TEMPORARILY left in to show that UnicodeDecodeError is triggered.
# `universal_newlines=True` WILL be removed from FINAL script.
try:
    (out, _) = child.communicate(timeout=5)
# 1 second wasn't long enough for UnicodeDecodeError to consistently be triggered.
# Since subprocess's stderr was redirected to its stdout, 2nd element of tuple will be `None`.
except subprocess.TimeoutExpired:
    child.kill()
    (out, _) = child.communicate()  # Try a 2nd time, without timeout.
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))  # Treat `None` as an empty string.
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))

EDIT - Now (using the script above, minus universal_newlines=True), the 1.2MB, 18,978 lines of stderr generated by Evince are captured correctly:

Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Kid object (page 1) is not an indirect reference (integer)

....................................................................
(Evince.exe:6800): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed
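
For anyone wanting to try the byte-oriented capture without Evince, here is a minimal sketch. It assumes a stand-in child (a Python one-liner that writes the byte 0x81 to stderr) in place of Evince.exe, and the names child_cmd and out_path are made up for illustration. Without universal_newlines=True, communicate() hands back bytes, so nothing is decoded and the output can be written to a binary file as-is.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for Evince.exe: a child that writes a non-ASCII byte to stderr.
child_cmd = [sys.executable, "-c",
             "import sys; sys.stderr.buffer.write(b'bad byte: \\x81\\n')"]

# No universal_newlines=True here, so the pipe is read as raw bytes.
proc = subprocess.Popen(child_cmd, stdin=subprocess.DEVNULL,
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
try:
    out, _ = proc.communicate(timeout=30)
except subprocess.TimeoutExpired:
    proc.kill()
    out, _ = proc.communicate()  # collect whatever was written before the kill

# `out` is a bytes object, so it goes straight into a file opened in 'wb' mode.
out_path = os.path.join(tempfile.mkdtemp(), "subprocess.out")
with open(out_path, "wb") as f_out:
    f_out.write(out)
```

If readable text is needed later, the saved bytes can be decoded separately with an explicit encoding and error handler.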

For some fuzzing I'm doing, the following subprocess.Popen() call:

import subprocess
proc = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
try:
    proc.communicate(timeout=1)  # Works the same with timeout=60 seconds.
except subprocess.TimeoutExpired:  # This exception is new to Python 3.3.
    proc.kill()
    # Other code here.
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")

gives me a UnicodeDecodeError:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "p:\python35-64\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "p:\python35-64\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "p:\python35-64\lib\subprocess.py", line 1279, in _readerthread
    buffer.append(fh.read())
  File "p:\python35-64\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 291289: character maps to <undefined>

This happens even if I reduce the "Other code" to something as simple as time.sleep(1). However, when I remove the "Other code", no exception occurs.

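I now realise that the exception occurs because I'm unnecessarily specifying universal_newlines=True in my Popen() call. [This is not compatible with bytes with values greater than 127 being written to stderr (which is happening).]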
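
The decoding failure itself can be reproduced without any subprocess. The sketch below assumes a made-up sample (raw) containing the byte 0x81 from the traceback, and cp1252 as the codec, since that is the one named in the traceback. The names strict_failed and lenient are only for illustration.

```python
# Hypothetical sample; 0x81 is the byte reported in the traceback.
raw = b"Error: PDF file is damaged \x81\n"

# Strict decoding (the default) fails, because cp1252 has no character for 0x81.
try:
    raw.decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# An explicit error handler substitutes U+FFFD instead of raising.
lenient = raw.decode("cp1252", errors="replace")
```

So the text-mode pipe is fine until the first such byte shows up, which in my case is far into the output (position 291289).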

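However, because the exception only occurs when there is some "Other code" after my proc.kill(), it seems there may be something else not quite right with my code. So I have temporarily left universal_newlines=True in my code, and omitted my "Other code", to be better able to determine what that is.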

I've tried changing bufsize and tried calling flush() on both stdout and stderr, but none of that seems to make any difference.

I see in the Python docs:

Popen objects are supported as context managers via the with statement: on exit, standard file descriptors are closed, and the process is waited for.

So I tried replacing my Popen() call with:

with subprocess.Popen(..., universal_newlines=True) as proc:

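and the UnicodeDecodeError was raised, even without the "Other code". So that is one way to "fix" my code, but (because I need to do some extra things) I would ideally like to use the third-party psutil module from PyPI, which sadly does not currently support context managers. So, if possible, I'd like to write the code without with ... as.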

Is there anything else (besides the value of universal_newlines) that I can change in my code to "fix" it?

Going by what the docs say about Popen objects being supported as context managers, I tried adding:

if proc.stdout:
    proc.stdout.close()
if proc.stderr:
    proc.stderr.close()
if proc.stdin:
    proc.stdin.close()

and/or proc.wait() just before my proc.kill(), but then proc.kill() is never reached.

What should I be doing to get the same effect as with ... as?
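
Going only by the docs sentence quoted above (file descriptors are closed, then the process is waited for), a rough manual equivalent might look like the sketch below. It assumes a stand-in child that just sleeps, and a short timeout so that the kill path is exercised. The point is the ordering: kill first, then close and wait. Waiting before the kill blocks for as long as the child keeps running, which would explain why proc.kill() was never reached.

```python
import subprocess
import sys

# Hypothetical stand-in for a child that outlives the timeout.
child_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]

proc = subprocess.Popen(child_cmd, stdin=subprocess.DEVNULL,
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
try:
    try:
        out, _ = proc.communicate(timeout=0.5)
    except subprocess.TimeoutExpired:
        proc.kill()                  # stop the child first ...
        out, _ = proc.communicate()  # ... then drain the pipe and reap it
finally:
    # Roughly what leaving a `with` block is documented to do:
    # close the standard streams, then wait for the process.
    for stream in (proc.stdout, proc.stderr, proc.stdin):
        if stream is not None:
            stream.close()
    proc.wait()
```

With stdin=subprocess.DEVNULL and stderr=subprocess.STDOUT, proc.stdin and proc.stderr are None, so only proc.stdout is actually closed here.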

Thanks in advance.

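The output may be buffered, so the text can still be decoded even after the child process is dead. Without time.sleep(1), the parent process may exit before the decoding hits the error (the I/O reader threads started by .communicate() are daemon threads, and they are killed when the parent process exits).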
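
The effect can be imitated without subprocess pipes at all. The sketch below is only an analogy, not the actual subprocess internals: it assumes a made-up child script (CHILD) in which a daemon thread hits a decode error after a short delay, and a helper run_child() that runs it with or without a trailing sleep playing the role of the "Other code".

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: a daemon thread fails after a short delay.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def reader():
        time.sleep(0.3)            # stands in for the reader still working
        b"\\x81".decode("cp1252")  # raises UnicodeDecodeError

    threading.Thread(target=reader, daemon=True).start()
    if len(sys.argv) > 1:
        time.sleep(float(sys.argv[1]))  # the "Other code"
""")

def run_child(*args):
    """Run the child script and return whatever it printed to stderr."""
    result = subprocess.run([sys.executable, "-c", CHILD, *args],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.stderr.decode(errors="replace")

quiet = run_child()       # main thread exits at once; the daemon thread dies silently
noisy = run_child("1.5")  # main thread lingers; the thread's traceback gets printed
```

Only the run with the extra sleep gives the daemon thread enough time to reach the failing decode and report it, which matches the behaviour described in the question.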