Why is UnicodeDecodeError hidden until after adding time.sleep(1)?
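EDITED2:
In the EDITED code below, f_out.write(bytearray(out or "")) should be replaced (in both places) with:
f_out.write(bytearray((out or ""), 'utf8'))
# before removing universal_newlines=True (out is a str)
or
f_out.write(out or b"")
# after removing universal_newlines=True (out is bytes, so the fallback must be bytes too)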
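The two replacements above differ only in whether communicate() returned str or bytes. A minimal sketch of one helper that covers both cases, so the write call stays the same with or without universal_newlines=True. The name as_bytes and the utf8 default are my own assumptions, not part of the original script:

```python
def as_bytes(out, encoding="utf8"):
    """Return captured subprocess output as bytes, whatever type communicate() gave back.

    Hypothetical helper for illustration only.
    """
    if out is None:                 # stream was not captured
        return b""
    if isinstance(out, str):        # text mode (universal_newlines=True)
        return out.encode(encoding)
    return bytes(out)               # binary mode: already bytes


# Usage: both capture modes can be written to a file opened with 'wb'.
with open("subprocess.out", "wb") as f_out:
    f_out.write(as_bytes("text output\n"))
    f_out.write(as_bytes(b"raw \x81 output\n"))
```

Encoding re-encodes text that was already decoded, so in text mode the file will not necessarily match the child's original bytes; only binary mode preserves them exactly.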
msw, tdelaney and j-f-sebastian - thank you very much for your help!
EDITED - So, here is an edited version of my script, which now consistently triggers the UnicodeDecodeError:
#!python3
# Run this script with Python 3.x (in Windows, assuming pylauncher is installed).
import subprocess
import sys
sys.stderr = sys.stdout = open('std.outerr', 'w')
# Redirected stdout/stderr so that they can be seen even when script is not run from command line.
child = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"], bufsize=0,
                         stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)
# `universal_newlines=True` TEMPORARILY left in to show that UnicodeDecodeError is triggered.
# `universal_newlines=True` WILL be removed from FINAL script.
try:
    (out, _) = child.communicate(timeout=5)
    # 1 second wasn't long enough for UnicodeDecodeError to consistently be triggered.
    # Since subprocess's stderr was redirected to its stdout, 2nd element of tuple will be `None`.
except subprocess.TimeoutExpired:
    child.kill()
    (out, _) = child.communicate()  # Try a 2nd time, without timeout.
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))  # Treat `None` as an empty string.
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))
EDIT - Now (using the script above, minus universal_newlines=True), the 1.2MB, 18,978 lines of stderr generated by Evince are captured correctly:
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Kid object (page 1) is not an indirect reference (integer)
....................................................................
(Evince.exe:6800): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed
For some fuzzing I am doing, the following subprocess.Popen() call:
import subprocess
proc = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True)
try:
    proc.communicate(timeout=1)  # Works the same with timeout=60 seconds.
except subprocess.TimeoutExpired:  # This exception is new to Python 3.3.
    proc.kill()
    # Other code here.
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")
gives me a UnicodeDecodeError:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "p:\python35-64\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "p:\python35-64\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "p:\python35-64\lib\subprocess.py", line 1279, in _readerthread
    buffer.append(fh.read())
  File "p:\python35-64\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 291289: character maps to <undefined>
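This happens even when I reduce the "Other code" to something as simple as time.sleep(1). However, when I remove the "Other code" entirely, no exception occurs.
I now realize that the exception occurs because I unnecessarily specified universal_newlines=True in the Popen() call. [This is not compatible with bytes with values greater than 127 being written to stderr (which is happening).]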
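To make the incompatibility concrete, here is a minimal sketch that does not need Evince. It assumes a hypothetical stand-in child (a second Python process started via sys.executable) that writes the same byte 0x81 seen in the traceback. Capturing in binary mode never decodes anything, and decoding the captured bytes as cp1252 afterwards reproduces the error:

```python
import subprocess
import sys

# Hypothetical stand-in for Evince: a child Python process that writes one byte > 127.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'ok \\x81\\n')"]

# No universal_newlines=True: communicate() returns raw bytes, so nothing is decoded.
proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
raw, _ = proc.communicate(timeout=30)

# Decoding the same bytes as cp1252 fails, because 0x81 is unmapped in that code page.
try:
    raw.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# If text is wanted anyway, a lossy decode substitutes U+FFFD instead of raising.
text = raw.decode("cp1252", errors="replace")
```

For fuzzing output, keeping the raw bytes is probably the safer choice, since a lossy decode discards exactly the unusual bytes that may be of interest.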
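However, because the exception only occurs when there is some "Other code" after my proc.kill(), it seems that something else in my code may not be quite right. So I have temporarily left universal_newlines=True in my code, and left out my "Other code", to be better able to determine what that is.
I have tried changing bufsize and tried flush()ing both stdout and stderr, but none of that seems to make any difference.
I see in the Python docs: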
Popen objects are supported as context managers via the with statement: on exit,
standard file descriptors are closed, and the process is waited for.
So I tried replacing my Popen() call with:
with subprocess.Popen(..., universal_newlines=True) as proc:
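and the UnicodeDecodeError was generated, even without any "Other code". So this is one way to "fix" my code, but (because I need to do some extra things) I would ideally like to use the third-party psutil module from PyPI, and sadly that does not currently support context managers. So, if possible, I would like to write my code without with ... as.
Is there anything else (other than the value of universal_newlines) that I can change in my code to "fix" it?
Based on what the docs say about Popen objects being supported as context managers, I tried adding: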
if proc.stdout:
    proc.stdout.close()
if proc.stderr:
    proc.stderr.close()
if proc.stdin:
    proc.stdin.close()
and/or proc.wait() just before my proc.kill(), but then proc.kill() was never reached.
What is the with ... as statement doing that I should be doing myself?
Thanks in advance.
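The output may be buffered, so text may still be decoded even after the child process is already dead. Without time.sleep(1), the parent process may exit before the decoding hits the error (the I/O reader daemon threads started by .communicate() are killed when the parent process exits).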
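A rough sketch of that race, under stated assumptions: the embedded script is hypothetical, and its daemon thread merely imitates communicate()'s reader thread by sleeping briefly and then failing to decode byte 0x81. When the main thread exits immediately, the traceback is normally never printed. When the main thread lingers (standing in for the "Other code"), the traceback appears on stderr:

```python
import subprocess
import sys

# Hypothetical script: a daemon thread fails after a short delay, imitating
# a reader thread that hits a decode error on buffered output.
SCRIPT = r"""
import sys, threading, time

def reader():
    time.sleep(0.5)              # still "decoding" buffered output
    b"\x81".decode("cp1252")     # raises UnicodeDecodeError inside the thread

threading.Thread(target=reader, daemon=True).start()
if sys.argv[1] == "wait":
    time.sleep(3)                # the "Other code" keeps the main thread alive
"""

def error_reported(mode):
    """Run the script and report whether a UnicodeDecodeError reached stderr."""
    done = subprocess.run([sys.executable, "-c", SCRIPT, mode],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          timeout=60)
    return b"UnicodeDecodeError" in done.stderr

seen_when_exiting_at_once = error_reported("exit")
seen_when_waiting = error_reported("wait")
```

The error exists in both runs; only its visibility changes. This is also consistent with the edited script in the question, where the second child.communicate() call after child.kill() waits for the output and so surfaces the error every time.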