Python 3 fails at pdb "b main" with UnicodeDecodeError?
The only similar question I found is "Django UnicodeDecodeError when using pdb"; unfortunately, the solution there does not apply to this case.
Consider the following code, test.py:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# encoding: utf-8


def subtract(ina, inb):
    myresult = ina - inb
    return myresult

def main():
    y2 = 10
    y1 = 7
    # calculate (y₂-y₁)
    print("Calculating difference between y2: {} and y1: {}".format(y2, y1))
    result = subtract(y2, y1)
    print("The result is: {}".format(result))

if __name__ == '__main__':
    main()
On Windows 10, using Python 3 from Anaconda3:
(base) C:\tmp>conda --version
conda 4.7.12
(base) C:\tmp>python --version
Python 3.7.3
...I can run this program without problems:
(base) C:\tmp>python test.py
Calculating difference between y2: 10 and y1: 7
The result is: 3
However, if I want to debug/step through this program with pdb, it fails as soon as I type b main to set a breakpoint on the main function:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 648, in do_break
lineno = int(arg)
ValueError: invalid literal for int() with base 10: 'main'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 659, in do_break
code = func.__code__
AttributeError: 'str' object has no attribute '__code__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1701, in main
pdb._runscript(mainpyfile)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1570, in _runscript
self.run(statement)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 585, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 88, in trace_dispatch
return self.dispatch_line(frame)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 112, in dispatch_line
self.user_line(frame)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 261, in user_line
self.interaction(frame, None)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 352, in interaction
self._cmdloop()
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 321, in _cmdloop
self.cmdloop()
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 138, in cmdloop
stop = self.onecmd(line)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 418, in onecmd
return cmd.Cmd.onecmd(self, line)
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 217, in onecmd
return func(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 667, in do_break
(ok, filename, ln) = self.lineinfo(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 740, in lineinfo
answer = find_function(item, fname)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 100, in find_function
for lineno, line in enumerate(fp, start=1):
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 199: character maps to <undefined>
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> c:\programdata\anaconda3\lib\encodings\cp1252.py(23)decode()
-> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
(Pdb) q
Post mortem debugger finished. The test.py will be restarted
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) q
(base) C:\tmp>
The problem is the comment line # calculate (y₂-y₁); if it is removed, pdb starts up fine:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Breakpoint 1 at c:\tmp\test.py:10
(Pdb) q
(base) C:\tmp>
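I'm a bit surprised by this - isn't Python 3 supposed to be "utf-8 by default"?
Obviously, this is a trivial case, and I can easily delete the single comment line that causes the problem. However, I have a large script with utf-8 characters all over the place, both in comments and in print statements, that I would actually like to step through, and it is not feasible to go in and manually change every one of those characters.
So, is there a way to trick Python 3's pdb so that it works even when there are utf-8 characters in the source code (whether in comments or in actual commands)?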
Python 3 is UTF-8 by default, but the environment it runs in is not - its default encoding is cp1252.
You can set the PYTHONIOENCODING environment variable to UTF-8 to override the default encoding, or change the environment to use UTF-8.
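As a rough illustration of what PYTHONIOENCODING controls (the encoding of the standard streams), the sketch below starts a child interpreter with the variable set and reports its sys.stdout.encoding. The helper name child_stdout_encoding is made up for this example.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child interpreter and report its sys.stdout.encoding."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    # Normalize spellings such as "UTF-8" or "utf8" to the codec's canonical name.
    return codecs.lookup(result.stdout.strip()).name


print(child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```

Note that this only shows the effect on stdin/stdout/stderr; it says nothing about files opened with open().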
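Edit
I was too hasty in my analysis. The solution above is useful for fixing unicode errors raised when reading from or writing to stdin/stdout, but the problem here is that pdb opens a file for reading without specifying an encoding: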
def find_function(funcname, filename):
    cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    try:
        fp = open(filename)
    except OSError:
        return None
If no encoding is specified, then according to the io docs Python will default to using the result of locale.getpreferredencoding - in this case probably cp1252.
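The failure can be reproduced outside pdb. The following is a minimal sketch, not pdb's own code: it writes a small UTF-8 file containing the same comment, then reads it back once as cp1252 (standing in for the Windows locale default) and once as UTF-8. The SOURCE sample and the try_read helper are invented for the illustration. It also reads the file with tokenize.open(), a standard-library function that picks the encoding from the coding cookie rather than from the locale.

```python
import locale
import os
import tempfile
import tokenize

SOURCE = (
    "# -*- coding: utf-8 -*-\n"
    "# calculate (y\u2082-y\u2081)\n"
    "def main():\n"
    "    pass\n"
)


def try_read(path, encoding):
    """Return the decoded text, or the UnicodeDecodeError raised while reading."""
    try:
        with open(path, encoding=encoding) as fp:
            return fp.read()
    except UnicodeDecodeError as exc:
        return exc


# Write the sample as UTF-8 bytes, the way an editor would save test.py.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(SOURCE.encode("utf-8"))
    path = tmp.name

try:
    # The encoding a bare open(filename) falls back to on this machine.
    print("locale default:", locale.getpreferredencoding(False))

    as_cp1252 = try_read(path, "cp1252")  # mimics the Windows default
    as_utf8 = try_read(path, "utf-8")     # explicit, matching encoding

    # tokenize.open() detects the encoding from the coding cookie or BOM.
    with tokenize.open(path) as fp:
        via_cookie = fp.read()
finally:
    os.remove(path)

print(type(as_cp1252).__name__)
print(as_utf8 == SOURCE, via_cookie == SOURCE)
```

The subscript one (U+2081) is encoded in UTF-8 as the bytes E2 82 81, and 0x81 has no mapping in Python's cp1252 codec, which matches the byte named in the traceback.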
One solution might be to set the console locale before running the debugger.
It is also possible to set the PYTHONUTF8 environment variable to 1 (Python 3.7 and later). Among other things, this causes open(), io.open(), and codecs.open() to use the UTF-8 encoding by default.
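To check that UTF-8 mode actually takes effect, something like the sketch below can be used. It launches a child interpreter with PYTHONUTF8=1 and prints sys.flags.utf8_mode together with the preferred encoding. The PROBE string and the probe_utf8_mode helper are illustrative names, and Python 3.7 or later is assumed.

```python
import os
import subprocess
import sys

# One-liner executed in the child: report the UTF-8 mode flag and the
# encoding that open() would fall back to when none is given.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def probe_utf8_mode(extra_env):
    """Run a child interpreter and return (utf8_mode flag, preferred encoding)."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding.lower()


print(probe_utf8_mode({"PYTHONUTF8": "1"}))
```

For the debugging session in the question, this would mean running set PYTHONUTF8=1 in the console before python -m pdb test.py, or passing the equivalent interpreter option: python -X utf8 -m pdb test.py.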