UnicodeDecodeError when parsing XML on Mac, but works on PC
When parsing an XML file:
from lxml import etree

with open('cortex_full.xml', 'r') as infile:
    root = etree.parse(infile)
I get the UnicodeDecodeError below. However, this only happens on my Mac: if I parse the same file with the same script on my work PC, everything works fine.
  File "/Users/Desktop/CPET/xml_test2.py", line 5, in <module>
    root = etree.parse(infile)
  File "src/lxml/lxml.etree.pyx", line 3442, in lxml.etree.parse (src/lxml/lxml.etree.c:81701)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:118888)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:119171)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:117959)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:112686)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107548)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:12152)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src/lxml/lxml.etree.c:103210)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 783: ordinal not in range(128)
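Judging by the number of threads on here, this seems to be a very common problem, but none of the suggested fixes seem to apply in this case. Any ideas on how to get it working? Full XML file here.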
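Posting the answer that worked for me, for future reference. Thanks to @Burhan Khalid for the answer.

The encoding needs to be set to utf-8 when opening the XML file.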
with open('cortex_full.xml', 'r', encoding='utf-8') as infile:
    root = etree.parse(infile)
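
For anyone wondering why the same script behaves differently on two machines: when no encoding is given, text-mode open() falls back to a platform-dependent default, which would explain the 'ascii' codec in the traceback. The snippet below is only a rough sketch of that behaviour using the standard library. The sample document and the temporary file are made up for illustration and have nothing to do with cortex_full.xml.

```python
import locale
import os
import tempfile

# Hypothetical sample: a tiny XML document with one non-ASCII character.
# U+00B5 (micro sign) is encoded as the two bytes 0xC2 0xB5 in UTF-8.
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<unit>\u00b5V</unit>\n'

with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as tmp:
    tmp.write(sample.encode('utf-8'))
    path = tmp.name

# The encoding that text-mode open() uses when none is passed.
# This value can differ between machines and between launch environments.
print('default text encoding:', locale.getpreferredencoding(False))

# Forcing ASCII reproduces the kind of failure shown in the traceback.
try:
    with open(path, 'r', encoding='ascii') as infile:
        infile.read()
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print('ascii failed:', exc)

# Naming the encoding explicitly behaves the same on every platform.
with open(path, 'r', encoding='utf-8') as infile:
    text = infile.read()

os.remove(path)
```

If the first print shows an ASCII-type encoding on the Mac and something else on the PC, that would account for the difference. Passing encoding='utf-8' takes the platform default out of the picture, assuming the file really is UTF-8.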