UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 386: character maps to <undefined>
I'm trying to read a PDF file with the slate library, but I get this error:
import slate

pdf = 'tabla9.pdf'
with open(pdf, encoding="utf-8") as f:
    doc = slate.PDF(f)
    for page in doc[:2]:
        print(page)
Full error:
  File "C:\Users\user\libro5.py", line 7, in <module>
    doc = slate.PDF(f)
  File "C:\Python3\lib\slate\classes.py", line 52, in __init__
    self.parser = PDFParser(file)
  File "C:\Python3\lib\site-packages\pdfminer\pdfparser.py", line 646, in __init__
    PSStackParser.__init__(self, fp)
  File "C:\Python3\lib\site-packages\pdfminer\psparser.py", line 189, in __init__
    PSBaseParser.__init__(self, fp)
  File "C:\Python3\lib\site-packages\pdfminer\psparser.py", line 134, in __init__
    data = fp.read()
  File "C:\Python3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
classes.py, line 52:
class PDF(list):
    def __init__(self, file, password='', just_text=1, check_extractable=True, char_margin=1.0, line_margin=0.1, word_margin=0.1):
        self.parser = PDFParser(file)
pdfparser.py, line 646:
def __init__(self, fp):
    PSStackParser.__init__(self, fp)
psparser.py, line 189:
class PSStackParser(PSBaseParser):
    def __init__(self, fp):
        PSBaseParser.__init__(self, fp)
psparser.py, line 134:
class PSBaseParser:
    """Most basic PostScript parser that performs only tokenization.
    """
    def __init__(self, fp):
        data = fp.read()
codecs.py, line 322, in decode (where the UnicodeDecodeError is raised):
def decode(self, input, final=False):
    # decode input (taking the buffer into account)
    data = self.buffer + input
    (result, consumed) = self._buffer_decode(data, self.errors, final)
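I'm using Python 3.7 on Windows 10.
PDF files are binary, so they should not be opened in text mode with an encoding. In text mode, Python tries to decode every byte it reads, and the raw bytes of a PDF are not valid UTF-8 text.
Try: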
with open(pdf, "rb") as f:
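To see why the mode matters without needing slate or a real PDF, here is a minimal sketch using only the standard library. The `SAMPLE` bytes and the `read_both_ways` helper are hypothetical stand-ins, not part of slate or pdfminer: they imitate the first two lines that many PDF writers emit, a version header followed by a comment containing high bytes. With these bytes, 0xe2 sits at position 10, which is consistent with the traceback in the question, although the asker's actual file may differ.

```python
import os
import tempfile

# Hypothetical stand-in for the start of a PDF file: the version header,
# then a comment line with non-ASCII bytes that marks the file as binary.
SAMPLE = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def read_both_ways(data):
    """Write data to a temp file, then read it in text mode and in binary mode.

    Returns (text_error, raw): the UnicodeDecodeError raised by the text-mode
    read (or None if it succeeded) and the bytes returned by the binary read.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)

        # Text mode: Python decodes the bytes as UTF-8 and fails.
        try:
            with open(path, encoding="utf-8") as f:
                f.read()
            text_error = None
        except UnicodeDecodeError as exc:
            text_error = exc

        # Binary mode: no decoding happens, the bytes come back untouched.
        with open(path, "rb") as f:
            raw = f.read()
    finally:
        os.remove(path)
    return text_error, raw


text_error, raw = read_both_ways(SAMPLE)
print("text mode:  ", text_error)
print("binary mode:", raw)
```

The same reasoning applies to the 'charmap' error in the title: leaving out `encoding` on Windows usually makes Python fall back to a legacy code page, which fails on a different byte but for the same reason. Passing the file object opened with `"rb"` to `slate.PDF(f)` avoids decoding altogether.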