Python UnicodeEncodeError when writing to file

Question

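I'm using "pdfminer.six", a Python library, to extract all the text from several PDFs I have. My approach works perfectly, but some PDFs contain special characters, and when I write the text to a text file I get "UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2' in position 271130: character maps to <undefined>". Now, I know what is happening, but I'd like to know the best way to handle it. This is the part that is giving me a headache: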

    with open("newTxtFile.txt", "w") as textFile:
        textFile.write(text)

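Since I'm from Brazil and the text is in Portuguese, I want to keep all the accents, so I use "codec = 'latin-1'" in pdfminer. As far as I can tell, everything works right up to the end, including printing the text before saving, but whenever I try to save it to a file I get the UnicodeEncodeError.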

The two options I've considered are: either I find a way to catch just the specific characters that are causing trouble:

    with open("newTxtFile.txt", "w") as textFile:
        try:
            textFile.write(text)
        except UnicodeEncodeError:
            ????

but I don't know what should go in the except block.

Or I should save to the file in a different way.

Can anyone give me some tips? Thanks a lot!

Answer 1

Try:

    with open("newTxtFile.txt", "wb") as textFile:
        textFile.write(text.encode('utf8'))

To read it back:

    with open("newTxtFile.txt", "rb") as textFile:
        text = textFile.read().decode('utf8')
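
A text-mode variant of the same idea, as a minimal sketch: passing `encoding="utf-8"` to `open()` lets Python do the encoding and decoding, so the file can stay in `"w"`/`"r"` mode. The 'charmap' in the error message suggests the default encoding was probably a Windows code page such as cp1252, which has no mapping for '\u03b2'; that is an assumption about the asker's system. The sample text and the temporary path below are made up for illustration.

```python
import os
import tempfile

# Sample text (an assumption): Portuguese accents plus a Greek beta (\u03b2).
text = "coração, ação, \u03b2-caroteno"

# Hypothetical output path, placed in the temp directory for this sketch only.
path = os.path.join(tempfile.gettempdir(), "newTxtFile.txt")

# Text mode with an explicit encoding: Python encodes to UTF-8 on write.
with open(path, "w", encoding="utf-8") as text_file:
    text_file.write(text)

# Reading with the same encoding decodes the bytes back to the same string.
with open(path, "r", encoding="utf-8") as text_file:
    round_trip = text_file.read()

# If the output must stay in a legacy code page, an error handler avoids the
# exception, but unencodable characters are replaced with "?" (lossy).
legacy_bytes = text.encode("cp1252", errors="replace")
```

UTF-8 can represent every Unicode character, so the accents and the beta both survive the round trip. The `errors="replace"` route only makes sense when losing the unsupported characters is acceptable.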

