Selenium/BeautifulSoup Webscraper in Python Keeps Having UnicodeEncodeError

Question

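I have a web scraper running, and for some pages my code works fine, but for others (which must contain special characters) it fails with the dreaded UnicodeEncodeError when I write the page to a file. I have tried a number of solutions, including UnicodeDammit and the .encode('utf-8', 'ignore') approach, which, judging from other threads, all real programmers despise because it just throws data away. The trouble is that I still can't figure out how to fix my code. Ah, the joys of being a novice programmer! So, do any of you gurus have ideas on how to solve this?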

The offending code is here (assume I have imported everything necessary and defined the variables, because I have).

LBfull = browser2.page_source
LBfullsoup = BeautifulSoup(LBfull, 'html.parser', from_encoding='UTF-8')


LBfileready = str(LBfullsoup.prettify())
unicodedata.normalize('NFKD', LBfileready).encode('utf-8','ignore')
file = open('D:/PATH/'+date+citynames[i]+'LB.txt', 'w')
file.write(LBfileready)
file.close()

The dreaded traceback is here:

Traceback (most recent call last):
  File "fitbitloop.py", line 95, in <module>
    file.write(LBfileready)
  File "C:\python351\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1209190-1209191: character maps to <undefined>

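It seems that no matter what I do, I can't get rid of this error. Is there some kind of error-checking code I can use to filter out the characters that map to <undefined>? The website I'm working with is global, so there will undeniably be all sorts of special characters. Since I can't write to the file, I can't look up the offending characters. When I request them from the string, they show up blank in the Python shell, which I assume is because my little command prompt window can't display them either. So how do I get around this unpleasant problem? Again, any help is much appreciated. Alternatively, if you could point me to a thread that solves the problem, I'd appreciate that too. There are so many threads on this particular topic that it's hard to find the "right answer."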

Answer 1

Writing to the file in 'wb' mode let me avoid the error mentioned above. HT Adam Van Prooyen. Thanks for the help!
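For anyone landing here later, a minimal sketch of what that fix can look like, assuming Python 3 on Windows as in the traceback. The function names `save_text`, `save_bytes`, and `find_unencodable` are made up for illustration and are not from the original code. The traceback shows that `open(..., 'w')` fell back to the cp1252 codec, so the idea is to choose UTF-8 explicitly, either through the `encoding` argument or by encoding the string yourself and writing bytes in 'wb' mode. The last helper is one possible way to list the characters that cp1252 cannot represent, which is what the question asks about.

```python
def save_text(path, text):
    """Write text as UTF-8 instead of the platform default codec."""
    # Without encoding=..., open() in text mode uses the locale's codec,
    # which was cp1252 in the traceback from the question.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


def save_bytes(path, text):
    """The 'wb' variant: encode first, then write raw bytes."""
    # A file opened in binary mode only accepts bytes, so the str
    # has to be encoded before it is written.
    with open(path, "wb") as fh:
        fh.write(text.encode("utf-8"))


def find_unencodable(text, codec="cp1252"):
    """Return (index, character) pairs that `codec` cannot encode."""
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            bad.append((index, char))
    return bad
```

With the question's variables, this would be called as something like `save_text('D:/PATH/' + date + citynames[i] + 'LB.txt', LBfullsoup.prettify())`. Two side notes on the original snippet: the result of the `unicodedata.normalize(...).encode(...)` line is never assigned, so it has no effect on what gets written, and since `browser2.page_source` is already a `str`, BeautifulSoup ignores `from_encoding`. To inspect the offending characters in a console that cannot display them, printing `ascii(char)` or `hex(ord(char))` for each pair returned by `find_unencodable` should work.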

python

selenium

beautifulsoup

web-scraping

python-unicode