How to write both Chinese and English characters to a file (Python 3)?

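I wrote a script to scrape the titles from a YouTube playlist page.

Judging by the print statements, everything works until I try to write the titles to a text file, at which point I get "UnicodeEncodeError: 'charmap' codec can't encode characters in position..."

I tried adding "encoding='utf8'" when opening the file. That fixed the error, but all the Chinese characters were replaced with random garbage characters.

I also tried encoding the output string with 'replace' and then decoding it, but that just replaced all the special characters with question marks.

Here is my code: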

from bs4 import BeautifulSoup as BS
import urllib.request
import re

playlist_url = input("gib nem: ")

with urllib.request.urlopen(playlist_url) as response:
  playlist = response.read().decode('utf-8')
  soup = BS(playlist, "lxml")

title_attrs = soup.find_all(attrs={"data-title":re.compile(r".*")})
titles = [tag["data-title"] for tag in title_attrs]

titles_str = '\n'.join(titles)#.encode('cp1252','replace').decode('cp1252')

print(titles_str)
with open("playListNames.txt", "a") as f:
    f.write(titles_str)

Here is a sample playlist I have been testing with: https://www.youtube.com/playlist?list=PL3oW2tjiIxvSk0WKXaEiDY78KKbKghOOo

The documentation for open() is explicit about the file encoding:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
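To make the quoted default concrete, here is a minimal sketch of what happens when text containing Chinese is pushed through cp1252. It assumes the platform default is cp1252, as on a US Windows install; the sample title is made up for illustration.

```python
# Sample title mixing English and Chinese (made up for illustration).
title = "Hello 你好"

# cp1252 has no mapping for Chinese characters, so strict encoding raises
# the same kind of UnicodeEncodeError reported in the question.
try:
    title.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# errors="replace" avoids the exception, but each character that cannot be
# encoded is substituted with "?".
replaced = title.encode("cp1252", "replace").decode("cp1252")

# UTF-8 can represent every Unicode character, so the round trip is lossless.
round_trip = title.encode("utf-8").decode("utf-8")

print(strict_ok)            # False
print(replaced)             # Hello ??
print(round_trip == title)  # True
```

This mirrors both symptoms from the question: the 'charmap' error with the default encoding, and the question marks after encoding with 'replace'.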

To answer the questions in your last comment:

  1. You can find out the preferred encoding on Windows with:

    import locale
    locale.getpreferredencoding()

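If playListNames.txt was created with open('playListNames.txt', 'w'), the value returned by locale.getpreferredencoding() is used as the encoding.

If the file was created manually, the encoding depends on the editor's default/preferred encoding.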

  2. See "How to convert a file to utf-8 in Python?" or "How do I convert an ANSI encoded file to UTF-8 with Notepad++? [closed]".
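As a rough illustration of the conversion those linked questions describe, the sketch below re-encodes a text file by reading it with one encoding and writing it with another. The helper name convert_encoding, the cp1252 source encoding, and the temporary file names are assumptions made for the example, not something taken from the linked answers.

```python
import os
import tempfile


def convert_encoding(src_path, dst_path, src_encoding="cp1252", dst_encoding="utf-8"):
    """Re-encode a text file: decode with src_encoding, write with dst_encoding."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Usage with throwaway files (hypothetical names).
tmp_dir = tempfile.mkdtemp()
ansi_path = os.path.join(tmp_dir, "ansi.txt")
utf8_path = os.path.join(tmp_dir, "utf8.txt")

# Create a small cp1252 ("ANSI") file to convert.
with open(ansi_path, "w", encoding="cp1252") as f:
    f.write("café\n")

convert_encoding(ansi_path, utf8_path)

# Inspect the raw bytes of the converted file.
with open(utf8_path, "rb") as f:
    raw = f.read()
```

Conversion only helps when the source file still contains the real characters. Text that was already written with 'replace' and turned into question marks cannot be recovered this way.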

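Specifying an encoding should solve your problem. Windows defaults to an ANSI encoding, which on US Windows is Windows-1252. It does not support Chinese. You should use utf8 or utf-8-sig as the encoding. Some Windows editors prefer the latter and otherwise assume ANSI.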

with open('playListNames.txt', 'w', encoding='utf-8-sig') as f:
    f.write(titles_str)
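A self-contained sketch of this approach follows, with made-up sample titles and a temporary directory standing in for the real scraping output. It writes with utf-8-sig, checks that the file starts with the UTF-8 byte order mark (the marker some Windows editors use to detect UTF-8), and reads the text back.

```python
import codecs
import os
import tempfile

# Made-up sample data standing in for the scraped playlist titles.
titles = ["English title", "中文标题"]
titles_str = "\n".join(titles)

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "playListNames.txt")

# utf-8-sig writes a byte order mark (BOM) followed by plain UTF-8.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(titles_str)

# Check the raw bytes for the BOM.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading with utf-8-sig strips the BOM again.
with open(path, "r", encoding="utf-8-sig") as f:
    read_back = f.read()

print(has_bom)                  # True
print(read_back == titles_str)  # True
```

Reading the file back with plain utf-8 instead would leave the BOM in the text as a leading '\ufeff' character, so it is safest to use the same encoding name for reading and writing.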