How to fix Cyrillic characters while web-scraping with Python

I am scraping a Cyrillic website with Python and BeautifulSoup, but I have run into a problem: every word comes out garbled, like this:

СилÑановÑка Ðавкова во Ðази

I also tried a few other Cyrillic websites, and they work fine.

My code looks like this:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://').text

soup = BeautifulSoup(source, 'lxml')

print(soup.prettify())

How can I fix this?

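requests fails to detect the response as UTF-8. Most likely the server's Content-Type header does not declare a charset, in which case requests falls back to ISO-8859-1 for text responses and decodes the UTF-8 bytes with the wrong codec. Override the encoding yourself before reading .text: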

from bs4 import BeautifulSoup
import requests

source = requests.get('https://time.mk/')  # don't convert to text just yet

# print(source.encoding)
# prints out ISO-8859-1

source.encoding = 'utf-8'  # override encoding manually

soup = BeautifulSoup(source.text, 'lxml')  # this will now decode utf-8 correctly
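
To see why the output looks the way it does, here is a minimal sketch that reproduces the garbling without any network access. The helper names garble and repair and the sample string are made up for illustration; they are not part of requests or BeautifulSoup.

```python
def garble(text: str) -> str:
    """Simulate the bug: UTF-8 bytes decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo it: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


sample = "Вести од Македонија"  # arbitrary sample text, not taken from the site
broken = garble(sample)          # two junk characters per Cyrillic letter
fixed = repair(broken)

print(fixed == sample)  # True
```

In a real scraper you would not normally need the repair step: setting the encoding on the response as shown above avoids the bad decode in the first place. Another option that should work is passing the raw bytes (source.content) to BeautifulSoup and letting it work out the encoding from the document itself.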