Not getting exact text from BeautifulSoup

Question

I am trying to get the exact text from an HTML page, but the output text differs from the expected text.

Text on the HTML page

Салнас 14

Text shown by BeautifulSoup

ĐĄĐ°ĐťĐ˝Đ°Ń 14

My code is

page = BeautifulSoup(url.read(),'html.parser')
page.find(id='tdo_11').text

HTML of the element as shown in the browser inspector

<td class="ads_opt" id="tdo_11" nowrap=""><b>Салнас 14</b></td>

I don't understand what is causing this. Should I use a different parser?

Answer 1

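Use the requests library to make the HTTP request; it is much better than Python's built-in tooling for many reasons. In particular, it handles encoding for you: response.text is a str that requests has already decoded using the encoding it infers for the response, so BeautifulSoup receives text rather than raw bytes.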

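import requests
from bs4 import BeautifulSoup

# response.text is already decoded to str, so the parser does not have to guess the encoding
response = requests.get('https://www.ss.lv/msg/ru/real-estate/flats/riga/plyavnieki/onlol.html')
page = BeautifulSoup(response.text, 'html.parser')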
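To see why the encoding matters, here is a minimal offline sketch of what probably went wrong in the question. It assumes the page is served as UTF-8 and that the bytes were decoded with a Latin-family codec; ISO-8859-2 is my guess for illustration, not something the question confirms. The variable names are hypothetical.

```python
# Illustration only: reproduce the kind of garbling shown in the question.
original = "Салнас 14"

# A server sends bytes, not text; here we assume the page is UTF-8 encoded.
raw = original.encode("utf-8")

# Decoding those bytes with the wrong codec (ISO-8859-2 is an assumption)
# produces mojibake instead of Cyrillic.
garbled = raw.decode("iso-8859-2")

# Decoding with the codec the bytes were actually written in recovers the text.
decoded = raw.decode("utf-8")

# Text that is already garbled can be repaired by reversing the wrong decode.
repaired = garbled.encode("iso-8859-2").decode("utf-8")

print(garbled, "->", repaired)
```

If you want to stay with the standard library, the same idea suggests decoding the bytes yourself before parsing, for example url.read().decode('utf-8'), again assuming the page really is UTF-8.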


python

beautifulsoup

html-parser