Beautiful Soup Returning Unwanted Characters

Question

I'm scraping a page with Beautiful Soup to get the heights of certain athletes:

req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
height = soup.find_all("strong")
height = height[2].contents
print height

Unfortunately, this is what gets returned:

[u'6\'0"']

I've also tried:

height = str(height[2].contents)

and

height = unicode(height[2].contents)

but I still get [u'6\'0"'] as the result.

How can I get back just 6'0" without the extra characters? Thanks for your help!

Answer 1

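Those aren't "extra characters". .contents returns a list, and the element you selected has only one child, so you get a list containing one element. Python prints a list as pseudo-Python code (its repr), so you can see what it is and what's in it.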
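To make that concrete, here is a minimal sketch in plain Python 3 with no scraping involved. The list literal is a stand-in I made up for what .contents returns. Under Python 3 the u prefix from the question goes away, but the brackets, quotes, and backslash still show up whenever the list itself is printed.

```python
# A one-element list, standing in for what .contents returns
# when a tag has a single text child.
height = ['6\'0"']

as_list = str(height)  # the list's repr: brackets, quotes and escapes included
as_text = height[0]    # the element itself, with nothing added

print(as_list)  # ['6\'0"']
print(as_text)  # 6'0"
```

Wrapping the list in str() or unicode() only converts that same repr to a string, which is why both attempts in the question printed the same thing.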

Perhaps you want .string?
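A small sketch of the difference, assuming bs4 is installed and using an inline HTML snippet I invented in place of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page.
html = '<p><strong>6\'0"</strong></p>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("strong")

contents = tag.contents  # a list of the tag's child nodes
text = tag.string        # the single child string (None if there are several children)

print(contents)
print(text)
```

Indexing the list with tag.contents[0] would give the same string. .string just saves you the indexing when the tag has exactly one child.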

Answer 2

If you only want the third strong tag, you don't need to find all of them. You can use the CSS selector nth-of-type, and once you have the element you just need to call .text:

req = requests.get(url)
soup = BeautifulSoup(req.content, "html.parser")
height = soup.select_one("strong:nth-of-type(3)").text

print(height)
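For anyone who wants to try the selector without hitting a live page, here is a sketch against made-up markup (the div and its three strong tags are my assumption, not the real page). It needs a Beautiful Soup version with select_one and nth-of-type support.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three <strong> siblings under one parent.
html = """
<div>
  <strong>Jane Doe</strong>
  <strong>Guard</strong>
  <strong>6'0"</strong>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_selector = soup.select_one("strong:nth-of-type(3)").text
by_index = soup.find_all("strong")[2].text

print(by_selector)
print(by_index)
```

One caveat: nth-of-type counts strong tags among siblings that share a parent, while find_all("strong")[2] counts across the whole document. The two agree here, but they may not if the strong tags on the real page sit under different parents.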

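You should also pass req.content rather than req.text: .content is the raw bytes, so Beautiful Soup can detect the encoding itself instead of depending on the encoding requests guessed for .text.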
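As a rough illustration of what happens when Beautiful Soup receives bytes, the sketch below builds the bytes by hand instead of making a request. The markup and the prime characters are my own example. When given bytes, Beautiful Soup works out the encoding and hands back ordinary text.

```python
from bs4 import BeautifulSoup

# Hypothetical response body, as raw bytes (what req.content would hold).
raw = '<meta charset="utf-8"><strong>6′0″</strong>'.encode("utf-8")

soup = BeautifulSoup(raw, "html.parser")
height = soup.find("strong").string

print(height)
print(soup.original_encoding)  # the encoding Beautiful Soup settled on
```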
