How do you get only the text inside the <body> tag using requests.get()?

Question

I tried using:

from json import loads
from requests import get

text_inside_body_tag = loads(get('https://Clicker-leaderboard.ge1g.repl.co').content)

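But it either raises an error about loading from a bytes object or, when I remove 'loads', returns the entire HTML code, whereas I only want the code inside the tag.

Can anyone help me?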

Answer 1

BeautifulSoup (bs4) is a good module for handling HTML data.

# Import required modules
from bs4 import BeautifulSoup as bs4
import json
import requests

# Retrieve page content
html = requests.get("your url").content
# Create BS4 object to handle HTML data
soup = bs4(html, "lxml")

# Extract text from the body tag and strip leading/trailing whitespace (spaces, \n, \t)
body = soup.find("body").text.strip()
# Create dictionary from extracted data
data = json.loads(body)
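
If installing bs4 and lxml is not an option, the same idea can be sketched with the standard library's html.parser. This is only a minimal illustration, not part of the answer above: the class name BodyTextParser, the helper body_text, and the sample HTML string are all made up for the example, and the sample assumes the page body contains nothing but JSON. In practice you would pass requests.get(url).text instead of sample_html.

```python
from html.parser import HTMLParser
import json


class BodyTextParser(HTMLParser):
    """Collects the text that appears between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting once the <body> tag opens
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        # Stop collecting when </body> is reached
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def body_text(html):
    """Return the text inside <body>, with surrounding whitespace stripped."""
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical stand-in for the downloaded page (assumption: the body is pure JSON)
sample_html = '<html><head><title>t</title></head><body>\n{"alice": 10, "bob": 7}\n</body></html>'

text = body_text(sample_html)
data = json.loads(text)
```

Note that this sketch also collects the contents of any <script> or <style> elements inside the body, and it returns an empty string when the document has no explicit <body> tag, so it only suits simple pages like the one assumed here.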

html

python

python-requests