How do you get only the text inside the <body> tag using requests.get()?
I tried using:
from json import loads
from requests import get
text_inside_body_tag = loads(get('https://Clicker-leaderboard.ge1g.repl.co').content)
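But it either raises an error about calling loads on a bytes object, or, when I remove 'loads', it returns the entire HTML code, whereas I only want the content inside the <body> tag.
Can anyone help me?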
BeautifulSoup (bs4) is a good module for working with HTML data.
# Import required modules
from bs4 import BeautifulSoup
import json
import requests
# Retrieve the page content as bytes
html = requests.get("your url").content
# Create a BeautifulSoup object to handle the HTML data
# ("lxml" requires the lxml package; the built-in "html.parser" also works)
soup = BeautifulSoup(html, "lxml")
# Extract the text from the body tag and strip leading/trailing whitespace
body = soup.find("body").text.strip()
# Create a dictionary from the extracted data (only works if the body text is valid JSON)
data = json.loads(body)
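The snippet above assumes the page always has a <body> tag and that its text is valid JSON. As a minimal sketch of the same approach with those cases handled, the helpers below (body_text, body_json and fetch_body_text are hypothetical names, not part of requests or bs4) use the built-in "html.parser" so lxml does not need to be installed. Whether the leaderboard page actually serves JSON inside its body is an assumption taken from the question.

```python
import json

import requests
from bs4 import BeautifulSoup


def body_text(html):
    """Return the stripped text inside <body>, or "" if there is no <body>."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body  # None when the document has no <body> tag
    return body.get_text().strip() if body is not None else ""


def body_json(html):
    """Parse the body text as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(body_text(html))
    except json.JSONDecodeError:
        return None


def fetch_body_text(url):
    """Download a page and return its body text (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return body_text(response.text)
```

Calling fetch_body_text("your url") would return the plain body text; passing the downloaded HTML to body_json instead gives a dictionary when the body holds JSON and None otherwise.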