How to replace an ```img``` tag inside a ```td``` tag using BeautifulSoup?
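I'm trying to use Python 3.8.7 and BeautifulSoup 4.9.3 to parse a web page full of tables so that I can display it on a Telegram channel. I can get all the necessary tables from the page, but deep inside those tables there are `td` tags containing `img` tags, and the ones with a particular `src` need to be replaced with a `<p>` tag. Here is my code so far: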
```python
import pickle
import bs4 as bs

v_file = open('data/pickled_data/pickled_v', 'rb')
v_pickled = pickle.load(v_file)
v_soup = bs.BeautifulSoup(v_pickled.content, "html5lib")
all_tbls = v_soup.find_all('table')
```
I tried replacing the image (a.k.a. `star_image`) as shown below, but it raises `AttributeError: 'NoneType' object has no attribute 'replace_with'`:
```python
url_2_check = "https://i.imgur.com/ffIvqVj.png"
for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.star_image.replace_with(p_tag)
```
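The `AttributeError` comes from `td.star_image`: attribute access on a tag is BeautifulSoup shorthand for `td.find("star_image")`, which searches for a `<star_image>` element, finds none and returns `None`. The loop variable `star_image` already points at the `<img>`, so `replace_with()` can be called on it directly. A minimal sketch, assuming a hypothetical one-row table in place of the real pickled page and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for one row of the real page
html = """
<table>
  <tr><td><img src="https://i.imgur.com/ffIvqVj.png"><img src="https://i.imgur.com/ffIvqVj.png"></td></tr>
</table>
"""
url_2_check = "https://i.imgur.com/ffIvqVj.png"
soup = BeautifulSoup(html, "html.parser")

td = soup.find("td")
# td.star_image is shorthand for td.find("star_image"): it searches for a
# <star_image> tag, finds none and returns None, hence the AttributeError.
print(td.star_image)  # None

for star_image in td.find_all("img"):
    # .get() avoids a KeyError on <img> tags that have no src attribute
    if star_image.get("src") == url_2_check:
        p_tag = soup.new_tag("p")
        p_tag.string = ":star:"
        # Call replace_with() on the loop variable itself, not on td.star_image
        star_image.replace_with(p_tag)

print(td)  # <td><p>:star:</p><p>:star:</p></td>
```

Iterating over `td.find_all("img")` while replacing is safe here because `find_all()` returns a list built before the loop starts.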
Then I tried the following, but it raises `ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree`:
```python
for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.replace_with(p_tag)
```
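The `ValueError` has two likely sources in the code above. `replace_with()` detaches the element it is called on, so after the first `td.replace_with(p_tag)` that `<td>` has no parent; the next star in the same cell calls `replace_with()` on the detached `<td>` again. On top of that, the inner loop runs `table.find_all('td')` rather than `tr.find_all('td')`, so every cell is visited once per row, and replacing the `<td>` (instead of the `<img>`) also throws away the table cell itself. A minimal sketch of both the detachment and a fix, assuming hypothetical sample markup and a helper `make_star()` introduced here for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page
html = """
<table>
  <tr><td><img src="https://i.imgur.com/ffIvqVj.png"></td></tr>
  <tr><td><img src="https://i.imgur.com/ffIvqVj.png"><img src="https://i.imgur.com/ffIvqVj.png"></td></tr>
</table>
"""
url_2_check = "https://i.imgur.com/ffIvqVj.png"


def make_star(soup):
    """Hypothetical helper: build a fresh <p>:star:</p> for each replacement."""
    p_tag = soup.new_tag("p")
    p_tag.string = ":star:"
    return p_tag


# What goes wrong: replace_with() detaches the <td>, so a later replace_with()
# on the same <td> is called on an element that is no longer in the tree.
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")
td.replace_with(make_star(soup))
print(td.parent)  # None

# Fix: search each table once and replace only the matching <img> tags
soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table"):
    # Keyword arguments to find_all() filter on attribute values
    for star_image in table.find_all("img", src=url_2_check):
        star_image.replace_with(make_star(soup))

print(len(soup.find_all("td")), len(soup.find_all("p")))  # 2 3
```

A new `<p>` is created for every image because a single tag object can only occupy one position in the tree.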
I can't figure out what I'm doing wrong. Can anyone help?
Thanks.
To parse the data from the tables, you can use the following example:
```python
import requests
from bs4 import BeautifulSoup

url = "https://xxviptips.blogspot.com/"
soup = BeautifulSoup(requests.get(url).content, "lxml")

for t in soup.select("table"):
    rows = t.select("tr")
    league = rows[0].get_text(strip=True)
    match, info1, info2 = [
        td.get_text(strip=True) for td in rows[1].select("td")
    ]
    rate_star = " ".join([":star:"] * len(rows[2].select("img")))
    print(
        "{:<35} {:<35} {:<10} {:<10} {:<50}".format(
            league, match, info1, info2, rate_star
        )
    )
```
Prints:

```
Eng. Premier League - 19:00 GMT Fulham - Burnley under 3.5 1.30 :star: :star: :star: :star:
Spanish Liga Primera- 19:00 GMT Betis - Granada CF over 1.5 1.30 :star: :star: :star:
Spanish Liga Segunda- 19:00 GMT Gijon - Lugo under 2.5 1.44 :star: :star: :star:
Spanish Liga Segunda- 17:00 GMT Rayo Vallecano - Leganes DC - 1/X 1.30 :star: :star: :star:
German Bundesliga 2- 16:00 GMT Holstein Kiel - Hannover DC - 1/X 1.25 :star: :star: :star: :star:
German Bundesliga 2- 18:30 GMT Hamburger SV - Nurnberg under 4.5 1.22 :star: :star: :star: :star:
Italian Serie B - 12:00 GMT Pescara - Salernitana away win 1.27 :star: :star: :star:
Italian Serie B- 12:00 GMT Empoli - Lecce DC - 1/X 1.31 :star: :star: :star: :star:
Romanian Liga 1- 18:30 GMT FCSB - FC Clinceni home win 1.27 :star: :star: :star: :star:
Romanian Liga 1- 13:45 GMT FC Voluntari - Chindia Targoviste under 2.5 1.38 :star: :star: :star:
Portuguese Prim. Liga- 19:15 GMT Porto - Sporting Farense home win 1.33 :star: :star: :star: :star:
Portuguese Prim. Liga - 17:00 GMT Portimonense - Moreirense under 3.5 1.25 :star: :star: :star: :star:
```
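The answer counts the `<img>` tags in the third row rather than editing the tree. If the goal is literally what the question title asks, swapping each star image for text inside its `<td>`, a minimal sketch could select the images with a CSS attribute selector and replace each with a plain `":star:"` string before extracting text. The markup below is hypothetical and only mimics the three-row layout the answer assumes; `html.parser` is used so no extra parser is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like one of the tables the answer parses
html = """
<table>
  <tr><td>Eng. Premier League - 19:00 GMT</td></tr>
  <tr><td>Fulham - Burnley</td><td>under 3.5</td><td>1.30</td></tr>
  <tr><td><img src="https://i.imgur.com/ffIvqVj.png"><img src="https://i.imgur.com/ffIvqVj.png"></td></tr>
</table>
"""
url_2_check = "https://i.imgur.com/ffIvqVj.png"
soup = BeautifulSoup(html, "html.parser")

# Swap each matching <img> for a plain ":star:" text node, in place
for img in soup.select(f'img[src="{url_2_check}"]'):
    img.replace_with(":star:")

for table in soup.select("table"):
    # One text chunk per row; strings inside a row are joined with spaces
    lines = [tr.get_text(" ", strip=True) for tr in table.select("tr")]
    print(" | ".join(lines))
# Eng. Premier League - 19:00 GMT | Fulham - Burnley under 3.5 1.30 | :star: :star:
```

Replacing with a string rather than a new `<p>` keeps the cell's text flat, which is usually what a Telegram message needs; a `<p>` tag as in the question works the same way if the HTML is sent on.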