Retrieving data from HTML at a given child path using Python
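I am trying to get the email address of a city from http://www.comuni-italiani.it/110/index.html
Using xPath Finder, I have the specific path to the element: /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a
Now I am trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I have only just started). After reading a few guides I managed to write the following code, but I have not succeeded in specifying the child path correctly: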
from bs4 import BeautifulSoup
import requests
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
# call get method to request that page
page = requests.get(sample_web_page)
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
for i in child_soup.children:
    print("child : ", i)
What am I doing wrong?
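For what it's worth, the loop above only ever looks at the first span: soup.find('span') returns the first match, and .children lists its direct children, whereas the XPath points at the third span. Also, the tbody steps in a browser-generated XPath are usually inserted by the browser and are not present in the markup that html.parser sees. Here is a minimal sketch of walking such a path by hand; the HTML string is a made-up stand-in for the real page, not its actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely shaped like the XPath in the question.
HTML = """
<body>
<span>one</span><span>two</span>
<span><table><tr><td><b><a href="mailto:info@example.com">mail</a></b></td></tr></table></span>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching tag.
first_span = soup.find("span")

# XPath indices are 1-based, so span[3] is index 2 among body's direct children.
spans = soup.body.find_all("span", recursive=False)
third_span = spans[2]

# html.parser does not add <tbody>, so the path must not rely on it.
has_tbody = soup.find("tbody") is not None

# Descend the rest of the path with a descendant selector.
link = third_span.select_one("table tr td b a")
print(first_span.get_text(), has_tbody, link["href"])
```

On the real page the indices may differ from what the browser reports, so a selector based on the link's href tends to be less fragile than a positional path.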
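Please find below my attempt at solving your problem. It starts the same way as your code; the only new part is a CSS selector that finds the mail link inside a b tag and prints the address.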
from bs4 import BeautifulSoup
import requests
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")
email = soup.select_one('b > a[href^="mail"]')['href']
print(email.split(':')[1])
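If the page has no matching link, select_one returns None and the subscript above raises a TypeError. Below is a slightly more defensive sketch of the same idea. The extract_email helper and the SAMPLE_HTML string are hypothetical names introduced for illustration, and the sample markup is invented so the snippet runs without a network call.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real markup).
SAMPLE_HTML = """
<html><body>
<table>
<tr><td><b><a href="http://example.com/">Website</a></b></td></tr>
<tr><td><b><a href="mailto:info@example.com">info@example.com</a></b></td></tr>
</table>
</body></html>
"""

def extract_email(html):
    """Return the first mailto address found directly inside a <b> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one('b > a[href^="mailto:"]')
    if link is None:
        return None
    # Drop the "mailto:" scheme and any "?subject=..." query part.
    address = link["href"][len("mailto:"):]
    return address.split("?", 1)[0]

print(extract_email(SAMPLE_HTML))
```

With the real site you would pass page.content from requests.get instead of SAMPLE_HTML.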