Retrieving data from HTML at a given child path using Python

Question

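I am trying to get the cities' email addresses from http://www.comuni-italiani.it/110/index.html

Using xPath Finder, I obtained the specific child path /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a. Now I am trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I am just getting started). After reading a few guides I managed to write the following code, but I have not managed to specify the child path correctly.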

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
  
for i in child_soup.children:
    print("child :  ", i)

What am I doing wrong?

Answer 1

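Please find below my attempt at solving your problem. It starts the same way as your code; the only "magic" is a CSS selector that finds the email link, after which the address is printed.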

from bs4 import BeautifulSoup
import requests
  
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")
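# select the first <a> that is a direct child of <b> and whose href starts with "mail"
email = soup.select_one('b > a[href^="mail"]')['href']
# drop the "mailto:" scheme and print only the address
print(email.split(':')[1])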
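
To show why the selector works where soup.find('span') does not, here is a minimal offline sketch. The HTML snippet, the example.com address, and the helper name extract_email are all made up for illustration; the real page's markup may differ. Two points it tries to illustrate: find() returns only the first matching tag (the XPath in the question points at the third span), and a CSS attribute selector avoids having to walk the table path at all. XPaths copied from browser tools also often contain tbody elements that the browser inserts itself, so they may not match the raw HTML that requests downloads.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page (structure and address are invented).
SAMPLE_HTML = """
<html><body>
<span>first span</span>
<span>second span</span>
<span>
  <table><tr><td>
    <table>
      <tr><td>Website: <b><a href="http://www.example.com">www.example.com</a></b></td></tr>
      <tr><td>Email: <b><a href="mailto:info@example.com">info@example.com</a></b></td></tr>
    </table>
  </td></tr></table>
</span>
</body></html>
"""


def extract_email(soup):
    """Return the address of the first mailto link inside a <b>, or None if absent.

    Hypothetical helper, not part of BeautifulSoup.
    """
    # <a> that is a direct child of <b> and whose href starts with "mail"
    link = soup.select_one('b > a[href^="mail"]')
    if link is None:
        return None
    # "mailto:info@example.com" -> "info@example.com"
    return link["href"].split(":", 1)[1]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first <span>, which is not the one holding the tables.
first_span = soup.find("span")
print("first span text:", first_span.get_text())

# The selector goes straight to the link, wherever it sits in the tree.
print("email:", extract_email(soup))
```

Returning None when no link matches avoids the TypeError that indexing the result of select_one with ['href'] would raise on a page without an email link.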


html

python

parsing