How to collect data with Python using BeautifulSoup
I am trying to collect data with Python using BeautifulSoup. It collects every field except the email address. How can I collect the email as well?
import time

def scrapeProfileData(profilePageSource):
    time.sleep(6)
    try:
        personName = str(profilePageSource.find("title").get_text().encode("utf-8"))[2:-1]
    except:
        personName = ""
    try:
        industry = str(profilePageSource.find("dd", class_="industry").get_text().encode("utf-8"))[2:-1]
    except:
        industry = ""
    try:
        location = str(profilePageSource.find("span", class_="locality").get_text().encode("utf-8"))[2:-1]
    except:
        location = ""
    try:
        title = str(profilePageSource.find("p", class_="title").get_text().encode("utf-8"))[2:-1]
    except:
        title = ""
    try:
        # This is the lookup that always comes back empty
        email = str(profilePageSource.find("@", class_="contact-field").get_text().encode("utf-8"))[2:-1]
    except:
        email = ""
    pass
This is the markup I am trying to collect the data from:
<dd class="industry"><a href="/vsearch/p?f_I=43&trk=prof-0-ovw-industry" name="industry" title="Find other members in this industry">Financial Services</a></dd>
<span class="locality"><a href="/vsearch/p?f_G=gb%3A4573&trk=prof-0-ovw-location" name='location' title="Find other members in London, Greater London, United Kingdom">London, Greater London, United Kingdom</a></span>
<p class="title">✔✔Sales & Business Development Mobile Payments, Telecoms, Cloud✔✔</p>
<table summary="Online Contact Info"><tr><th>Email</th><td><div id="email"><div id="email-view"><ul><li><a href="mailto:username@domain.com">username@domain.com</a></li></ul></div>
I would like to collect the email too. Any suggestions on how to do that would be appreciated.
Thanks
You can use the following CSS selector to get the email element:
div#email-view a[href]
And, in Python code:
email = profilePageSource.select("div#email-view a[href]")[0].get_text()
Alternatively, without a CSS selector, use find():
email = profilePageSource.find("div", id="email-view").a.get_text()
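To see why the original lookup returns nothing and how the two approaches above behave, here is a minimal sketch run against the markup from the question. The sample HTML has closing tags added so it parses cleanly, and the helper name extract_email is hypothetical, introduced only for illustration. The first argument to find() is a tag name, and there is no tag named "@", so that call returns None; the bare except in the question then hides the resulting AttributeError.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question (closing tags added so it is complete).
SAMPLE_HTML = """
<table summary="Online Contact Info"><tr><th>Email</th><td>
<div id="email"><div id="email-view"><ul>
<li><a href="mailto:username@domain.com">username@domain.com</a></li>
</ul></div></div>
</td></tr></table>
"""


def extract_email(soup):
    """Hypothetical helper: return the email text, or "" if the element is missing."""
    link = soup.select_one("div#email-view a[href]")
    return link.get_text(strip=True) if link is not None else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() expects a tag name first; "@" is not a tag, so nothing matches.
assert soup.find("@", class_="contact-field") is None

# CSS selector approach, with a guard for pages that have no email block.
email_css = extract_email(soup)

# find() approach from the answer: locate the div by id, then its first <a>.
email_find = soup.find("div", id="email-view").a.get_text()

print(email_css)
print(email_find)

# A page without contact info falls back to an empty string instead of raising.
empty_soup = BeautifulSoup("<p>no contact info</p>", "html.parser")
print(repr(extract_email(empty_soup)))
```

Using select_one() with an explicit None check keeps the "missing field becomes an empty string" behavior of the original function without relying on a bare except, which would also hide unrelated errors.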