Web scraping multiple tags using Python
Hello, I am doing web scraping with Python.
Here is my code:
from bs4 import BeautifulSoup
import requests
page = requests.get(
'https://www.indeed.com/viewjob?jk=78fc5cc6a9d2aaa3&q=developer&l=Hammond,+LA&tk=1g3udv32opki1801&from=web&advn=2300444857198541&adid=371529140&ad=-6NYlbfkN0C3HlOxE-u7vDWDmHVgHclVijSpnbvDTTioTnwCLVe0OEwH_1p9qQb-3snK62Gml60thtHyOlr-diC2sIty8supkOLuy2apQt4gi355WXBpDDHQbuCkuMyYIfjito5_MzRa3sg8VkVKd5pvUD9rUt1RWPXpPzu2chM4oyLuN4riMCIsCh8gpIyWcPu7RV4Xt1Zp8PdeRuChYB95XZ0TM5bOYVexvf3lCdm4d3RG2TNPX5iZvX0mlZBhUQ2kufKY6TKI_2UZvTMgDAYwVjtFnB0qxEJi9aMmmp2GHECMAyifjTOAZkTUQnyIjUK_mFI7R7siYE6sIQSqPTt0pfEfvT4U-dfQpsmzdA1D0ZYdO-igFhm2rrEIwalOqCYEFwd3_cTBVkXzQBiiVA%3D%3D&pub=4a1b367933fd867b19b072952f68dceb&vjs=3').text
soup = BeautifulSoup(page, 'lxml')
jobs = soup.find(
'div', class_='jobsearch-JobComponent-description icl-u-xs-mt--md')
job_desc = jobs.find('p').text.replace('', '')
print(f"job description:{job_desc}")
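In the code above I used BeautifulSoup, and I am able to get the job description. My problem is that I only get a single sentence: the job description div contains several p tags, and I can only print the first one.
As shown in the attached image, how can I get the whole job description div as paragraphs?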
[Sample page image][1]
I also tried using a for loop:
job_desc = jobs.find_all('p')
for desc in job_desc:
    job_de = desc.find('p')
    print(f"job description:{job_de}")
The output I got was:
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
job description:None
Try this:
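from bs4 import BeautifulSoup
import requests
# `page` is the HTML text fetched with requests.get(...).text in the question
soup = BeautifulSoup(page, 'lxml')
# get_text() collects the text of every tag inside the div, not just the first <p>
job_desc = soup.find(
    'div',
    class_='jobsearch-JobComponent-description icl-u-xs-mt--md'
).get_text()
print(f"job description:{job_desc}")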
If you only want the human-readable text inside a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string:
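As a rough illustration of that behaviour, here is a minimal sketch on a made-up HTML snippet. The markup, the sample text, and the use of the built-in html.parser instead of lxml are assumptions so the example runs offline; the real page may be structured differently. Passing separator and strip to get_text() should keep each paragraph on its own line:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the job page; the real markup may differ.
html = """
<div class="jobsearch-JobComponent-description icl-u-xs-mt--md">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <ul><li>A bullet point</li></ul>
</div>
"""

# html.parser ships with Python, so this sketch does not need lxml installed.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="jobsearch-JobComponent-description icl-u-xs-mt--md")

# separator="\n" puts a newline between text fragments; strip=True trims
# whitespace from each fragment and drops the ones that end up empty.
job_desc = div.get_text(separator="\n", strip=True)
print(job_desc)
```

Without the separator argument the fragments are simply concatenated, which is why a long description can come back as one run-on string.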
A link to get_text() in the Beautiful Soup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
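If you would rather keep the for loop from the question, the likely reason for the None output is that find_all('p') already returns the p tags, so calling desc.find('p') on each one looks for a p nested inside a p. A small sketch of the loop reading each tag's text directly, again on hypothetical HTML (the div id and sample sentences are invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id and the text are placeholders for illustration.
html = """
<div id="desc">
  <p>Build web apps.</p>
  <p>Work with a small team.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
jobs = soup.find("div", id="desc")

# Each item from find_all('p') is already a <p> tag, so take its text directly
# instead of searching inside it for another <p>.
paragraphs = [p.get_text(strip=True) for p in jobs.find_all("p")]
for text in paragraphs:
    print(f"job description:{text}")
```

This variant only picks up text inside p tags, so content in lists or other elements of the div would be skipped; get_text() on the div itself covers those as well.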