Find() ==> how to extract attribute="value"
I want to extract the attribute value "705-419-1151" from this element:
<a href="javascript:void(0)" class="mlr__item__cta jsMlrMenu" title="Get the Phone Number" data-phone="705-419-1151">
from bs4 import BeautifulSoup

url = 'https://www.yellowpages.ca/search/si/2/hvac+services/Ontario+ON'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
articles = soup.find_all('div', class_='listing__content__wrapper')
for item in articles:
    tel = item.find('li' , {'data-phone' : 'attr(data-phone)'}).get()
    print(tel)
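How can I do that?
Stay focused on the data you need, select your elements more specifically, and always check that an element exists before calling a method on it: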
e.get('data-phone') if(e := item.select_one('[data-phone]')) else None
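To see why the original call fails and the one-liner works, here is a minimal offline sketch. The `html` string is made-up markup modelled on the tag from the question, not the live page. In the question, `find('li', {'data-phone': 'attr(data-phone)'})` looks for an `li` whose attribute equals the literal text `attr(data-phone)`, so it returns `None`, and calling `.get()` on `None` raises an `AttributeError`. The CSS selector `[data-phone]` instead matches any element that has the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the anchor tag from the question;
# the second wrapper deliberately has no phone number.
html = """
<div class="listing__content__wrapper">
  <a href="javascript:void(0)" class="mlr__item__cta jsMlrMenu"
     title="Get the Phone Number" data-phone="705-419-1151"></a>
</div>
<div class="listing__content__wrapper">
  <span>no phone here</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

phones = []
for item in soup.find_all("div", class_="listing__content__wrapper"):
    # Match any descendant that carries a data-phone attribute.
    e = item.select_one("[data-phone]")
    # Only read the attribute if an element was actually found.
    phones.append(e.get("data-phone") if e is not None else None)

print(phones)  # ['705-419-1151', None]
```

Listings without a phone number end up as `None` instead of stopping the loop with an exception.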
Example
This example stores the results in a list of dicts, so you can easily create a DataFrame and save it in a specific format.
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.yellowpages.ca/search/si/2/hvac+services/Ontario+ON'
headers = {'user-agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' , 'Accept-Language': 'en-US, en;q=0.5'}

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
articles = soup.find_all('div', class_='listing__content__wrapper')

data = []

for item in articles:
    # each lookup falls back to None when the element is missing
    com = e.get_text(strip=True, separator='\n') if(e := item.select_one('[itemprop="name"]')) else None
    add = e.text.strip() if(e := item.select_one('[itemprop="address"]')) else None
    tel = e.get('data-phone') if(e := item.select_one('[data-phone]')) else None

    data.append({
        'com':com,
        'add':add,
        'tel':tel
    })

# create a CSV file with the results
pd.DataFrame(data).to_csv('filename.csv', index=False)
Output of data
[{'com': '1\nCity Experts',
'add': '17 Raffia Ave, Richmond Hill, ON L4E 4M9',
'tel': '416-858-3051'},
{'com': '2\nAssociateair Mechanical Systems Ltd',
'add': '40-81 Auriga Dr, Nepean, ON K2E 7Y5',
'tel': '343-700-1174'},
{'com': '3\nAffordable Comfort Heating & Cooling',
'add': '54 Cedar Pointe Dr, Unit 1207 Suite 022, Barrie, ON L4N 5R7',
'tel': '705-300-9536'},
{'com': '4\nHenderson Metal Fabricating Co Ltd',
'add': '76 Industrial Park Cres, Sault Ste Marie, ON P6B 5P2',
'tel': '705-910-5895'},...]
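In the output above, `com` carries the listing number and the company name separated by a newline (for example `'1\nCity Experts'`), because `get_text` was called with `separator='\n'`. If the two parts are wanted separately, a small helper along these lines could split them. `split_rank` is a hypothetical name, not part of the answer's code, and it assumes the number always comes first.

```python
def split_rank(com):
    # Split a value such as "1\nCity Experts" into (1, "City Experts").
    # Assumption: the listing number, when present, is the first line.
    if com is None:
        return None, None
    head, sep, tail = com.partition("\n")
    if sep and head.isdigit():
        return int(head), tail
    # No leading number found: return the text unchanged.
    return None, com


print(split_rank("1\nCity Experts"))  # (1, 'City Experts')
print(split_rank("City Experts"))     # (None, 'City Experts')
```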