Find() ==> how to extract attribute="value"

I want to extract the attribute value "705-419-1151" from this element:

<a href="javascript:void(0)" class="mlr__item__cta jsMlrMenu" title="Get the Phone Number" data-phone="705-419-1151">

from bs4 import BeautifulSoup
url='https://www.yellowpages.ca/search/si/2/hvac+services/Ontario+ON'

r = requests.get(url, headers = headers)

soup = BeautifulSoup(r.content, 'html.parser')

articles = soup.find_all('div', class_ ='listing__content__wrapper')

for item in articles:


    tel = item.find('li' , {'data-phone' : 'attr(data-phone)'}).get()

    print(tel)

How can I do this?

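Try to stay focused on the data you actually need: select your elements more specifically, and always check that an element exists before calling a method on it: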

e.get('data-phone') if(e := item.select_one('[data-phone]')) else None
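To make the one-liner above easier to follow, here is a minimal sketch that runs it against a static snippet instead of the live page. The `<a>` tag is the one from the question; the surrounding `div` wrappers and the second listing without a phone number are assumptions added only for illustration. The attribute selector `[data-phone]` matches any tag that carries the attribute, so it does not matter whether the number sits on an `a` or an `li`.

```python
from bs4 import BeautifulSoup

# Sample markup: the <a> tag is copied from the question, the wrapper
# divs and the second (phone-less) listing are assumed for the demo.
html = '''
<div class="listing__content__wrapper">
  <a href="javascript:void(0)" class="mlr__item__cta jsMlrMenu"
     title="Get the Phone Number" data-phone="705-419-1151">Phone</a>
</div>
<div class="listing__content__wrapper">
  <span>No phone listed</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

phones = []
for item in soup.find_all('div', class_='listing__content__wrapper'):
    # select_one returns None when nothing matches, so check before .get()
    e = item.select_one('[data-phone]')
    phones.append(e.get('data-phone') if e else None)

print(phones)  # ['705-419-1151', None]
```

The attempt in the question fails for two reasons: `{'data-phone': 'attr(data-phone)'}` looks for an element whose attribute value is literally the string `attr(data-phone)`, and `.get()` needs the attribute name as its argument.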

Example

This example stores the results in a list of dicts, so you can easily create a DataFrame and save it in a specific format.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url='https://www.yellowpages.ca/search/si/2/hvac+services/Ontario+ON'

headers = {'user-agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' , 'Accept-Language': 'en-US, en;q=0.5'}

r = requests.get(url, headers = headers)

soup = BeautifulSoup(r.content, 'html.parser')

articles = soup.find_all('div', class_ ='listing__content__wrapper')

data = []

for item in articles:

    com = e.get_text(strip=True, separator='\n') if(e := item.select_one('[itemprop="name"]')) else None
    add = e.text.strip() if(e := item.select_one('[itemprop="address"]')) else None
    tel = e.get('data-phone') if(e := item.select_one('[data-phone]')) else None

    data.append({
        'com':com,
        'add':add,
        'tel':tel
    })
# create a CSV file with the results
pd.DataFrame(data).to_csv('filename.csv', index=False)

Output of data

[{'com': '1\nCity Experts',
  'add': '17 Raffia Ave, Richmond Hill, ON L4E 4M9',
  'tel': '416-858-3051'},
 {'com': '2\nAssociateair Mechanical Systems Ltd',
  'add': '40-81 Auriga Dr, Nepean, ON K2E 7Y5',
  'tel': '343-700-1174'},
 {'com': '3\nAffordable Comfort Heating & Cooling',
  'add': '54 Cedar Pointe Dr, Unit 1207 Suite 022, Barrie, ON L4N 5R7',
  'tel': '705-300-9536'},
 {'com': '4\nHenderson Metal Fabricating Co Ltd',
  'add': '76 Industrial Park Cres, Sault Ste Marie, ON P6B 5P2',
  'tel': '705-910-5895'},...]
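Note that each 'com' value in the output carries the listing rank and the company name separated by a newline, because `get_text` was called with `separator='\n'`. If you want them as separate fields, a small helper can split them. This is only a sketch: `split_rank_and_name` is a hypothetical name, and the assumption that the first line is always a numeric rank is based solely on the output shown above.

```python
def split_rank_and_name(com):
    # Hypothetical helper: splits a value such as "1", newline, "City Experts"
    # into (1, 'City Experts'). Assumes the first line is the numeric rank.
    if com is None:
        return None, None
    rank, sep, name = com.partition('\n')
    if sep and rank.strip().isdigit():
        return int(rank), name.strip()
    # No leading rank found: keep the whole text as the name
    return None, com.strip()


# Two rows copied from the output above
rows = [
    {'com': '1\nCity Experts', 'tel': '416-858-3051'},
    {'com': '2\nAssociateair Mechanical Systems Ltd', 'tel': '343-700-1174'},
]

for row in rows:
    row['rank'], row['name'] = split_rank_and_name(row['com'])

print(rows[0]['rank'], rows[0]['name'])  # 1 City Experts
```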