Scraping Yelp restaurant addresses
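I am trying to extract the addresses and postal codes of restaurants from Yelp, without success. The problem is that I cannot extract the second tag, the one that holds the postal code. The code below returns the address but not the postal code. As the image below shows, there are two tags: the first holds the address and the second holds the postal code and city.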
from bs4 import BeautifulSoup
import requests

url = 'https://www.yelp.com/search?cflt=restaurants&find_loc=Montreal, QC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[class*=container]'):
    try:
        if item.find('h4'):
            name = item.find('h4').get_text()
            addr = item.find('address').get_text()
            print(name)
            print(addr)
            print('------------------')
    except Exception as e:
        raise e
    print('')
Inspect element:

You can try using find_all:
from bs4 import BeautifulSoup
import requests

url = 'https://www.yelp.com/search?cflt=restaurants&find_loc=Montreal, QC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[class*=container]'):
    try:
        if item.find('h4'):
            name = item.find('h4').get_text()
            print(name)
            # Visit every <address> tag in this listing
            for addr in item.find_all('address'):
                # The element right after <address> holds the postal code and city
                print(addr.text, addr.next_sibling.text)
    except Exception as e:
        raise e
    print('')
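
One thing to watch with `next_sibling`: it returns whatever node comes next, which can be a whitespace text node, or `None` when the tag is the last child, and the lookup above then fails or prints an empty string. A minimal sketch of a more defensive version follows, run against a small static snippet instead of the live page. The markup in `SAMPLE_HTML`, the `<p>` sibling, the `extract_listings` helper, and the dictionary keys are all assumptions made up for illustration; Yelp's real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a Yelp results page (an assumption,
# not the real page structure).
SAMPLE_HTML = """
<div class="container">
  <h4>Example Bistro</h4>
  <address>123 Example St</address>
  <p>Montreal, QC H2X 1Y4</p>
</div>
<div class="container">
  <h4>No Address Cafe</h4>
</div>
"""


def extract_listings(html):
    """Return one dict per listing with name, street and city/postal text."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select('[class*=container]'):
        heading = item.find('h4')
        if heading is None:
            continue  # not a listing container

        street = None
        city_postal = None
        address = item.find('address')
        if address is not None:
            street = address.get_text(strip=True)
            # find_next_sibling(True) returns the next sibling that is a tag,
            # skipping whitespace text nodes; it returns None if there is none.
            sibling = address.find_next_sibling(True)
            if sibling is not None:
                city_postal = sibling.get_text(strip=True)

        results.append({
            "name": heading.get_text(strip=True),
            "street": street,
            "city_postal": city_postal,
        })
    return results


if __name__ == "__main__":
    for row in extract_listings(SAMPLE_HTML):
        print(row)
```

To use it on the live page, pass `response.content` from the request above to `extract_listings` in place of `SAMPLE_HTML`.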