How to use BeautifulSoup to find specific class elements on a web page
Goal: Run a web search that looks up a business and, from the results, find either the text "Permanently Closed", the "Open" hours, or basically any text other than "Permanently Closed".
Problem: I am using BeautifulSoup to parse the search results, but it only seems to find the correct class elements about 50% of the time.
import pandas                # needed for the DataFrame below; missing from the original imports
import urllib as u
import urllib.request        # makes u.request.urlopen available under Python 3
from bs4 import BeautifulSoup as bs
import time
from PIL import Image
from io import BytesIO, StringIO

errors = []                  # collects [index, query, exception] entries on failure

comp = pandas.DataFrame(data=[['ALL CITY FITNESS 2', '1005 E PESCADERO AVE SITE 211', 'TRACY', 'CA', '', '']],
                        columns=['NAME', 'ADDRESS', 'CITY', 'STATE', 'VERIFIED', 'STATUS'])

for i in comp.index:
    if comp.loc[i, 'VERIFIED'] != 'YES':
        location, address, city, state = comp.loc[i, ['NAME', 'ADDRESS', 'CITY', 'STATE']]
        print(location, address, city, state)
        search_string = f'{location} {address} {city}, {state}'
        # search_html = Str(search_string).htmlconvert()  # This is a custom function
        search_html = 'ALL%20CITY%20FITNESS%202%201005%20E%20PESCADERO%20AVE%20SITE%20211%20TRACY%2C%20CA'
        url = f'https://www.bing.com/search?q={search_html}'
        try:
            req = u.request.urlopen(url)
            soup = bs(req, "xml")
            # This checks if there is a "Permanently Closed" indicator on the page.
            # This works pretty consistently.
            for item in soup.find_all(class_='b_alert'):
                print(item.text)
                # Mark the location as closed
                comp.loc[i, 'STATUS'] = 'INACTIVE'
            else:
                # else clause of the for loop: runs when the loop above finishes without a break
                # This lookup, however, and the one below it rarely work
                for check in soup.find_all(class_='e_green b_positive'):
                    print(check.text)
                for check in soup.find_all('span', class_='e_green b_positive'):
                    print(check.text)
                comp.loc[i, 'VERIFIED'] = 'YES'
            time.sleep(3)
        except Exception as e:
            errors.append([i, search_string, e])

print(comp)
I performed this search manually and inspected the element, which is where I got this class name from. I have tried adding a '.' so that it reads 'e_green.b_positive', and I have also tried removing it, as shown above. Neither seems to work, or at least not 100% of the time. Is there something I am missing in my syntax?
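As a side note on the syntax question: BeautifulSoup's find_all(class_='e_green b_positive') matches the class attribute as one exact string (so the order of the two names matters), whereas the dotted form 'e_green.b_positive' only works as a CSS selector passed to select(). A minimal, self-contained sketch of the two styles; the sample markup is made up for illustration:

from bs4 import BeautifulSoup

sample = '<span class="e_green b_positive">Open today until 9 PM</span>'
soup = BeautifulSoup(sample, 'html.parser')

# Exact string match against the full class attribute value
print(soup.find_all('span', class_='e_green b_positive'))

# CSS selector: matches tags that carry both classes, in any order
print(soup.select('span.e_green.b_positive'))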
I am not sure why it makes a difference, but it actually comes down to how you encode the HTML, or rather, the final form of the URL you use to run the search.
Append '&qs=n&form=QBRE&=%25eManage%20Your%20Search%20History%25E&sp=-1&p'
to the end of your url
variable, and I would bet your code now finds those class items.
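A minimal sketch of how that could look in the code above. urllib.parse.quote is used here as a stand-in for the poster's custom htmlconvert helper, which is my own assumption; the suffix string is copied verbatim from the answer:

from urllib.parse import quote

search_string = 'ALL CITY FITNESS 2 1005 E PESCADERO AVE SITE 211 TRACY, CA'
search_html = quote(search_string)   # percent-encodes the spaces and the comma
# Extra query parameters suggested above, appended as-is
suffix = '&qs=n&form=QBRE&=%25eManage%20Your%20Search%20History%25E&sp=-1&p'
url = f'https://www.bing.com/search?q={search_html}{suffix}'
print(url)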