How to use BeautifulSoup to find specific class elements on a web page
Goal: Run a web search that looks up a business and, from the results, find either the text "Permanently Closed", the "Open" hours, or basically any text other than "Permanently Closed".
Problem: I am using BeautifulSoup to parse the search results, but it only seems to find the correct class elements about 50% of the time.
import pandas                # needed for the DataFrame below; missing from the original imports
import urllib as u
import urllib.request        # makes u.request.urlopen available under Python 3
from bs4 import BeautifulSoup as bs
import time
from PIL import Image
from io import BytesIO, StringIO

errors = []                  # collects [index, query, exception] entries on failure

comp = pandas.DataFrame(data=[['ALL CITY FITNESS 2', '1005 E PESCADERO AVE SITE 211', 'TRACY', 'CA', '', '']],
                        columns=['NAME', 'ADDRESS', 'CITY', 'STATE', 'VERIFIED', 'STATUS'])

for i in comp.index:
    if comp.loc[i, 'VERIFIED'] != 'YES':
        location, address, city, state = comp.loc[i, ['NAME', 'ADDRESS', 'CITY', 'STATE']]
        print(location, address, city, state)
        search_string = f'{location} {address} {city}, {state}'
        # search_html = Str(search_string).htmlconvert()  # This is a custom function
        search_html = 'ALL%20CITY%20FITNESS%202%201005%20E%20PESCADERO%20AVE%20SITE%20211%20TRACY%2C%20CA'
        url = f'https://www.bing.com/search?q={search_html}'
        try:
            req = u.request.urlopen(url)
            soup = bs(req, "xml")
            # This checks if there is a "Permanently Closed" indicator on the page.
            # This works pretty consistently.
            for item in soup.find_all(class_='b_alert'):
                print(item.text)
                # Mark the location as closed
                comp.loc[i, 'STATUS'] = 'INACTIVE'
            else:
                # else clause of the for loop: runs when the loop above finishes without a break
                # This lookup, however, and the one below it rarely work
                for check in soup.find_all(class_='e_green b_positive'):
                    print(check.text)
                for check in soup.find_all('span', class_='e_green b_positive'):
                    print(check.text)
                comp.loc[i, 'VERIFIED'] = 'YES'
            time.sleep(3)
        except Exception as e:
            errors.append([i, search_string, e])

print(comp)
I performed this search manually and inspected the element, which is where I got this class name from. I have tried adding a '.' so that it reads 'e_green.b_positive', and I have also tried removing it, as shown above. Neither seems to work, or at least not 100% of the time. Is there something I am missing in my syntax?
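As a side note on the syntax question: BeautifulSoup's find_all(class_='e_green b_positive') matches the class attribute as one exact string (so the order of the two names matters), whereas the dotted form 'e_green.b_positive' only works as a CSS selector passed to select(). A minimal, self-contained sketch of the two styles; the sample markup is made up for illustration:

from bs4 import BeautifulSoup

sample = '<span class="e_green b_positive">Open today until 9 PM</span>'
soup = BeautifulSoup(sample, 'html.parser')

# Exact string match against the full class attribute value
print(soup.find_all('span', class_='e_green b_positive'))

# CSS selector: matches tags that carry both classes, in any order
print(soup.select('span.e_green.b_positive'))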
I am not sure why it makes a difference, but it actually comes down to how you encode the HTML, or rather, the final form of the URL you use to run the search.
Append '&qs=n&form=QBRE&=%25eManage%20Your%20Search%20History%25E&sp=-1&p'
to the end of your url
variable, and I would bet your code now finds those class items.
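A minimal sketch of how that could look in the code above. urllib.parse.quote is used here as a stand-in for the poster's custom htmlconvert helper, which is my own assumption; the suffix string is copied verbatim from the answer:

from urllib.parse import quote

search_string = 'ALL CITY FITNESS 2 1005 E PESCADERO AVE SITE 211 TRACY, CA'
search_html = quote(search_string)   # percent-encodes the spaces and the comma
# Extra query parameters suggested above, appended as-is
suffix = '&qs=n&form=QBRE&=%25eManage%20Your%20Search%20History%25E&sp=-1&p'
url = f'https://www.bing.com/search?q={search_html}{suffix}'
print(url)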