String filtering in an if function not working in Python

Question

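I am writing a web scraper that scrapes data from a list of links, one after another. The problem is that the website uses the same class name for up to 3 different buttons at the same time and no other unique identifier, so as far as I know there is no way to target the exact button when there is more than one.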

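I used driver.find_element, which worked well because it only returns the first match and basically ignores the other buttons. However, on some pages the offers information I am trying to scrape is missing, which causes the script to grab the wrong data and fill it in, even though I am not interested in that data at all.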

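So I came up with a solution: check whether the scraped information contains a specific string that only appears in the piece of information I am trying to get. If the string is not found, the data variable should be overwritten with empty data, so that it is obvious the information does not exist.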

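However, the if statement I am using to filter the string does not seem to work at all. When there are no buttons on the page, it does manage to fill the variable with empty data. But as soon as a different button is present, it is not filtered out, somehow gets through, and breaks the whole thing.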

Here is an example page that contains no data at all:

https://reality.idnes.cz/rk/detail/nido-group-s-r-o/5a85b108a26e3a2adb4e394c/?page=185

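Here is an example page with 2 buttons that contain data. I am trying to scrape the first one, looking for the text "nemovitostí" in the blue button, which is what I am filtering for.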

https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/

Here is the problematic code:

# Offers
offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
offers = offers.text
print(offers)
# Check if scraped information contains offers else move on
if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
    pass
else:
    offers = ""

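Since the if statement should look for that set of strings, and if none of them is found run the code under the else statement instead, I can't understand how the data gets through. There is no error code or warning; it simply picks up the data instead of ignoring it, even when the string is different.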

The full code, for reference:

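import csv

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec

# driver, wait, workbook, worksheet and row are set up earlier in the script (not shown)
# Open links.csv file and read its contents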
with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)
    # Information search
    for link in link_list:
        driver.get(', '.join(link))
        # Title
        title = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
        # Offers
        offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
        offers = offers.text
        print(offers)
        # Check if scraped information contains offers else move on
        if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
            None
        else:
            offers = ""
        # Address
        address = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
        # Phone number
        # Try to obtain phone number if nonexistent move on
        try:
            phone_number = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--phone')]]")))
            phone_number = phone_number.text
        except TimeoutException:
            phone_number = ""
        # Email
        # Try to obtain email if nonexistent move on
        try:
            email = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--email')]]")))
            email = email.text
        except TimeoutException:
            email = ""
        # Print scraping results
        print(title.text, " ", offers, " ", address.text, " ", phone_number, " ", email)
        # Save results to a list
        company = [title.text, offers, address.text, phone_number, email]
        # Write results to scraped.xlsx file
        worksheet.write_row(row, 0, company)
        del title, offers, address, phone_number, email
        # Push row number lower
        row += 1
    workbook.close()
    driver.quit()

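How can the data still get through? Is there an error in my syntax? If you spot my mistake, please let me know so I can do better next time! Thanks to anyone who helps!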

Answer 1

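1. "The problem is that the website uses the same class name for up to 3 different buttons at the same time and, as far as I know, no other unique identifier, so if there are more buttons there is no way to target the exact one"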

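If you use By.XPATH instead of By.CSS_SELECTOR, you can actually get the element you need. The first one is (//span[@class='btn__text'])[1], the second is (//span[@class='btn__text'])[2] and the third is (//span[@class='btn__text'])[3]. Or, if you are not sure what the order is, you can be more specific, for example (//span[@class='btn__text' and contains(text(),'nemovitostí')])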
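The contains() XPath picks the button by its text rather than by its position. The same idea can be sketched on the Python side: collect the texts of all matching buttons (for example with [e.text for e in driver.find_elements(By.CSS_SELECTOR, "span.btn__text")]) and keep the first one that contains the keyword. The helper name pick_offers_text and the sample button labels are hypothetical; the function only works on a list of strings, so it runs without a browser.

```python
# Hypothetical helper: choose the button text that contains a keyword.
# In the scraper, button_texts would come from something like
#   [e.text for e in driver.find_elements(By.CSS_SELECTOR, "span.btn__text")]
def pick_offers_text(button_texts, keyword="nemovitost"):
    """Return the first text containing keyword, or "" if none does."""
    for text in button_texts:
        if keyword in text:
            return text
    return ""


# Made-up labels standing in for the buttons on a page
print(pick_offers_text(["Zobrazit telefon", "152 nemovitostí"]))  # 152 nemovitostí
print(repr(pick_offers_text(["Zobrazit telefon"])))               # ''
```

Matching on text means the order of the buttons no longer matters, and a page without the offers button yields an empty string instead of another button's text.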

2. The second problem is with the if syntax in Python

It should look like this:

if "nemovitostí" in offers or "nemovitosti" in offers or "nemovitost" in offers:
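Why the original condition never fails: in binds more tightly than or, so "nemovitostí" or "nemovitosti" or "nemovitost" in offers is parsed as "nemovitostí" or "nemovitosti" or ("nemovitost" in offers). Since or returns its first truthy operand and a non-empty string is truthy, the whole expression always evaluates to "nemovitostí". A small check with a made-up offers value:

```python
offers = "Kontaktovat"  # made-up text that does NOT contain the keyword

# Original condition: parsed as "nemovitostí" or "nemovitosti" or ("nemovitost" in offers)
original = "nemovitostí" or "nemovitosti" or "nemovitost" in offers
print(repr(original))  # 'nemovitostí' -> truthy, so the if branch always runs

# Corrected condition: each keyword is tested against offers separately
corrected = "nemovitostí" in offers or "nemovitosti" in offers or "nemovitost" in offers
print(corrected)  # False
```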

There may be a better way to write it, perhaps something like this:

for i in ["nemovitostí", "nemovitosti", "nemovitost"]:
    if i in offers:
        break  # a keyword was found, so keep offers as it is
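A loop only replaces the if/else completely if the "not found" case is handled too. One way to do that, assuming the goal is still to blank offers when no keyword matches, is Python's for/else: the else block runs only when the loop finishes without hitting break. The offers value below is made up.

```python
offers = "Zobrazit telefon"  # made-up text without any keyword

for keyword in ["nemovitostí", "nemovitosti", "nemovitost"]:
    if keyword in offers:
        break  # found a keyword: keep offers as it is
else:
    # only reached when the loop ended without break, i.e. no keyword matched
    offers = ""

print(repr(offers))  # ''
```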

Answer 2

The cleanest way to write it is as follows:

value = ["nemovitostí", "nemovitosti", "nemovitost"]
if any(s in offers for s in value):
    pass  # do something here
else:
    offers = ""
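Wrapped in a function, the any() version can be tested on its own. Note that "nemovitost" is a prefix of both "nemovitostí" and "nemovitosti", so checking for "nemovitost" alone would already catch all three forms; the full list is kept here only to mirror the answer. The function name filter_offers and the sample strings are hypothetical.

```python
KEYWORDS = ["nemovitostí", "nemovitosti", "nemovitost"]


def filter_offers(offers, keywords=KEYWORDS):
    """Return offers unchanged if it mentions any keyword, otherwise ""."""
    if any(s in offers for s in keywords):
        return offers
    return ""


print(filter_offers("152 nemovitostí"))     # 152 nemovitostí
print(repr(filter_offers("Zobrazit web")))  # ''
```

In the scraping loop this would replace the whole if/else with offers = filter_offers(offers.text).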


Tags: python, selenium, python-3.x, selenium-webdriver, geckodriver