How to handle the error out of a try loop using Selenium and Python

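I want to run a search with Selenium and then click the "More Results" button at the end of the DDG search results.

DDG stops showing the button once all results for the query have been displayed.

I want to exit the try loop when the button is no longer there.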

Below is what I am currently trying. I have also tried these two options before: if len(button_element) > 0: button_element.click() and if button_element is not None: button_element.click().

I would like a solution that uses Selenium so that the browser is shown, since that helps with debugging.

Here is my code, with a reproducible example:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome()        
    browser.get("https://duckduckgo.com/")
    search = browser.find_element_by_name('q')
    search.send_keys("this is a search" + Keys.RETURN)
    html = browser.page_source

    try:
        button_element = browser.find_element_by_class_name('result--more__btn')

        try:
            button_element.click()
        except SystemExit:
            print("No more pages")

    except:
        pass

Use WebDriverWait to wait until the More button is there:

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout
    wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
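To make the behaviour of wait.until(...) concrete: it evaluates the condition repeatedly until the condition returns something truthy, which it then hands back, and it raises a timeout error once the deadline has passed. The following is a simplified pure-Python stand-in, not Selenium's actual implementation. The names poll_until and poll_interval are hypothetical, and the built-in TimeoutError is used here where Selenium raises its own TimeoutException.

```python
import time


def poll_until(condition, timeout=15.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that truthy value, or raises TimeoutError once `timeout`
    seconds have elapsed. Simplified stand-in for WebDriverWait.until;
    the real class also ignores NoSuchElementException while polling.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

This is why a loop built on the wait can tell "button not there yet" apart from "button gone for good": only the second case ends in a timeout error.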

This sample code clicks the More button until there is no More button left. For Chrome, replace firefox with chrome.

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    browser.get("https://duckduckgo.com/")
    search = browser.find_element(By.NAME, 'q')
    search.send_keys("this is a search" + Keys.RETURN)

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout

    while True:
        try:
            # until() returns the element once it is visible
            button_element = wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
            button_element.click()
        except TimeoutException:
            # the button did not appear within the timeout: no more results
            break
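The len(button_element) > 0 check from the question can also be made to work without a wait: find_elements (plural) returns an empty list when nothing matches, whereas find_element raises NoSuchElementException and never returns None. A minimal sketch, assuming a driver object that exposes Selenium's find_elements(by, value). The helper name click_if_present is hypothetical, and the string "class name" is the value of By.CLASS_NAME, written out so the snippet needs no imports.

```python
def click_if_present(driver, by="class name", value="result--more__btn"):
    """Click the first element matching the locator, if there is one.

    Returns True when an element was found and clicked, False otherwise.
    Checks the page once and does not wait for the element to appear.
    """
    matches = driver.find_elements(by, value)  # empty list when absent
    if len(matches) > 0:
        matches[0].click()
        return True
    return False
```

Because this check does not wait, results that are still loading can be mistaken for the end of the list, so in practice it is probably best combined with an explicit wait.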

You can use the plain-HTML version of DDG at the URL https://duckduckgo.com/html/?q=. That way you can easily fetch all pages with a pure requests/beautifulsoup approach:

    import requests
    from bs4 import BeautifulSoup

    q = '"centre of intelligence"'
    url = 'https://duckduckgo.com/html/?q={q}'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

    # fetch the first page of results
    soup = BeautifulSoup(requests.get(url.format(q=q), headers=headers).content, 'html.parser')

    while True:
        # print title, link and snippet of every result on the current page
        for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
            print(t.get_text(strip=True, separator=' '))
            print(a['href'])
            print(s.get_text(strip=True, separator=' '))
            print('-' * 80)

        # the "next page" form; when it is missing, this was the last page
        f = soup.select_one('.nav-link form')
        if not f:
            break

        # copy the form's inputs (except the submit button) into the POST data
        data = {}
        for i in f.select('input'):
            if i['type']=='submit':
                continue
            data[i['name']] = i.get('value', '')

        # submit the form to load the next page
        soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')

Prints:

    Centre Of Intelligence - Home | Facebook
    https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
    Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
    --------------------------------------------------------------------------------
    centre of intelligence | English examples in context | Ludwig
    https://ludwig.guru/s/centre+of+intelligence
    (Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
    --------------------------------------------------------------------------------
    Chinese scientists who studied bats in Aus at centre of intelligence ...
    https://www.youtube.com/watch?v=UhcFXXzf2hc
    Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
    --------------------------------------------------------------------------------

... and so on.
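The pagination step in the code above re-submits the inputs of the "next page" form. The sketch below isolates that step using only the standard library's html.parser instead of BeautifulSoup, so it is a swapped-in technique rather than the answer's own code. The names FormFieldCollector and collect_form_fields and the sample markup (including its field names) are made up for illustration and are not DDG's actual HTML.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags, skipping submit buttons."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "submit" or "name" not in attributes:
            return
        # A missing value attribute is sent as an empty string.
        self.fields[attributes["name"]] = attributes.get("value") or ""


def collect_form_fields(form_html):
    """Return a dict of the form's input fields, ready to use as POST data."""
    parser = FormFieldCollector()
    parser.feed(form_html)
    parser.close()
    return parser.fields


# Hypothetical form markup, for illustration only.
sample = (
    '<form action="/html/" method="post">'
    '<input type="submit" value="Next">'
    '<input type="hidden" name="q" value="example query">'
    '<input type="hidden" name="s" value="30">'
    '<input type="hidden" name="extra">'
    "</form>"
)
print(collect_form_fields(sample))
```

The resulting dict plays the role of data in the loop above: it is what gets posted back to load the next page.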

To click the More Results button at the end of the search results using Selenium, you have to induce WebDriverWait for element_to_be_clickable(), for example with the following CSS selector:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import TimeoutException
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://duckduckgo.com/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
    while True:
        try:
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
            print("Clicked on More Results button")
        except TimeoutException:
            print("No more More Results button")
            break
    driver.quit()
    
  • Console output:

    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    No more More Results button
    

You can find a relevant discussion in How to extract the text from the search results of duckduckgo using Selenium Python