How to handle the error out of a try loop using Selenium and Python
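I want to run a search with Selenium and then click the "More Results" button at the end of a DDG search.
The DDG search no longer shows the button once it has displayed all the results for a query.
I would like to exit the try loop when the button is not there.
I will share what I am trying now. Earlier I also tried these two options: if len(button_element) > 0: button_element.click()
and if button_element is not None: button_element.click().
I would like a solution that uses Selenium so that it shows the browser, because that helps with debugging.
This is my code, with a reproducible example: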
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
browser = webdriver.Chrome()
browser.get("https://duckduckgo.com/")
search = browser.find_element_by_name('q')
search.send_keys("this is a search" + Keys.RETURN)
html = browser.page_source
try:
    button_element = browser.find_element_by_class_name('result--more__btn')
    try:
        button_element.click()
    except SystemExit:
        print("No more pages")
except:
    pass
Use WebDriverWait to wait until the More button appears:
wait = WebDriverWait(browser, 15) # 15 seconds timeout
wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
This sample code clicks the More button until there is no More button left.
For Chrome, replace firefox with chrome.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
browser = webdriver.Firefox()
browser.get("https://duckduckgo.com/")
search = browser.find_element_by_name('q')
search.send_keys("this is a search" + Keys.RETURN)
while True:
    try:
        wait = WebDriverWait(browser, 15) # 15 seconds timeout
        wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
        button_element = browser.find_element_by_class_name('result--more__btn')
        button_element.click()
    except:
        break
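The `len(button_element) > 0` check from the question can be made to work when the lookup returns a list: the plural `find_elements` call returns an empty list when nothing matches, instead of raising. Below is a minimal sketch of that idea. The helper `click_while_present`, its `max_clicks` cap, and the `FakeButton` stub are hypothetical names used only so the example runs without a browser; with a real driver you would pass something like `lambda: browser.find_elements(By.CLASS_NAME, 'result--more__btn')`.

```python
class FakeButton:
    """Hypothetical stand-in for a Selenium WebElement; it only counts clicks."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


def click_while_present(find_buttons, max_clicks=50):
    """Click the first matching element until the lookup returns an empty list.

    find_buttons is a zero-argument callable returning a list of elements.
    max_clicks is an assumed safety cap so the loop cannot run forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = find_buttons()
        if not buttons:  # empty list: no button left, leave the loop
            break
        buttons[0].click()
        clicks += 1
    return clicks


# Demo with a stub lookup that "shows" the button three times, then stops.
remaining = [FakeButton(), FakeButton(), FakeButton()]


def fake_find():
    return [remaining.pop()] if remaining else []


total = click_while_present(fake_find)
print(total)
```

On the live page the next batch of results loads asynchronously, so a plain existence check like this probably still needs to be combined with a wait such as the WebDriverWait call above.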
You can use the pure HTML version of DDG at the URL https://duckduckgo.com/html/?q=
This way you can easily fetch all pages with a pure requests/beautifulsoup approach:
import requests
from bs4 import BeautifulSoup
q = '"centre of intelligence"'
url = 'https://duckduckgo.com/html/?q={q}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
soup = BeautifulSoup(requests.get(url.format(q=q), headers=headers).content, 'html.parser')
while True:
    for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
        print(t.get_text(strip=True, separator=' '))
        print(a['href'])
        print(s.get_text(strip=True, separator=' '))
        print('-' * 80)
    f = soup.select_one('.nav-link form')
    if not f:
        break
    data = {}
    for i in f.select('input'):
        if i['type']=='submit':
            continue
        data[i['name']] = i.get('value', '')
    soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')
Prints:
Centre Of Intelligence - Home | Facebook
https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
--------------------------------------------------------------------------------
centre of intelligence | English examples in context | Ludwig
https://ludwig.guru/s/centre+of+intelligence
(Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
--------------------------------------------------------------------------------
Chinese scientists who studied bats in Aus at centre of intelligence ...
https://www.youtube.com/watch?v=UhcFXXzf2hc
Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
--------------------------------------------------------------------------------
... and so on.
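The query in the example above contains double quotes and spaces, which must be percent-encoded in the URL. As a small sketch, assuming you want to build the HTML-version URL yourself, the standard library's `urlencode` can do the escaping. The helper name `build_search_url` and the `BASE_URL` constant are hypothetical and not part of the answer's code.

```python
from urllib.parse import urlencode

# Base address of the plain HTML version of DDG, taken from the answer above.
BASE_URL = 'https://duckduckgo.com/html/'


def build_search_url(query):
    """Return the search URL with the query safely percent-encoded."""
    return BASE_URL + '?' + urlencode({'q': query})


search_url = build_search_url('"centre of intelligence"')
print(search_url)
```

If you stay with requests, passing the query through its `params` argument should have the same effect.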
To click on the More Results button at the end of the duckduckgo search results you have to induce WebDriverWait for the element_to_be_clickable() and you can use the following:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://duckduckgo.com/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
while True:
    try:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
        print("Clicked on More Results button")
    except TimeoutException:
        print("No more More Results button")
        break
driver.quit()
Console Output:
Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
No more More Results button
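The loop above stops only on a timeout and lets any other error surface, which is usually safer than a bare `except:`. Here is a minimal sketch of that pattern pulled out into a reusable function. The names `click_until_timeout`, `max_clicks`, and `FakePage` are hypothetical, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException` so the example runs without a browser.

```python
def click_until_timeout(wait_and_click, timeout_error, max_clicks=100):
    """Call wait_and_click() until it raises timeout_error.

    Only timeout_error ends the loop; any other exception propagates.
    max_clicks is an assumed safety cap. Returns the number of clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            wait_and_click()
        except timeout_error:
            break  # the button never became clickable: no more results
        clicks += 1
    return clicks


class FakePage:
    """Hypothetical stub: the button can be clicked `pages` times, then times out."""

    def __init__(self, pages):
        self.pages = pages

    def wait_and_click(self):
        if self.pages == 0:
            raise TimeoutError("button never became clickable")
        self.pages -= 1


page = FakePage(pages=5)
clicked = click_until_timeout(page.wait_and_click, TimeoutError)
print(clicked)
```

With a real driver, `wait_and_click` would wrap the `WebDriverWait(driver, 20).until(...).click()` line from the code block above, and `timeout_error` would be `TimeoutException`.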
You can find a relevant discussion in How to extract the text from the search results of duckduckgo using Selenium Python