Selenium web scraping: how to prioritize one tab over another
Project: save all the URLs/titles from https://theuselessweb.com/
Test code (only 3 pages, printing instead of saving):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep

PATH = r"C:\Users\XXX\Documents\scraping\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://theuselessweb.com/")
driver.switch_to.window(driver.window_handles[-1])
button = driver.find_element_by_id("button")
for i in range(3):
    button.click()
    sleep(2)
    driver.switch_to.window(driver.window_handles[-1])
    print(driver.current_url)
    print(driver.title)
    driver.close()
Error:
DevTools listening on ws://127.0.0.1:60235/devtools/browser/a5ea4ab0-fba6-4a34-b0ee-8926876c554f
[11636:4168:0626/143411.535:ERROR:device_event_log_impl.cc(214)] [14:34:11.535] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerõt funktioniert nicht. (0x1F)
[11636:4168:0626/143411.552:ERROR:device_event_log_impl.cc(214)] [14:34:11.552] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerõt funktioniert nicht. (0x1F)
[11636:4168:0626/143411.555:ERROR:device_event_log_impl.cc(214)] [14:34:11.555] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerõt funktioniert nicht. (0x1F)
https://thatsthefinger.com/ #this is what I want
The finger, deal with it. #this is what I want
Traceback (most recent call last):
File "C:\Users\XXX\Documents\scraping\programs\linkscraping.py", line 16, in <module>
button.click()
File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
return self._parent.execute(command, params)
File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
(Session info: chrome=91.0.4472.124)
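It prints the URL and title of the first website and then crashes. Also, every time I run the driver.get(ANYURL) command, it opens both the link and the Chrome settings page (chrome://settings/triggeredResetProfileSettings). Maybe that is what messes things up. Either way, it would be very helpful if I could get rid of this unwanted window as well.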
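A note on the likely cause, offered as a sketch rather than a definitive fix: after driver.close() the driver is still pointed at the tab that was just closed, so the next button.click() is sent to a window that no longer exists. That matches the NoSuchWindowException in the traceback above. One way around it is to remember the handle of the original tab and switch back to it after closing the others. The helper name visit_new_tabs and its parameters are hypothetical; the only Selenium members it relies on are window_handles, switch_to.window, current_url, title and close.

```python
def visit_new_tabs(driver, original_handle):
    """Collect (url, title) from every tab except the original one.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver and
    `original_handle` the handle of the tab that holds the button.
    """
    results = []
    # Copy the list first, because closing tabs changes window_handles.
    for handle in list(driver.window_handles):
        if handle == original_handle:
            continue
        driver.switch_to.window(handle)
        results.append((driver.current_url, driver.title))
        driver.close()  # closes only the tab that currently has focus
    # Return focus to the original tab so the button can be clicked again.
    driver.switch_to.window(original_handle)
    return results
```

In the loop from the question, this would mean storing driver.current_window_handle right after driver.get(...) and calling the helper after each button.click() and sleep(2), instead of closing the tab and clicking again without switching back.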
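Here is a solution to the problem. It still opens every link, but since Chrome runs headless, none of this is visible to the user.
In this case, x is the number of random websites you want to extract.
The code opens the site, clicks the button x times, then goes through each of the opened tabs and records the result. Finally, it closes Chrome.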
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

# Run Chrome headless so the opened tabs are not visible to the user
options = Options()
options.headless = True
driver = webdriver.Chrome(
    ChromeDriverManager().install(),
    options=options
)

x = 10  # number of random websites to collect
driver.get('https://theuselessweb.com/')
button = driver.find_element_by_id("button")

# Each click opens a random site in a new tab; focus stays on the original tab
for i in range(x):
    button.click()

# Assumes window_handles[0] is the original tab, so the new tabs start at index 1
for i in range(x):
    driver.switch_to.window(driver.window_handles[i+1])
    print(driver.current_url)
    print(driver.title)

driver.quit()
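The code above only prints the results, while the stated goal of the project is to save them. A minimal sketch of the saving step, assuming the URL/title pairs have been collected into a list of tuples: the function name save_results and the column names are hypothetical, and the csv module is used so that titles containing commas are quoted correctly.

```python
import csv

def save_results(rows, path):
    """Write (url, title) pairs to a CSV file with a header row.

    Hypothetical helper: `rows` is an iterable of (url, title) tuples and
    `path` is the output file name chosen by the caller.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "title"])
        writer.writerows(rows)
```

To use it, the second loop would append (driver.current_url, driver.title) to a list instead of printing, and save_results would be called once with that list after driver.quit().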