带有代理的 Selenium 不工作/错误的选项?

Selenium with proxy not working / wrong options?

我有以下输出 IP 地址和信息的工作测试解决方案 -

现在我想将它与我的 ScraperAPI 帐户一起用于其他代理。 但是当我取消注释这两行时 -

# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
# options.add_argument('--proxy-server=%s' % PROXY) 

解决方案不再有效 -

如何将我的代理与 selenium/该代码一起使用? (scraperAPI 建议使用 selenium-wire 模块,但我不喜欢它,因为它对其他工具的特定版本有一些依赖性——所以我想在没有它的情况下使用代理)

这可能吗?

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from dotenv import load_dotenv, find_dotenv

WAIT = 10

load_dotenv(find_dotenv()) 
SCRAPER_API = os.environ.get("SCRAPER_API")
# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'

srv=Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')                                 
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')  
options.add_argument(f'user-agent={userAgent}')     
# options.add_argument('--proxy-server=%s' % PROXY)     
path = os.path.abspath (os.path.dirname (sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome (service=srv, options=options)    
waitWebDriver = WebDriverWait (driver, 10)  

link = "https://whatismyipaddress.com/"
driver.get (link)     
time.sleep(WAIT)
soup = BeautifulSoup (driver.page_source, 'html.parser')     
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
  tmpSPAN = e.find_all("span")
  for e2 in tmpSPAN:
    print(e2.text)
print(tmpIP.text)

driver.quit()

有几件事你需要回顾一下:

  • 首先,好像有错别字。 get() 之间有一个 space 字符,这可能导致:

    IndexError: list index out of range
    
  • 不确定下一行的作用,因为我可以在没有该行的情况下执行。你可能喜欢评论它。

    from dotenv import load_dotenv, find_dotenv
    
    load_dotenv(find_dotenv())
    
  • 如果您想停止使用 SCRAPER_API,请同时注释掉以下行:

    SCRAPER_API = os.environ.get("SCRAPER_API")
    

进行这些小调整并优化您的代码:

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

WAIT = 10
srv=Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')                                 
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')  
options.add_argument(f'user-agent={userAgent}')     
driver = webdriver.Chrome (service=srv, options=options)    
waitWebDriver = WebDriverWait (driver, 10)  

link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup (driver.page_source, 'html.parser')     
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
            print(e2.text)
print(tmpIP.text)
driver.quit()

控制台输出:

[WDM] -

[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 96.0.4664
[WDM] - Get LATEST driver version for 96.0.4664
[WDM] - Driver [C:\Users\Admin\.wdm\drivers\chromedriver\win32.0.4664.45\chromedriver.exe] found in cache
ISP:
Jio
City:
Pune
Region:
Maharashtra
Country:
India
123.12.234.23

保存的屏幕截图:


使用代理

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

WAIT = 10

load_dotenv(find_dotenv()) 
SCRAPER_API = os.environ.get("SCRAPER_API")
PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'

srv=Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')                                 
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')  
options.add_argument(f'user-agent={userAgent}') 
options.add_argument('--proxy-server={}'.format(PROXY))
path = os.path.abspath (os.path.dirname (sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome (service=srv, options=options)    
waitWebDriver = WebDriverWait (driver, 10)  

link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup (driver.page_source, 'html.parser')     
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
            print(e2.text)
print(tmpIP.text)
driver.quit()

Note: print(f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001') and ensure the SCRAPER_API returns a result.


参考资料

您可以在以下位置找到一些相关的详细讨论:

  • How to rotate Selenium webrowser IP address