Unable to connect to Tor using requests whereas I did the same using selenium

Question

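I've written two scripts in Python, one using selenium and the other using requests, that connect to http://check.torproject.org through Tor and fetch the text "Congratulations. This browser is configured to use Tor." from that page, so that I can be sure I'm doing things the right way.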

When I use the script below, I get the text without any problem:

from selenium import webdriver
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://localhost:9050')
driver = webdriver.Chrome(chrome_options=options)

driver.get("http://check.torproject.org")
item = driver.find_element_by_css_selector("h1.not").text
print(item)

driver.quit()

However, when I try to do the same thing using requests, I get the error AttributeError: 'NoneType' object has no attribute 'text':

import requests
from bs4 import BeautifulSoup
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5://localhost:9050'
    res = s.get("http://check.torproject.org")
    soup = BeautifulSoup(res.text,"lxml")
    item = soup.select_one("h1.not").text
    print(item)

How can I get the same text from that site using requests?

When I use print(soup.title.text), I get the text "Sorry. You are not using Tor.", which clearly shows that the request is not being made through Tor.

Answer 1

check.torproject.org forces HTTPS, so when requests follows the redirect to https://check.torproject.org, you are no longer using the SOCKS proxy, because it was only specified for the http protocol.
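To illustrate why the redirect bypasses the proxy, here is a deliberately simplified model of a per-scheme proxy lookup. The helper name pick_proxy is hypothetical and this is not the actual code inside requests; it only mimics the idea that the proxy is chosen by the URL's scheme.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Hypothetical, simplified model of a per-scheme proxy lookup.

    Returns the proxy registered for the URL's scheme, or None when no
    entry matches (meaning the request would go out directly).
    """
    scheme = urlsplit(url).scheme
    return proxies.get(scheme)


# The mapping from the question: only the "http" scheme has a proxy.
http_only = {"http": "socks5://localhost:9050"}

# The initial request matches the "http" entry and goes through Tor.
print(pick_proxy("http://check.torproject.org", http_only))

# The redirect target is https, which has no entry, so no proxy is used.
print(pick_proxy("https://check.torproject.org", http_only))
```

Under this model the first lookup returns the SOCKS proxy and the second returns None, which matches the symptom in the question: the redirected request reaches the site without Tor.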

Make sure to set the proxy for both HTTP and HTTPS. In addition, to resolve DNS names through Tor rather than leaking DNS requests, use socks5h:

s.proxies['http']  = 'socks5h://localhost:9050'
s.proxies['https'] = 'socks5h://localhost:9050'

That should make your test work.
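Putting the two lines into the script from the question might look like the sketch below. It assumes tor is already running and listening on localhost:9050, as in the question, and that SOCKS support for requests is installed (pip install requests[socks]). The helper names tor_proxies and check_tor are made up for this example, and it reads the page title rather than h1.not, since the title exists in both the success and the failure case.

```python
TOR_SOCKS_PORT = 9050  # port from the question; assumes tor is listening here


def tor_proxies(host="localhost", port=TOR_SOCKS_PORT):
    """Build a proxies mapping that routes both HTTP and HTTPS through Tor."""
    # socks5h (not socks5) asks the proxy to resolve hostnames, so DNS
    # lookups also go through Tor instead of the local resolver.
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def check_tor():
    """Fetch the Tor check page through the proxy and return its title."""
    # Imported here so tor_proxies() can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        s.proxies.update(tor_proxies())
        res = s.get("https://check.torproject.org", timeout=60)
        soup = BeautifulSoup(res.text, "lxml")
        # The title is present whether or not Tor is in use.
        return soup.title.get_text(strip=True)
```

With Tor running, print(check_tor()) should report the "Congratulations" title instead of "Sorry. You are not using Tor.".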

