urllib3 cannot open an article that urllib2 opens, due to private mode detection

How can I bypass private mode detection with urllib3? I have the following code, which does not work:

    import urllib3
    from bs4 import BeautifulSoup

    articleURL = "https://www.washingtonpost.com/news/the-switch/wp/2016/10/18/the-pentagons-massive-new-telescope-is-designed-to-track-space-junk-and-watch-out-for-killer-asteroids/"

    # Silence the warning urllib3 emits for unverified HTTPS requests.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    http = urllib3.PoolManager()
    response = http.request('GET', articleURL)
    soup = BeautifulSoup(response.data.decode('utf-8', 'ignore'))
    soup

Instead of the article, this produces the following:

    </script> <script>var _0x108f=["blockers","pb-adblock-checked","resolve","all","overlay","mobile","desktop","browsers","max","isAnon","isSubscriber","Features","displayOverlay","extListener","getTime","performance","timing","navigationStart","registerPwapiConsumer","getOwnPropertyDescriptor","get","reject","notdetected","standard","notblocked","stack","validate","addEventListener","pb-core-loaded","iterator","symbol","function","constructor","prototype","assign","apply","Keep supporting great journalism by turning off your ad blocker. Or purchase a subscription for unlimited access to real news you can count on.",
'\x3ca data-link-ff\x3d"https://www.washingtonpost.com/steps-for-disabling-firefoxs-native-adblocker/2018/05/21/fb95bf4e-5d37-11e8-b2b8-08a538d9dbd6_story.html" data-link\x3d"https://www.washingtonpost.com/steps-for-disabling-adblocker/2016/09/14/a8c3d4d2-7aac-11e6-bd86-b7bbd53d2b5d_story.html" href\x3d"https://www.washingtonpost.com/steps-for-disabling-adblocker/2016/09/14/a8c3d4d2-7aac-11e6-bd86-b7bbd53d2b5d_story.html"\x3eUnblock ads\x3c/a\x3e','\x3ca href\x3d"https://subscribe.washingtonpost.com/acq/?promo\x3do12" target\x3d"_blank"\x3e\x3cspan class\x3d"subscribe-link"\x3eTry 1 month for \x3c/span\x3e\x3c/a\x3e',
"event 86","We noticed you\u2019re browsing in private mode.","Private browsing is permitted exclusively for our subscribers. Turn off private browsing to keep reading this story, or subscribe to use this feature, plus get unlimited digital access.",'\x3ca data-link-ff\x3d"https://helpcenter.washingtonpost.com/hc/en-us/articles/360028029392l" data-link\x3d"https://helpcenter.washingtonpost.com/hc/en-us/articles/360028029392" href\x3d"https://helpcenter.washingtonpost.com/hc/en-us/articles/360028029392"\x3eTurn off private browsing\x3c/a\x3e'

I did not mean to trigger this warning, and the same request works fine with urllib2:

    import urllib2
    from bs4 import BeautifulSoup

    articleURL = "https://www.washingtonpost.com/news/the-switch/wp/2016/10/18/the-pentagons-massive-new-telescope-is-designed-to-track-space-junk-and-watch-out-for-killer-asteroids/"

    page = urllib2.urlopen(articleURL).read().decode('utf8', 'ignore')
    soup = BeautifulSoup(page, "lxml")
    soup

Try this change (you need to specify a user-agent header):

    headers = {'user-agent': 'Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0'}
    response = http.request('GET', articleURL, headers=headers)
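
Putting the pieces together, a minimal sketch might look like the following. It assumes urllib3 and bs4 are installed. The names `HEADERS` and `fetch_soup` are hypothetical, introduced here only for illustration, and the optional `http` parameter exists just so a pool can be reused or swapped out. The User-Agent string is the one from the change above; whether the site keeps accepting it is not guaranteed.

```python
import urllib3
from bs4 import BeautifulSoup

# User-Agent string copied from the answer; treating any browser-like
# value as sufficient is an assumption, not something the site documents.
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) "
                  "Gecko/20100101 Firefox/10.0"
}


def fetch_soup(url, http=None):
    """Hypothetical helper: GET url with a User-Agent header and parse the HTML."""
    if http is None:
        http = urllib3.PoolManager()
    # Pass the headers explicitly with the request, as in the answer.
    response = http.request("GET", url, headers=HEADERS)
    html = response.data.decode("utf-8", "ignore")
    # Name the parser explicitly so BeautifulSoup does not have to pick one.
    return BeautifulSoup(html, "html.parser")
```

Calling `fetch_soup(articleURL)` should then return the parsed page, and inspecting `soup.title` is a quick way to check whether the article or the private-mode script came back.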