Requests is unable to get page
I am trying to retrieve this page using Beautiful Soup:
Here is the code I tried:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines")
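Every time I run my code it hangs and never retrieves the page. Once, however, I got a ReadTimeout exception:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.nasdaq.com', port=443): Read timed out. (read timeout=None)
Any help with, or fix for, this problem would be appreciated.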
Instead of doing this:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines")
try retrieving the web page this way:
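from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
req = Request("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines")
page = urlopen(req, timeout=10).read()  # Request only describes the call; urlopen sends it
soup = BeautifulSoup(page, "html.parser")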
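Note that a Request object only describes the request; nothing is sent until it is passed to urlopen. Below is a minimal sketch of that full round trip using only the standard library. The helper names build_request and fetch_html, the "Mozilla/5.0" User-Agent value, and the 10-second timeout are assumptions chosen for illustration, and a User-Agent alone may or may not be enough for this site.

```python
from urllib.request import Request, urlopen

URL = "https://www.nasdaq.com/market-activity/stocks/msft/news-headlines"


def build_request(url, user_agent="Mozilla/5.0"):
    """Describe a GET request with a browser-like User-Agent (assumed value)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Send the request and return the raw body, or None if it fails.

    The timeout (an assumed value, in seconds) keeps the call from
    hanging indefinitely when the server never answers.
    """
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print(f"Request failed: {exc}")
        return None
```

Calling fetch_html(URL) returns bytes that can be handed to BeautifulSoup(html, "html.parser"), or None when the request fails or times out.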
I added headers to the request and it seems to work. I used the same headers that my browser sends, which you can find using the developer tools (as indicated here).
import requests
headers = {
"authority": "www.nasdaq.com",
"method": "GET",
"path": "/market-activity/stocks/msft/news-headlines",
"scheme": "https",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-encoding": "gzip, deflate, br",
"accept-language": "en-CA,en;q=0.9,ro-RO;q=0.8,ro;q=0.7,en-GB;q=0.6,en-US;q=0.5",
"cache-control": "max-age=0",
"dnt": "1",
"if-modified-since": "Tue, 30 Jun 2020 19:43:05 GMT",
"if-none-match": "1593546185",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
}
page = requests.get("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines", headers=headers)
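Two things about the snippet above may be worth tightening; the following is a sketch under stated assumptions, not a confirmed fix for this site. First, requests.get waits indefinitely unless a timeout is passed, which matches the "read timeout=None" in the question. Second, some entries copied from the browser are not ordinary request headers: "authority", "method", "path" and "scheme" are HTTP/2 pseudo-headers, and "if-modified-since" / "if-none-match" make the request conditional, so the server may answer 304 with an empty body. The SKIP set, the helper names clean_browser_headers and fetch, and the (5, 15) timeout values are hypothetical choices for illustration.

```python
# Header names copied from the browser that this sketch chooses not to resend.
# The selection is an assumption, not something the site documents.
SKIP = {
    "authority", "method", "path", "scheme",  # HTTP/2 pseudo-headers
    "if-modified-since", "if-none-match",     # conditional; may yield an empty 304
    "accept-encoding",                        # let requests advertise what it can decode
}


def clean_browser_headers(raw):
    """Return a copy of raw without the entries named in SKIP.

    Matching is case-insensitive and ignores a leading ":" as shown
    by browser developer tools for pseudo-headers.
    """
    return {k: v for k, v in raw.items() if k.lstrip(":").lower() not in SKIP}


def fetch(url, headers, timeout=(5, 15)):
    """GET url with a (connect, read) timeout; return the text or None on timeout."""
    # Imported here so clean_browser_headers works without the third-party package.
    import requests

    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    return response.text
```

With the headers dict from the answer above, the call would be fetch(url, clean_browser_headers(headers)), and the returned text can be passed to BeautifulSoup(text, "html.parser").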