requests does not retrieve HTML content from some websites
When I try to fetch the HTML content of a website, in this case www.arrow.com, I get nothing back and the web browser just keeps waiting.
import requests
from lxml import html

code = "cccccccc"  # search term; same value as in the Postman request
params = {'q': code}
url = "https://www.arrow.com/en/products/search"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'cache-control': "no-cache",
    'postman-token': "564e5d76-282f-98f3-860b-d8e09e2e9073"
}
r = requests.get(url, headers=headers, params=params)
tree = html.fromstring(r.content)
Strangely, I can get the correct content both with Postman and through a web browser.
For HTTP, Postman generates this request:
GET /en/products/search?q=cccccccc HTTP/1.1
Host: www.arrow.com
Cache-Control: no-cache
Postman-Token: c3821bb3-767b-b8c7-105a-84fd16291245
Or, for Python 3:
import http.client
conn = http.client.HTTPSConnection("www.arrow.com")
headers = {
    'cache-control': "no-cache",
    'postman-token': "740c5681-3e67-b605-3040-964be3ea7296"
}
conn.request("GET", "/en/products/search?q=cccccccc", headers=headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
With this last one, I get nothing either.
Changing the User-Agent should fix the problem; at least it did in my case. Your params are also incorrect. Try this and see what happens:
import requests
from lxml.html import fromstring

url = "https://www.arrow.com/en/products/search"
code = "apple"  # any available search term
r = requests.get(
    url,
    headers={'User-Agent': 'Mozilla/5.0'},
    params={'cat': '', 'q': code, 'r': True},
)
tree = fromstring(r.content)
# cssselect() needs the cssselect package installed alongside lxml.
items = tree.cssselect("h1[data-search-term]")[0].text.strip()
print(items)  # should print the number of search results
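If you want to check what is actually being requested before hitting the site, here is a minimal sketch that builds the same query string and User-Agent header without sending anything. It uses only the standard library (urllib instead of requests), so it is an illustration rather than a drop-in replacement; the names build_search_request and fetch_html and the 10-second timeout are my own choices, not anything arrow.com requires.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.arrow.com/en/products/search"


def build_search_request(term, user_agent="Mozilla/5.0"):
    """Build (but do not send) a GET request for the search page."""
    # Same query parameters as above; True is encoded as the text "True".
    query = urlencode({"cat": "", "q": term, "r": True})
    return Request(f"{BASE_URL}?{query}", headers={"User-Agent": user_agent})


def fetch_html(request, timeout=10):
    """Send the request; a stalled connection raises after `timeout` seconds."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Inspect the request without any network traffic.
req = build_search_request("apple")
print(req.full_url)
print(req.get_header("User-agent"))
```

Calling fetch_html(req) performs the real request. Passing a timeout is worth doing with requests.get as well (it accepts a timeout argument), so a server that never answers produces an exception instead of a script that waits forever.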