How can I get this href with BeautifulSoup?
I want to get a product URL from this site:
https://stockx.com/search?s=555088-105
(Image: the URL I want to get)
But when I try this code:
    link = soup.find("div", class_="browse-grid loading undefined")
    print(link)
it only returns:
<div class="browse-grid loading undefined"><div class="back-to-top"><div class="back-to-top-container"><img alt="back to top" src="https://stockx-assets.imgix.net/svg/icons/back-to-top.svg?auto=compress,format"/><span>TOP</span></div></div><div class="browse-grid"><div class="no-results">NOTHING TO SEE HERE! PLEASE CHANGE YOUR FILTERS OR <a href="/product-suggestion">Suggest a Product</a></div></div></div>
Or I tried this, which prints every URL on the page but not the one I want:
    a_tags = soup.find_all('a')
    for tag in a_tags:
        print(tag.get('href'))
How can I get the URL shown in my image?
The URL you see on the page is loaded from an external source via JavaScript, so BeautifulSoup doesn't see it. You can use the requests module to simulate the Ajax request:
    import re
    import json
    import requests

    url = "https://stockx.com/search?s=555088-105"
    api_url = "https://stockx.com/api/browse"

    # pull the search term ("555088-105") out of the search URL
    id_ = re.search(r"s=([\d-]+)", url).group(1)

    params = {
        "": "",
        "currency": "EUR",
        "_search": id_,
        "dataType": "product",
    }

    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
        "Referer": url,
    }

    data = requests.get(api_url, params=params, headers=headers).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    for product in data["Products"]:
        print("https://stockx.com/" + product["urlKey"])
Prints:
https://stockx.com/air-jordan-1-retro-high-dark-mocha
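If it helps, the two non-network steps above (pulling the search term out of the search URL, and turning the decoded JSON into product URLs) can be split into small functions that are easy to check offline. This is only a sketch: `extract_search_term` and `product_urls` are hypothetical names, it swaps the regex for `urllib.parse`, and it assumes the response has the shape the loop above relies on (a "Products" list whose items carry a "urlKey"). The sample dict is hand-written, not real API output.

```python
from urllib.parse import parse_qs, urlparse

BASE_URL = "https://stockx.com/"


def extract_search_term(search_url):
    """Return the value of the `s` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(search_url).query)
    values = query.get("s")
    return values[0] if values else None


def product_urls(data, base_url=BASE_URL):
    """Build product page URLs from a decoded response dict.

    Assumed shape: {"Products": [{"urlKey": "..."}, ...]}.
    Entries without a "urlKey" are skipped instead of raising KeyError.
    """
    return [
        base_url + product["urlKey"]
        for product in data.get("Products", [])
        if product.get("urlKey")
    ]


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response (not real API output).
    sample = {"Products": [{"urlKey": "air-jordan-1-retro-high-dark-mocha"}, {}]}
    print(extract_search_term("https://stockx.com/search?s=555088-105"))
    for product_url in product_urls(sample):
        print(product_url)
```

In the answer's script, `id_` would come from `extract_search_term(url)` and the final loop would iterate over `product_urls(data)`.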