How can I get this href with BeautifulSoup?

Question

I want to get a product URL from this site: https://stockx.com/search?s=555088-105

[Image: the URL I want to get]

But when I try this code:

link = soup.find("div", class_ = 'browse-grid loading undefined')
print(link)

it only returns:

<div class="browse-grid loading undefined"><div class="back-to-top"><div class="back-to-top-container"><img alt="back to top" src="https://stockx-assets.imgix.net/svg/icons/back-to-top.svg?auto=compress,format"/><span>TOP</span></div></div><div class="browse-grid"><div class="no-results">NOTHING TO SEE HERE! PLEASE CHANGE YOUR FILTERS OR <a href="/product-suggestion">Suggest a Product</a></div></div></div>

I also tried this, but it just prints every URL on the page, and the one I want is not among them:

a_tags = soup.find_all('a')
for tag in a_tags:
  print(tag.get('href'))

How can I get the URL shown in my image?

Answer 1

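The URL you see on the page is loaded from an external source via JavaScript, so BeautifulSoup, which only parses the HTML that was downloaded, never sees it. You can simulate the Ajax request with the requests module: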

import re
import json
import requests

url = "https://stockx.com/search?s=555088-105"
api_url = "https://stockx.com/api/browse"

id_ = re.search(r"s=([\d-]+)", url).group(1)
params = {
    "": "",
    "currency": "EUR",
    "_search": id_,
    "dataType": "product",
}

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Referer": url,
}

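# fail fast on a hung connection or an HTTP error instead of parsing an error page
response = requests.get(api_url, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()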

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for product in data["Products"]:
    print("https://stockx.com/" + product["urlKey"])

Prints:

https://stockx.com/air-jordan-1-retro-high-dark-mocha
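
The two parsing steps in the snippet above (pulling the search term out of the URL and turning each "urlKey" into a full link) can also be written without a regular expression. The following is only a minimal sketch: the helper names extract_search_term and product_urls are hypothetical, the sample payload is hand-written rather than real API output, and it assumes the response keeps the {"Products": [{"urlKey": ...}]} shape used in the answer.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://stockx.com/"  # site root, taken from the answer above


def extract_search_term(search_url):
    """Return the value of the `s` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(search_url).query)
    values = query.get("s")
    return values[0] if values else None


def product_urls(payload):
    """Build absolute product URLs from a browse-API-style payload.

    Assumes the shape used in the answer: {"Products": [{"urlKey": ...}, ...]}.
    Entries without a "urlKey" are skipped.
    """
    urls = []
    for product in payload.get("Products", []):
        key = product.get("urlKey")
        if key:
            urls.append(urljoin(BASE_URL, key))
    return urls


# Hand-written sample payload (not real API output), mirroring the keys above.
sample = {"Products": [{"urlKey": "air-jordan-1-retro-high-dark-mocha"}, {}]}
print(extract_search_term("https://stockx.com/search?s=555088-105"))
print(product_urls(sample))
```

Using parse_qs instead of a regex means the search term is still found if other query parameters are added, and skipping entries without "urlKey" avoids a KeyError if the response shape differs from what is assumed here.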

