Merchant ID not found - Amazon

Question

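I can't find the merchant ID on an Amazon product page. Am I missing something? Any help would be great! I always get the same message in the terminal: "No Merchant ID found".

Website URL: https://www.amazon.com/dp/B004X4KRW0/ref=olp-opf-redir?aod=1&ie=UTF8&condition=NEW&th=1

Goal: list all merchant IDs using Python.

What is a merchant ID? Every seller on Amazon is uniquely identified by a merchant ID. For example, on the page above, when Amazon itself is the seller, the HTML carries the merchant ID ATVPDKIKX0DER, which is Amazon.com (US):

<div id="fast-track" class="a-section a-spacing-none">
<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>
<input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>

So I am trying to use XPath to print the merchant ID of every seller.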

# Get Seller merchant ID
# Default Merchant ID
merchant_id = ""
# Try to find merchant ID with xpath
try:
    merchant_id = offer.xpath(
        ".//input[@id='ftSelectMerchant' or @id='ddmSelectMerchant']"
    )[0].value
except IndexError:
    # try to find merchant ID with regex
    try:
        merchant_script = offer.xpath(".//script")[0].text.strip()
        find_merchant_id = re.search(
            r"merchantId = \"(\w+?)\";", merchant_script
        )
        if find_merchant_id:
            merchant_id = find_merchant_id.group(1)
    except IndexError:
        pass
log.info(f"merchant_id: {merchant_id}")
# log failure to find merchant ID
if not merchant_id:
    log.debug("No Merchant ID found")

Answer 1

It looks like you are scraping the value of a hidden input field. There are probably many ways to do this; I'll show two that work for me.

The first uses Selenium. element.get_attribute("innerHTML") returns the element's HTML as a string, and a regular expression then extracts the value.

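import re

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

url = "https://www.amazon.com/dp/B004X4KRW0"

# run Firefox headless (the options.headless setter was deprecated
# and later removed in Selenium 4)
options = Options()
options.add_argument("-headless")

driver = webdriver.Firefox(options=options)
try:
    driver.get(url)
    # find_element_by_xpath was removed in Selenium 4;
    # use find_element(By.XPATH, ...) instead
    element = driver.find_element(By.XPATH, "//div[@class='a-section']")
    innerhtml = element.get_attribute("innerHTML")
finally:
    driver.quit()

# find the tag that mentions merchantID and capture its value attribute
# (non-greedy classes keep the match inside a single tag)
a = re.search(r'<[^>]*merchantID[^>]*value="([^"]*)"', innerhtml)

print(a.group(1) if a else "merchantID not found")  # ATVPDKIKX0DER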
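
The regex step can also be written so that it does not depend on the order of attributes inside the tag. The following is a minimal sketch using only the standard library, not part of the original answer: it first isolates each `<input ...>` tag and then looks for `id` and `value` separately. The function name `hidden_input_value` is hypothetical, and the sample HTML is made up from the IDs mentioned in this thread.

```python
import re

# Matches one complete <input ...> tag (it cannot run past the closing ">").
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)


def hidden_input_value(html, input_id):
    """Return the value attribute of the <input> whose id is input_id, or None."""
    # The lookbehind keeps attributes such as data-id from matching as id.
    id_pattern = re.compile(rf"""(?<![\w-])id=(["']){re.escape(input_id)}\1""")
    value_pattern = re.compile(r"""(?<![\w-])value=(["'])(.*?)\1""")
    for tag in INPUT_TAG.findall(html):
        if id_pattern.search(tag):
            match = value_pattern.search(tag)
            if match:
                return match.group(2)
    return None


sample = (
    '<div class="a-section">'
    '<input type="hidden" value="ATVPDKIKX0DER" id="merchantID">'
    '<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>'
    "</div>"
)
print(hidden_input_value(sample, "merchantID"))
```

With Selenium, the same function could be called on the `innerhtml` string from the code above. Returning `None` instead of raising makes a missing field easy to detect.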

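The other way uses BeautifulSoup with a plain HTTP request (urllib.request here). It is simpler, but it sometimes fails (possibly because of the server's response; I'm not sure).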

import urllib.request

from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B004X4KRW0"

html = urllib.request.urlopen(url).read().decode("utf-8")

soup = BeautifulSoup(html, "lxml")

# the hidden <input id="merchantID"> carries the merchant ID in its value attribute
tag = soup.find("input", {"id": "merchantID"})

print(tag["value"] if tag else "merchantID not found")  # ATVPDKIKX0DER
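
Since the request sometimes fails for an unknown reason, it may help to make the failure visible instead of letting the script crash. The sketch below is an illustration, not part of the original answer. It assumes, without confirmation, that the server may treat requests lacking a browser-like `User-Agent` differently; the helper names `build_request` and `fetch_html` and the header value are hypothetical.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request that sends an explicit User-Agent header (assumed helpful)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the page HTML, or None after printing why the request failed."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 503).
        print(f"HTTP {exc.code} for {url}")
    except urllib.error.URLError as exc:
        # The request never got a proper HTTP response (DNS, connection, ...).
        print(f"Request failed: {exc.reason}")
    return None
```

Calling `fetch_html("https://www.amazon.com/dp/B004X4KRW0")` and checking for `None` before parsing would at least show whether the failure is an HTTP error status or a connection problem.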

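(I use Selenium to monitor Amazon pages for price changes, and attribute names sometimes change. So it is best to check from time to time that everything still works.)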
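
Because attribute names can change, one option is to try several candidate IDs and report which one matched, so a rename is noticed early. This is a minimal sketch with the standard-library HTML parser, offered as an illustration only. The candidate list combines the IDs from the question (`ftSelectMerchant`, `ddmSelectMerchant`) and from this answer (`merchantID`); the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Candidate input IDs seen in this thread; extend the tuple if Amazon renames them.
CANDIDATE_IDS = ("ftSelectMerchant", "ddmSelectMerchant", "merchantID")


class MerchantIdParser(HTMLParser):
    """Collect (input id, value) pairs for every matching <input> tag."""

    def __init__(self, candidate_ids=CANDIDATE_IDS):
        super().__init__()
        self.candidate_ids = set(candidate_ids)
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("id") in self.candidate_ids and attr.get("value"):
            self.found.append((attr["id"], attr["value"]))


def find_merchant_ids(html):
    """Return a list of (input id, merchant ID) pairs found in html."""
    parser = MerchantIdParser()
    parser.feed(html)
    return parser.found


sample = (
    '<div id="fast-track" class="a-section a-spacing-none">'
    '<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>'
    '<input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>'
)
for input_id, merchant_id in find_merchant_ids(sample):
    print(f"{input_id}: {merchant_id}")
```

An empty result from `find_merchant_ids` on a freshly fetched page would be a hint that none of the known IDs is present any more and the page source should be inspected again.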
