Python web scraping script does not find element by xPath even though it exists

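I am currently writing a small script that, given a link to a price comparison website in my country, should extract the name, link, price, and picture of the cheapest product.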

An example link looks like this:


#!/usr/bin/env python3
from urllib.request import Request, urlopen
from lxml import html
from lxml import etree

from lxml.etree import tostring

link = ''
def get_webSite():
    req = Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    return  urlopen(req).read()

webpage = get_webSite() # Contains all HTML from the site
root = html.fromstring(webpage)

price = root.xpath("//*[@id=\"product0\"]/div[6]/span/span")[0].text.strip()
name = root.xpath("//*[@id=\"product0\"]/div[2]/a/span")[0].text.strip()
link = "" + root.xpath("//*[@id=\"product0\"]/div[2]/a/@href")[0]
picture = root.xpath("//*[@id=\"product0\"]/div[1]/a/div/picture/img/@big-image-url")[0]
# the @ refers to the attribute of the selected element, / slashes seem to separate the searched terms
# The [0] refers to the first element of a list, we use this because xPath returns a list with exactly one item

price = price.lstrip('€ ') # removes the euro sign and the space
price = price.replace(',', '.') # replaces the decimal comma with a dot
price = float(price) # converts price string to float

print(f"Price : {price}")
print("Name : " + (name))
print("Link : " + (link))
print("PictureLink : " + (picture))

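Everything except the picture thumbnail link works and is printed to the console. I have tried both the normal xPath and the full xPath, but to no avail. No such element is found, even though it exists.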


The error in your xpath is:




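Otherwise, / traverses to a child of img, whereas you want to check an attribute of the img tag itself. Here is an example that scrapes all such images from the page: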

import requests
from lxml import html
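res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'})  # link is the page URL from the question
root = html.fromstring(res.content)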
[item.attrib['big-image-url'] for item in root.xpath('//img[@big-image-url]')]
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
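The attribute predicate can also be tried offline, without fetching the page. The following is only a minimal sketch: the HTML in SAMPLE and the example.com URL are made up for illustration and are not taken from the real site. It selects img elements that carry a big-image-url attribute and guards against an empty result instead of indexing [0] directly.

```python
from lxml import html

# Hypothetical markup (an assumption, not the real page): one <img> with the
# custom big-image-url attribute and one without it.
SAMPLE = """
<html><body>
  <div id="product0">
    <div><picture>
      <img src="thumb.jpg" big-image-url="https://example.com/big1.jpg">
    </picture></div>
  </div>
  <img src="other.jpg">
</body></html>
"""

root = html.fromstring(SAMPLE)

# Predicate form: keep only <img> elements that have the attribute.
images = root.xpath('//img[@big-image-url]')
urls = [img.get('big-image-url') for img in images]

# xpath() returns a list, so check for emptiness before taking the first item.
picture = urls[0] if urls else None
print(picture)
```

If urls comes back empty on the real page, the attribute is probably not present in the HTML that the server returns to the script, which is worth checking by inspecting the downloaded source rather than the browser's rendered DOM.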

So the value should be in the big-image-url attribute in the HTML, for example: