Using relative XPath to scrape custom div attribute

Question

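I have a few hundred URLs, and I am trying to scrape the path of the image on each page. Every page has the same format, but the div class is unique to each page.

I would like to use IMPORTXML in Google Sheets to scrape only the contents of the data-path attribute.

I have tried using XPath to pull out the URL, but it failed.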

<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>

For example: //div[@class='*']/@data-path

Answer 1

If the div class follows the pattern "uniqueid active", you can try the following XPath:

//div[contains(@class, "active")]/@data-path

Otherwise, if the div class can be any value, use this query:

//div[@class]/@data-path

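Update:

I tried to get the value of the data-path attribute with IMPORTXML, but without success. Doing the same thing in Python (requests and lxml) worked. So the problem is probably in Google Sheets, some limitation or bug; I don't know which.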
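To illustrate the Python route, here is a minimal sketch. It is not the code used above: it swaps requests and lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages and without network access. ElementTree supports only a subset of XPath (the [@class] predicate works, but contains() and /@data-path do not), so the substring check and the attribute read are done in Python. The SAMPLE string, the data_paths function and its class_substring parameter are hypothetical names introduced for this example, and the markup is the div from the question wrapped in a root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the div from the question plus one div without data-path.
SAMPLE = """
<html><body>
<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>
<div class="other"></div>
</body></html>
""".strip()


def data_paths(markup, class_substring=None):
    """Return the data-path values of div elements that have a class attribute.

    If class_substring is given, keep only divs whose class contains it,
    mirroring contains(@class, "...") from the XPath in the answer.
    """
    root = ET.fromstring(markup)
    results = []
    # ElementTree understands [@class] but not contains() or /@attribute steps.
    for div in root.iterfind(".//div[@class]"):
        path = div.get("data-path")
        if path is None:
            continue  # div has a class but no data-path attribute
        if class_substring is not None and class_substring not in div.get("class"):
            continue  # class does not contain the requested substring
        results.append(path)
    return results


print(data_paths(SAMPLE))            # equivalent of //div[@class]/@data-path
print(data_paths(SAMPLE, "active"))  # equivalent of //div[contains(@class, "active")]/@data-path
```

Real pages are rarely well-formed XML, so ET.fromstring will often reject them. For live URLs an HTML-tolerant parser such as lxml, with the XPath expressions used as written, is the more practical choice.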
