IMPORTXML function in Google Sheets

Question

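Using the IMPORTXML function, is it possible to construct an XPath query that extracts the Industry value from a given Wikipedia page?

For example, the value I want to extract from this page - https://en.wikipedia.org/wiki/Target_Corporation - is "Retail", whereas on this page - https://en.wikipedia.org/wiki/Boohoo.com - it would be "Fashion".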

Answer 1

Try:

=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Boohoo.com", 
 "//td[@class='category']"), 2, 1)

=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Target_Corporation", 
 "//td[@class='category']"),2,1)

Answer 2

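You want to create an XPath that retrieves the Industry value from a given Wikipedia page.

If my understanding is correct, how about the following formula as one more pattern? Please think of this as just one of several possible answers.

Sample formula: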

=IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td")

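The XPath is //th[text()='Industry']/following-sibling::td. It selects the td cell that follows the th header whose text is "Industry" in the same table row.
In this case, the URL https://en.wikipedia.org/wiki/Target_Corporation or https://en.wikipedia.org/wiki/Boohoo.com is put in cell "A1".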
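
To make the axis concrete, here is a minimal Python sketch of what that XPath selects. It is only an illustration outside Sheets, under stated assumptions: the HTML fragment is a made-up infobox, industry_cells is a hypothetical helper, and because Python's xml.etree.ElementTree does not support the following-sibling axis, the helper walks the sibling elements by hand.

```python
import xml.etree.ElementTree as ET

# Made-up infobox fragment (an assumption, not copied from Wikipedia).
FRAGMENT = """
<table>
  <tr><th scope="row">Type</th><td>Example type</td></tr>
  <tr><th scope="row">Industry</th><td class="category">Fashion</td></tr>
  <tr><th scope="row">Website</th><td>example.com</td></tr>
</table>
"""


def industry_cells(root, label="Industry"):
    """Emulate //th[text()='Industry']/following-sibling::td by hand.

    Simplification: only the leading text of each <th> is compared.
    """
    cells = []
    for parent in root.iter():
        seen_label = False
        for child in parent:
            if child.tag == "th" and child.text == label:
                seen_label = True  # siblings after this point qualify
            elif child.tag == "td" and seen_label:
                cells.append(child)
    return cells


root = ET.fromstring(FRAGMENT)
values = ["".join(td.itertext()).strip() for td in industry_cells(root)]
print(values)
```

Only the cell in the Industry row is selected; the cells in the other rows are skipped because no matching th precedes them in their own row.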

Reference:

XPath Axes

Added:

From your reply, I understand that you want to add 2 more URLs. So the full list of URLs is as follows:

https://en.wikipedia.org/wiki/Target_Corporation
https://en.wikipedia.org/wiki/Boohoo.com
https://en.wikipedia.org/wiki/Woot
https://en.wikipedia.org/wiki/TripAdvisor

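Issue and workaround:

For the above URLs, when the formula =IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td") is used, the results are Retail, Fashion, Retail, and "Travel, services" (two values), respectively.

When the XPath is modified to //th[text()='Industry']/following-sibling::td/a, the results are Retail, #N/A, #N/A, and Travel.

The reason is the following difference in the HTML: the Industry cell may contain a link followed by plain text, only a link, or only plain text.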

<tr>
  <th scope="row">Industry</th>
  <td class="category"><a href="/wiki/Travel" title="Travel">Travel</a> services</td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;"><a href="/wiki/Retail" title="Retail">Retail</a></td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;">Fashion</td>
</tr>

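From this, unfortunately, I think that Travel, Retail, and Fashion cannot all be retrieved directly from the above with a single XPath. So I used a built-in function for this situation.

Workaround:

In this workaround, I used INDEX. Please think of this as just one of several possible answers.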

=INDEX(IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td"),1,1)

The XPath is //th[text()='Industry']/following-sibling::td. It is not modified.
In this case, the URL is put in cell "A1".
When 2 values are retrieved, only the first one is kept. That is why I used INDEX.
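
The effect of the /a step and of keeping only the first value can be reproduced outside Sheets. The following Python sketch is an illustration only: it parses the three tr fragments shown earlier (style attributes omitted) with xml.etree.ElementTree, and the names ROWS, link_texts, and first_piece are hypothetical. Treating each text piece of the cell as one returned value is an assumption based on the "Travel, services" result, not documented IMPORTXML behavior.

```python
import xml.etree.ElementTree as ET

# The three shapes of the Industry row, keyed by a descriptive name.
ROWS = {
    "link_plus_text": '<tr><th scope="row">Industry</th>'
                      '<td class="category"><a href="/wiki/Travel" '
                      'title="Travel">Travel</a> services</td></tr>',
    "link_only": '<tr><th scope="row">Industry</th>'
                 '<td class="category"><a href="/wiki/Retail" '
                 'title="Retail">Retail</a></td></tr>',
    "text_only": '<tr><th scope="row">Industry</th>'
                 '<td class="category">Fashion</td></tr>',
}


def link_texts(td):
    """Rough analogue of appending /a: text of direct <a> children only."""
    return [a.text for a in td.findall("a")]


def first_piece(td):
    """Rough analogue of INDEX(..., 1, 1): first non-empty text piece."""
    pieces = [t.strip() for t in td.itertext() if t.strip()]
    return pieces[0]


results = {}
for name, row_html in ROWS.items():
    td = ET.fromstring(row_html).find("td")
    results[name] = (link_texts(td), first_piece(td))
    print(name, results[name])
```

In this sketch the link-based lookup comes back empty for the plain-text cell, which mirrors the #N/A results, while the first text piece yields Travel, Retail, and Fashion for the three shapes.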
