Using a Boolean value to execute different XPath expressions with Python lxml

Question

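I'm trying to scrape weather data from a website using a Python script and lxml. The wind speed data is extracted and appended to a list for later processing. When the page is formatted like this, I can get the information I need without any problem: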

<div class = "day-fcst">
  <div class = "wind">
    <div class = "gust">
      "Gusts to 20-30mph"
    </div>
  </div>
</div>

However, when the wind is light, the website adds a child span with its own class under the "gust" div, like this:

<div class = "gust">
  <span class = "nowind">
    "Gusts less than 20mph"
  </span>
</div>

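My idea was to check whether the span exists: if it does, run an XPath expression that pulls the text inside the span; otherwise, run an XPath expression that pulls the text directly inside the "gust" div. I searched for examples that use XPath Boolean functions, but I couldn't get any of them to work (either in Safari's Web Inspector or in my script).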

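My current code uses Python to check whether the span's class equals "nowind" and then branches with an if/else statement, but only the else branch is ever executed. My current code looks like this: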

from lxml import html
import requests

wind = []

source=requests.get('website')
tree = html.fromstring(source.content)

# Check whether the gust div contains a span with class "nowind"
if tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/@class') == 'nowind':
  # Calm case: the text sits inside the span
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/text()'))
else:
  # Windy case: the text sits directly inside the gust div
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/text()'))

print(wind)
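A likely reason only the else branch runs: for a node-set query such as `.../span/@class`, lxml's `xpath()` returns a Python list, and a list never compares equal to a string. The sketch below reproduces this on a hypothetical sample of the "nowind" markup (the variable names `calm` and `classes` are illustrative only) and shows a membership test that behaves as intended.

```python
from lxml import html

# Hypothetical sample markup for the "nowind" case described in the question
calm = html.fromstring(
    '<div class="day-fcst"><div class="wind"><div class="gust">'
    '<span class="nowind">Gusts less than 20mph</span>'
    '</div></div></div>'
)

classes = calm.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]'
                     '/div[@class = "gust"]/span/@class')
print(type(classes))        # <class 'list'>
print(classes == 'nowind')  # False: a list is never equal to a str
print('nowind' in classes)  # True: test membership in the list instead
```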

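I'd like to solve this with an XPath expression that produces a Boolean value instead of my current workaround. Any help would be appreciated. I'm still new to XPath, so I'm not familiar with its functions.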
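A minimal sketch of the Boolean approach the question asks about, assuming the markup shown above (the sample strings `WINDY` and `CALM` and the helper `gust_text` are hypothetical). XPath 1.0's `boolean()` function converts a node-set to true or false, and lxml returns a Boolean XPath result as a Python `bool`, so it can drive an ordinary `if` statement.

```python
from lxml import html

GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'

# Hypothetical sample markup for the two cases described in the question
WINDY = ('<div class="day-fcst"><div class="wind">'
         '<div class="gust">Gusts to 20-30mph</div></div></div>')
CALM = ('<div class="day-fcst"><div class="wind"><div class="gust">'
        '<span class="nowind">Gusts less than 20mph</span></div></div></div>')

def gust_text(tree):
    # boolean() makes the XPath result True/False, returned as a Python bool
    if tree.xpath('boolean(' + GUST + '/span[@class="nowind"])'):
        parts = tree.xpath(GUST + '/span[@class="nowind"]/text()')
    else:
        parts = tree.xpath(GUST + '/text()')
    # Drop whitespace-only text nodes that indentation may leave behind
    return [p.strip() for p in parts if p.strip()]

print(gust_text(html.fromstring(WINDY)))  # ['Gusts to 20-30mph']
print(gust_text(html.fromstring(CALM)))   # ['Gusts less than 20mph']
```

Note that this check is made once for the whole document. If a page lists several forecast days with a mix of windy and calm days, the test would need to be evaluated relative to each gust div instead (for example `boolean(span[@class="nowind"])` called on each gust element).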

Answer 1

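You can use the same XPath expression in both cases. Just use //div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text(). The //text() step selects text nodes at any depth below the gust div, so it picks up the text whether or not it is wrapped in the span.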
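A sketch of the `//text()` approach on indented markup with two forecast days, one windy and one calm (the `PAGE` string is a hypothetical sample, not the real site). Because indentation can produce whitespace-only text nodes, the results are stripped and blank entries are filtered out.

```python
from lxml import html

# Hypothetical two-day page, indented like the markup in the question
PAGE = '''
<div>
  <div class="day-fcst">
    <div class="wind">
      <div class="gust">Gusts to 20-30mph</div>
    </div>
  </div>
  <div class="day-fcst">
    <div class="wind">
      <div class="gust">
        <span class="nowind">Gusts less than 20mph</span>
      </div>
    </div>
  </div>
</div>
'''

GUST_TEXT = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]//text()'

tree = html.fromstring(PAGE)
# One expression covers both cases; drop whitespace-only nodes from indentation
wind = [t.strip() for t in tree.xpath(GUST_TEXT) if t.strip()]
print(wind)  # ['Gusts to 20-30mph', 'Gusts less than 20mph']
```

Since the result is one flat list, this assumes each gust div contributes a single non-blank text node; if that might not hold, iterating per gust element is safer.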

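Alternatively, you can select the <div class = "gust"> element and call its text_content() method, which returns the text of the element together with all of its descendants.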

In [1]: from lxml import html

In [2]: first_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust">"Gusts to 20-30mph"</div></div></div>'

In [3]: second_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust"><span class = "nowind">"Gusts to 20-30mph"</span></div></div></div>'

In [4]: f = html.fromstring(first_html)

In [5]: s = html.fromstring(second_html)

In [6]: f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[6]: '"Gusts to 20-30mph"'

In [7]: s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[7]: '"Gusts to 20-30mph"'

In [8]: print(f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']

In [9]: print(s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
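Extending the text_content() approach to a page with several forecast days, here is a sketch assuming a hypothetical `PAGE` sample with one windy and one calm day (the helper name `collect_gusts` is also illustrative). It iterates over every gust div and strips the surrounding whitespace, so each day yields exactly one string.

```python
from lxml import html

# Hypothetical two-day page: one windy day, one calm day
PAGE = '''
<div>
  <div class="day-fcst">
    <div class="wind">
      <div class="gust">Gusts to 20-30mph</div>
    </div>
  </div>
  <div class="day-fcst">
    <div class="wind">
      <div class="gust">
        <span class="nowind">Gusts less than 20mph</span>
      </div>
    </div>
  </div>
</div>
'''

GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'

def collect_gusts(tree):
    wind = []
    for gust in tree.xpath(GUST):
        # text_content() concatenates all descendant text, span or not
        wind.append(gust.text_content().strip())
    return wind

print(collect_gusts(html.fromstring(PAGE)))
# ['Gusts to 20-30mph', 'Gusts less than 20mph']
```

Iterating per element keeps one entry per day even if a gust div's text were split across several nodes, which a flat `//text()` list would not guarantee.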

