Using a Boolean value to execute different XPath expressions with Python lxml
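I am trying to scrape weather data from a website using a Python script and lxml. The wind speed data is extracted and appended to a list for later processing. When the markup is formatted like this, I can get the information I need without trouble: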
<div class = "day-fcst">
<div class = "wind">
<div class = "gust">
"Gusts to 20-30mph"
</div>
</div>
</div>
However, when the wind is light, the site adds a child span with a class under the "gust" div, like this:
<div class = "gust">
<span class = "nowind">
"Gusts less than 20mph"
</span>
</div>
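My idea is to check whether the span exists; if it does, run an XPath expression that pulls the text under the span, otherwise run one that pulls the text directly under the "gust" div. I searched for examples that use the XPath boolean function but could not get any of them to work (neither in Safari's Web Inspector nor in my script).
My current code uses Python to check whether the span's class equals "nowind" and then runs an if/else, but only the else branch ever executes. My current code looks like this: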
from lxml import html
import requests

wind = []
source = requests.get('website')
tree = html.fromstring(source.content)
if tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/@class') == 'nowind':
    wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/text()'))
else:
    wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/text()'))
print(wind)
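I would like to solve this with an XPath expression that yields a Boolean rather than with my current workaround. Any help would be appreciated. I am still new to XPath, so I am not familiar with its functions.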
You can use the same XPath expression in both cases. Just use //div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()
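On markup that is spread over several lines, as in the question, the text nodes returned by //text() include the surrounding whitespace. The following is a minimal sketch of cleaning that up, assuming the page looks like the low-wind snippet from the question; the names GUST_TEXT and page are only illustrative.

```python
from lxml import html

GUST_TEXT = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]//text()'

# Low-wind variant of the markup, laid out across lines as in the question.
page = html.fromstring('''
<div class="day-fcst">
  <div class="wind">
    <div class="gust">
      <span class="nowind">
        "Gusts less than 20mph"
      </span>
    </div>
  </div>
</div>
'''.strip())

# //text() yields every descendant text node, including whitespace-only ones,
# so strip each node and drop the ones that end up empty.
wind = [t.strip() for t in page.xpath(GUST_TEXT) if t.strip()]
print(wind)
```

The list comprehension keeps one entry per non-empty text node, so the result has the same shape whether or not the span is present.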
Alternatively, you can get the <div class = "gust"> element and then use the text_content() method to get its text content.
In [1]: from lxml import html
In [2]: first_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust">"Gusts to 20-30mph"</div></div></div>'
In [3]: second_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust"><span class = "nowind">"Gusts to 20-30mph"</span></div></div></div>'
In [4]: f = html.fromstring(first_html)
In [5]: s = html.fromstring(second_html)
In [6]: f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[6]: '"Gusts to 20-30mph"'
In [7]: s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[7]: '"Gusts to 20-30mph"'
In [8]: print(f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
In [9]: print(s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
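If you do want the Boolean approach from the question, here is a rough sketch. It relies on two lxml behaviours: xpath() returns a list for node-set results (which is why comparing the result to the string 'nowind' is always False and only the else branch runs), and it returns a Python bool when the expression is wrapped in the XPath boolean() function. The helper gust_text and the sample strings are made up for illustration.

```python
from lxml import html

GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'

windy = html.fromstring(
    '<div class="day-fcst"><div class="wind">'
    '<div class="gust">"Gusts to 20-30mph"</div></div></div>'
)
calm = html.fromstring(
    '<div class="day-fcst"><div class="wind"><div class="gust">'
    '<span class="nowind">"Gusts less than 20mph"</span></div></div></div>'
)

# A node-set result comes back as a list, so it never equals a plain string.
classes = calm.xpath(GUST + '/span/@class')
print(classes == 'nowind')    # False
print(classes == ['nowind'])  # True


def gust_text(tree):
    """Hypothetical helper: choose the text path from an XPath Boolean."""
    # boolean(node-set) is true when the node-set is non-empty;
    # lxml converts the result to a Python bool.
    has_nowind = tree.xpath('boolean(' + GUST + '/span[@class="nowind"])')
    path = GUST + ('/span/text()' if has_nowind else '/text()')
    return tree.xpath(path)


print(gust_text(windy))
print(gust_text(calm))
```

In practice the single //text() expression above is simpler, since it avoids the branch altogether.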