Python lxml XPath: how to get this predicate working

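Good morning,

I recently started with Python and web scraping as a hobby...

I am trying to solve a problem with Python lxml and XPath predicates, but unfortunately there does not seem to be anything similar on Stack Overflow. So I reproduced it in the code below, hoping that someone sees what I am missing...

Can anyone explain why Result3 is an empty list? I expected Result3 to be the same as Result1.

How can I get Result3 = Result1?

Versions: Python 3.7.3, lxml 4.4.0 (installed with pip, not Christoph Gohlke's binaries) on an AMD Windows machine.

Thanks in advance!

Steve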

import lxml.html

simple_record  = """<a href="some_map/some_file.png">dododo</a>"""
tree           = lxml.html.fromstring(simple_record)

simple_xpath   = "@href"
found_field    = tree.xpath(simple_xpath)
print("Result1 = {}".format(found_field))

simple_xpath   = """contains(@href,"some_file")"""
found_field    = tree.xpath(simple_xpath)
print("Result2 = {}".format(found_field))

simple_xpath   = """@href[contains(@href,"some_file")]"""
found_field    = tree.xpath(simple_xpath)
print("Result3 = {}".format(found_field))

Actual output:

Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = []

Expected output:

Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = ['some_map/some_file.png']

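Your predicate in the third example (@href[contains(@href,"some_file")]), translated into English, reads "find a node in simple_record which has an attribute href which itself has an attribute href which has an attribute value containing the string some_file". No such node exists, because an attribute cannot have attributes of its own, so an empty result list is returned.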
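One way to check this reading is to evaluate the inner part of the predicate on its own. The following is a minimal sketch, assuming the same one-element document as in the question; the variable names nested_count and nested_contains are illustrative and not part of the original code.

```python
import lxml.html

# Same single-element document as in the question.
tree = lxml.html.fromstring('<a href="some_map/some_file.png">dododo</a>')

# Inside the predicate the context node is the href attribute itself,
# so @href there asks for an href attribute *of an attribute*.
# Counting those nodes shows that there are none.
nested_count = tree.xpath('count(@href/@href)')

# An empty node-set converts to the empty string, so contains() is false
# and the predicate filters the href attribute out.
nested_contains = tree.xpath('contains(@href/@href, "some_file")')

print("nested count    = {}".format(nested_count))
print("nested contains = {}".format(nested_contains))
```

Because the predicate is false for the only candidate node, the outer @href step has nothing left to return.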

What you want to ask, in English, is "find a node in simple_record which has an attribute href which has a value containing the string some_file" (thanks @DanielHaley!). Translated into XPath, you would write

simple_xpath   = '@href[contains(.,"some_file")]'

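Here . refers to the context node being filtered by the predicate (i.e. the @href attribute itself). With this expression, Result3 is the same as Result1.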
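For completeness, here is a short sketch that runs the corrected expression next to the original one, again assuming the document from the question. The alternative_result line is not from the answer above: it shows another common way to phrase the same query, by filtering the element first and then stepping to its attribute.

```python
import lxml.html

# Same single-element document as in the question.
tree = lxml.html.fromstring('<a href="some_map/some_file.png">dododo</a>')

# Unfiltered attribute lookup (Result1 in the question).
result1 = tree.xpath('@href')

# Corrected predicate: "." is the href attribute being filtered.
result3 = tree.xpath('@href[contains(., "some_file")]')

# Alternative phrasing (an assumption of this sketch, not from the answer):
# test the element's href in the predicate, then select the attribute.
alternative_result = tree.xpath('self::a[contains(@href, "some_file")]/@href')

print("Result1     = {}".format(result1))
print("Result3     = {}".format(result3))
print("Alternative = {}".format(alternative_result))
```

Both forms should yield the same list as the plain @href lookup; which one reads better is largely a matter of taste, though the element-first form extends more naturally when several attributes of the same element have to be tested.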