python lxml xpath: how to get this predicate working
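Good morning,
I recently took up Python and web scraping as a hobby...
I am trying to solve a problem with python lxml and an XPath predicate, but unfortunately there does not seem to be anything similar on Stack Overflow. So I reproduced it in the code below, hoping somebody sees what I don't see...
Can anybody explain why Result3 is an empty list?
I expected Result3 to be the same as Result1.
How can I get Result3 = Result1?
Versions: Python 3.7.3, lxml 4.4.0 (installed with pip, not the Christoph Gohlke binaries), on an AMD Windows machine.
Thanks in advance!
Steve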
import lxml.html
simple_record = """<a href="some_map/some_file.png">dododo</a>"""
tree = lxml.html.fromstring(simple_record)
simple_xpath = "@href"
found_field = tree.xpath(simple_xpath)
print("Result1 = {}".format(found_field))
simple_xpath = """contains(@href,"some_file")"""
found_field = tree.xpath(simple_xpath)
print("Result2 = {}".format(found_field))
simple_xpath = """@href[contains(@href,"some_file")]"""
found_field = tree.xpath(simple_xpath)
print("Result3 = {}".format(found_field))
Actual output:
Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = []
Expected output:
Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = ['some_map/some_file.png']
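The predicate in your third example (@href[contains(@href,"some_file")]), translated into English, says "find a node in simple_record which has an attribute href, which itself has an attribute href, which has an attribute value containing the string some_file". An attribute node cannot have attributes of its own, so no such node exists and an empty result list is returned.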
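What you wanted to ask in English is "find a node in simple_record which has an attribute href which has a value containing the string some_file" (thanks @DanielHaley!). Translated into XPath, you would write
simple_xpath = '@href[contains(.,"some_file")]'
The . now refers back to the context node being filtered by the predicate (that is, the @href attribute itself). With that expression, Result3 is the same as Result1.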
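To make the difference concrete, here is a minimal sketch that runs the failing predicate next to the corrected one against the record from the question. The variable names and the self::a variant are my own illustrative assumptions, not part of the answer above; the comments describe XPath 1.0 behavior as I understand it.

```python
import lxml.html

record = '<a href="some_map/some_file.png">dododo</a>'
tree = lxml.html.fromstring(record)  # tree is the <a> element itself

# Inside a predicate on @href, the context node is the href attribute.
# Asking that attribute for its own @href selects nothing.
nested = tree.xpath("@href/@href")
nested_as_string = tree.xpath("string(@href/@href)")

# So contains() is handed an empty string and the predicate is false.
failing = tree.xpath('@href[contains(@href, "some_file")]')

# "." is the attribute being filtered, so its value is what gets tested.
fixed = tree.xpath('@href[contains(., "some_file")]')

# Assumed alternative: filter the element first, then step to its attribute.
via_element = tree.xpath('self::a[contains(@href, "some_file")]/@href')

print("nested      =", nested, repr(nested_as_string))
print("failing     =", failing)
print("fixed       =", fixed)
print("via_element =", via_element)
```

Which of the two working forms to use is mostly a matter of taste: the first filters the attribute directly, the second reads closer to the usual "elements whose href contains ..." idiom.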