Find the value using XPath

I have an HTML table:

<div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value</div>
</div>
<div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value</div>
</div>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value</div>
</div>

I need to get the value of property 4...

for item in response.css('div.parameters'):
    name = item.xpath('//div[text()[contains(.,"property 4")]]/following::div[1]/text()').get()

But it doesn't work. Where is the mistake?

Try:

from lxml import etree as ET

xml_doc = """
<root>

<div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value 1</div>
</div>
<div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value 2</div>
</div>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
</div>

</root>
"""

parsed = ET.fromstring(xml_doc)

properties = parsed.xpath('//div[contains(@class, "property")]')
values = parsed.xpath('//div[contains(@class, "value")]')

out = {p.text: v.text for p, v in zip(properties, values)}
print(out["property 4"])

Prints:

value 4
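
One caveat with zipping two independent lists: if any block is missing its value div, every later pair shifts by one. A minimal sketch that pairs each property with the value inside the same parameters block is shown below. It uses the standard library's xml.etree.ElementTree instead of lxml (an assumption made only to keep the sketch dependency-free), and pair_properties is a hypothetical helper name, not part of either library.

```python
import xml.etree.ElementTree as ET

xml_doc = """
<root>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
</div>
</root>
"""


def pair_properties(doc):
    """Map each property name to the value found in the same block."""
    root = ET.fromstring(doc)
    pairs = {}
    for block in root.iter("div"):
        # Only the wrapper divs hold a property/value pair.
        if block.get("class") != "parameters":
            continue
        prop = block.find("div[@class='property']")
        value = block.find("div[@class='value']")
        # Skip incomplete blocks instead of misaligning later pairs.
        if prop is not None and value is not None:
            pairs[(prop.text or "").strip()] = (value.text or "").strip()
    return pairs


out = pair_properties(xml_doc)
print(out["property 4"])
```

Because the lookup happens per block, a block without a value is simply skipped and the remaining pairs stay correct.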
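
//div[contains(.,"property 4")]/./div//text()

In the XPath expression above, the predicate contains(.,"property 4") matches the enclosing parameters div, because the string value of an element includes the text of all its descendants. The /. step stays on that same node, and /div//text() then selects the text of its child divs, so the output is: property 4 value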
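
To see which nodes each step of that expression selects, it can be evaluated piece by piece. The sketch below uses lxml on a shortened copy of the sample markup wrapped in a root element (the wrapper and the variable names are assumptions made for illustration).

```python
from lxml import etree

doc = etree.fromstring("""
<root>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
</div>
</root>
""")

# Step 1: every div whose string value contains "property 4".
# Both the wrapper div and the property div itself qualify.
matched = doc.xpath('//div[contains(., "property 4")]')
matched_classes = [d.get("class") for d in matched]
print(matched_classes)  # ['parameters', 'property']

# Step 2: "/." keeps the same nodes; "/div" takes their child divs.
# Only the wrapper has child divs, so the property div contributes nothing.
children = doc.xpath('//div[contains(., "property 4")]/./div')
children_classes = [d.get("class") for d in children]
print(children_classes)  # ['property', 'value']

# Step 3: "//text()" collects the text nodes under those child divs.
texts = doc.xpath('//div[contains(., "property 4")]/./div//text()')
joined = " ".join(texts)
print(joined)  # property 4 value 4
```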

Final XPath expression:

' '.join(response.xpath('//div[contains(.,"property 4")]/./div//text()').getall())

Verified in the scrapy shell:

In [1]: from scrapy.selector import Selector

In [2]: %paste
html ='''
<div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value 1</div>
</div>
<div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value 2</div>
</div>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value</div>
</div>
'''

## -- End pasted text --

In [3]: sel = Selector(text=html)

In [4]: 
   ...: ' '.join(sel.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Out[4]: 'property 4 value'
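
The expression above returns the property name together with the value. If only the value is wanted, one option is to match the property div exactly and step to its following sibling. The sketch below uses lxml on a shortened copy of the sample markup wrapped in a root element; value_for is a hypothetical helper, and the property name is passed as an XPath variable ($name). The same expression string should carry over to Scrapy's response.xpath(...), though that is not tested here.

```python
from lxml import etree

doc = etree.fromstring("""
<root>
<div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
</div>
<div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
</div>
</root>
""")


def value_for(root, name):
    """Return the value text that follows the named property, or None."""
    hits = root.xpath(
        # Match the property div by its exact (whitespace-trimmed) text,
        # then take the first value div that follows it as a sibling.
        '//div[@class="property"][normalize-space()=$name]'
        '/following-sibling::div[@class="value"][1]/text()',
        name=name,
    )
    return hits[0].strip() if hits else None


print(value_for(doc, "property 4"))  # value 4
print(value_for(doc, "property 9"))  # None
```

Using following-sibling rather than following keeps the lookup inside the same parameters block, so a property without a value does not pick up the value from the next block.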