Find the value using XPath
I have an HTML table:
<div class="parameters">
<div class="property">property 1</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 2</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 3</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 4</div>
<div class="value">value</div>
</div>
I need to get the value of property 4...
for item in response.css('div.parameters'):
    name = item.xpath('//div[text()[contains(.,"property 4")]]/following::div[1]/text()').get()
But it doesn't work. Where is the mistake?
Try:
from lxml import etree as ET
xml_doc = """
<root>
<div class="parameters">
<div class="property">property 1</div>
<div class="value">value 1</div>
</div>
<div class="parameters">
<div class="property">property 2</div>
<div class="value">value 2</div>
</div>
<div class="parameters">
<div class="property">property 3</div>
<div class="value">value 3</div>
</div>
<div class="parameters">
<div class="property">property 4</div>
<div class="value">value 4</div>
</div>
</root>
"""
parsed = ET.fromstring(xml_doc)
properties = parsed.xpath('//div[contains(@class, "property")]')
values = parsed.xpath('//div[contains(@class, "value")]')
out = {p.text: v.text for p, v in zip(properties, values)}
print(out["property 4"])
This prints:
value 4
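Pairing the two lists with zip relies on document order, so it would misalign if any block were missing its value div. A minimal sketch of a more direct lookup, assuming each property div is immediately followed by its value div as a sibling (the helper name value_for is hypothetical, and the sample markup is shortened):

```python
from lxml import etree

# Same shape as the markup in the question; the <root> wrapper is only
# there so the snippet parses as a single XML document.
xml_doc = """
<root>
  <div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
  </div>
  <div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
  </div>
</root>
"""


def value_for(tree, name):
    """Return the text of the value div that follows the named property div."""
    hits = tree.xpath(
        '//div[@class="property"][normalize-space()=$name]'
        '/following-sibling::div[@class="value"][1]/text()',
        name=name,  # passed as an XPath variable, so no string formatting
    )
    return hits[0] if hits else None


parsed = etree.fromstring(xml_doc)
result = value_for(parsed, "property 4")
missing = value_for(parsed, "property 9")
print(result)
```

The following-sibling axis keeps the lookup inside the same parent block, and a property name that does not exist gives None instead of a KeyError.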
//div[contains(.,"property 4")]/./div//text()
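The XPath expression above matches every div whose text content contains "property 4", which includes the enclosing div.parameters. The /. step stays on that same node, and /div//text() then selects the text of its child divs, so the output is property 4 value
Final XPath expression: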
' '.join(response.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Verified in the scrapy shell:
In [1]: from scrapy.selector import Selector
In [2]: %paste
html ='''
<div class="parameters">
<div class="property">property 1</div>
<div class="value">value 1</div>
</div>
<div class="parameters">
<div class="property">property 2</div>
<div class="value">value 2</div>
</div>
<div class="parameters">
<div class="property">property 3</div>
<div class="value">value 3</div>
</div>
<div class="parameters">
<div class="property">property 4</div>
<div class="value">value</div>
</div>
'''
## -- End pasted text --
In [3]: sel = Selector(text=html)
In [4]:
...: ' '.join(sel.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Out[4]: 'property 4 value'
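One thing worth checking in the question's loop: an XPath that starts with // is evaluated from the document root even when it is called on a sub-element, so every iteration searches the whole page. The sketch below shows the difference using lxml directly; it assumes Scrapy selectors, which are built on lxml, treat // and .// the same way. The sample markup is shortened and the variable names are illustrative.

```python
from lxml import etree

doc = etree.fromstring("""
<root>
  <div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value 1</div>
  </div>
  <div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
  </div>
</root>
""")

absolute, relative = [], []
for item in doc.xpath('//div[@class="parameters"]'):
    # '//' ignores `item` and scans the whole document each time.
    absolute.append(item.xpath('//div[@class="value"]/text()')[0])
    # './/' starts at `item`, so each block yields its own value.
    relative.append(item.xpath('.//div[@class="value"]/text()')[0])

print(absolute)
print(relative)
```

With the absolute path both iterations return the first value in the document, while the relative path returns the value belonging to each block.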