Retrieving the name of a class attribute with lxml
I'm working on a Python project that uses lxml to scrape pages, and I'm stuck on retrieving the name of a span's class attribute. The HTML snippet is as follows:
<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>
....
How can I retrieve the value of the class attribute of the span below?
<span class="brand">carlos santos</span>
Using BeautifulSoup with the lxml parser:
from bs4 import BeautifulSoup

html_doc = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''

soup = BeautifulSoup(html_doc, 'lxml')
# BeautifulSoup treats class as a multi-valued attribute, so this is a list: ['brand']
classes = soup.find('span')['class']
result = classes[0]  # 'brand'
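Because BeautifulSoup splits class on whitespace, indexing a tag with ['class'] yields a list even when the span has a single class. A short sketch with a hypothetical two-class span; it uses Python's built-in "html.parser" so only bs4 is required, and the list behaviour is the same with the "lxml" parser used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span from the question with a second class added.
markup = '<td class="product"><span class="brand featured">carlos santos</span></td>'
soup = BeautifulSoup(markup, "html.parser")

span = soup.find("span")
classes = span.get("class")   # a list of class tokens: ['brand', 'featured']
first = classes[0]            # 'brand'
joined = " ".join(classes)    # 'brand featured', i.e. the original attribute text
print(first, joined)
```

Using .get('class') rather than ['class'] is handy when the attribute may be missing: .get returns None instead of raising KeyError.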
You can use the following XPath to get the class attribute of the span element that is a direct child of the td with class "product":
//td[@class="product"]/span/@class
Working example:
from lxml import html

raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''

root = html.fromstring(raw)
# xpath() returns a list of matching attribute values; take the first one
span_class = root.xpath('//td[@class="product"]/span/@class')[0]
print(span_class)
Output:
brand
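The "...." in the question suggests the real table has more rows. A sketch, assuming a second, hypothetical row whose td carries two classes, that collects every product span's class together with its text. It selects the span elements rather than @class and reads the attribute with the element API's .get('class'). It also matches "product" as a class token, because the exact comparison @class="product" only matches when the attribute is exactly that string and would miss class="product highlight".

```python
from lxml import html

# Hypothetical markup: the row from the question plus a made-up second row
# whose td has two classes; wrapped in <table> so it parses as one fragment.
raw = '''<table>
<tr class="nogrid">
<td class="product"><span class="brand">carlos santos</span></td>
<td class="name">polo</td>
</tr>
<tr class="nogrid">
<td class="product highlight"><span class="label">acme</span></td>
<td class="name">shirt</td>
</tr>
</table>'''

root = html.fromstring(raw)

# Match "product" as a whole class token: padding the normalized class string
# with spaces means " product " cannot match e.g. "product-id".
query = ('//td[contains(concat(" ", normalize-space(@class), " "), " product ")]'
         '/span')

# Select the span elements (not @class) so both the attribute and the text
# are available through the element API.
pairs = [(span.get("class"), span.text_content().strip())
         for span in root.xpath(query)]

for css_class, text in pairs:
    print(css_class, text)
```

With the exact-match XPath from the answer, only the first row's span ("brand") would be returned for this markup.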