Can lxml.xpath convert <td/> to ""?
I am using lxml to parse an HTML string, for example:
<tr>
<td>111</td>
<td>222</td>
<td>20201208</td>
<td></td>
<td>26</td>
<td>1431</td>
<td></td>
</tr>
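The result of html.xpath is
["111","222","20201208","26","1431"]
My question is: can I get a result like this instead?
["111","222","20201208","","26","1431",""]
Is there any option in lxml that does this?
I use the following code to get the elements: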
tds=tr.xpath(".//td/text()")
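Here is how you can do it with ElementTree or lxml (the code is the same; only the import differs). The XPath .//td/text() selects text nodes, and an empty <td> has no text node, so it contributes nothing to the result. Selecting the <td> elements themselves and reading each one's text keeps one entry per cell: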
import xml.etree.ElementTree as ET
from lxml import etree
xml = '''<tr>
<td>111</td>
<td>222</td>
<td>20201208</td>
<td></td>
<td>26</td>
<td>1431</td>
<td></td>
</tr>'''
# Standard library: td.text is None for an empty cell, so fall back to ''
root1 = ET.fromstring(xml)
data = [td.text or '' for td in root1.findall('.//td')]
print(data)
# lxml: the same calls work unchanged
root2 = etree.fromstring(xml)
data = [td.text or '' for td in root2.findall('.//td')]
print(data)
Output
['111', '222', '20201208', '', '26', '1431', '']
['111', '222', '20201208', '', '26', '1431', '']
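One caveat with reading td.text: it only returns the text that comes before a cell's first child element, so a cell such as <td><b>26</b></td> would come back as ''. If your real table might contain nested markup, a possible variation is to join itertext() for each cell. The sketch below uses only the standard library; the sample row and the helper name cell_texts are made up for illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample row: one cell wraps its text in <b>, two cells are empty.
row = ET.fromstring(
    "<tr><td>111</td><td><b>26</b> units</td><td></td><td/></tr>"
)


def cell_texts(tr):
    """Return one string per <td>, using '' for empty cells."""
    # itertext() yields every text fragment inside the cell, including
    # text inside child elements and their tails. Joining an empty
    # sequence gives '', so empty cells still produce an entry.
    return ["".join(td.itertext()) for td in tr.iter("td")]


print(cell_texts(row))
```

The same function should also work on an element parsed with lxml.etree, since lxml elements provide iter() and itertext() as well.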