Can lxml.xpath convert <td/> to ""?

I am using lxml to parse an HTML string, for example:

<tr>
 <td>111</td>   
 <td>222</td>                                   
 <td>20201208</td>                                 
 <td></td>                                  
 <td>26</td>                                   
 <td>1431</td>                                 
 <td></td>
</tr>

The result of html.xpath is:

["111","222","20201208","26","1431"]

My question is whether I can get a result like this instead, with an empty string for each empty cell:

 ["111","222","20201208","","26","1431",""]

Is there any option in lxml to do this?

I am using the following code to get the elements:

tds=tr.xpath(".//td/text()")

Here is how you can do it with ElementTree or lxml (the code is the same; only the import differs):

import xml.etree.ElementTree as ET
from lxml import etree

xml = '''<tr>
 <td>111</td>   
 <td>222</td>                                   
 <td>20201208</td>                                 
 <td></td>                                  
 <td>26</td>                                   
 <td>1431</td>                                 
 <td></td>
</tr>'''

# Standard library: td.text is None for an empty <td>, so fall back to ''
root1 = ET.fromstring(xml)
data = [td.text or '' for td in root1.findall('.//td')]
print(data)

# lxml: same API, same result
root2 = etree.fromstring(xml)
data = [td.text or '' for td in root2.findall('.//td')]
print(data)

Output:

['111', '222', '20201208', '', '26', '1431', '']

['111', '222', '20201208', '', '26', '1431', '']
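
If you would rather stay with xpath, one possible variation is to select the <td> elements themselves and ask each one for its XPath string value via string(). The original query loses the empty cells because text() only matches text nodes that actually exist, and an empty <td> has none. This is a minimal sketch, not an lxml option; the sample row and the names row and cells are made up for illustration.

```python
from lxml import etree

# Hypothetical sample row, mirroring the one in the question.
row = etree.fromstring(
    "<tr><td>111</td><td>222</td><td>20201208</td><td></td>"
    "<td>26</td><td>1431</td><td/></tr>"
)

# text() returns only existing text nodes, so the empty cells drop out.
print(row.xpath(".//td/text()"))

# Select the elements first, then take the string value of each one.
# string() evaluates to "" for an element with no text content.
cells = [str(td.xpath("string()")) for td in row.xpath(".//td")]
print(cells)
```

Unlike td.text, string() also collects text inside nested tags, so a cell such as <td><b>1</b></td> gives "1" rather than an empty string.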