pyquery (lxml) not finding a tag in a well-structured XML document?
I have an XML file that looks like this. The relevant part is:
<reference>
<citation>Vander Wal JS, Gang CH, Griffing GT, Gadde KM. Escitalopram for treatment of night eating syndrome: a 12-week, randomized, placebo-controlled trial. J Clin Psychopharmacol. 2012 Jun;32(3):341-5. doi: 10.1097/JCP.0b013e318254239b.</citation>
<PMID>22544016</PMID>
</reference>
I'm trying to get the value of the PMID field, using PyQuery to parse the XML:
from pyquery import PyQuery as pq

# f is the path to the XML file
with open(f, 'r') as fh:
    text = fh.read()
d = pq(text)
data = {}
data['nct_id'] = d('nct_id').text()
print(d('reference'))
reference = d('reference')
print(reference('PMID'))
data['pmid'] = reference('PMID').text()
print(data['pmid'])
Why doesn't this work? In the console, the first print statement shows the full contents of reference, followed by two empty values:
<reference>
<citation>Vander Wal JS, Gang CH, Griffing GT, Gadde KM. Escitalopram for treatment of night eating syndrome: a 12-week, randomized, placebo-controlled trial. J Clin Psychopharmacol. 2012 Jun;32(3):341-5. doi: 10.1097/JCP.0b013e318254239b.</citation>
<PMID>22544016</PMID>
</reference>
I can find other leaf nodes in the document (such as nct_id) using .find(), as shown in the sample code.
Does PyQuery not like uppercase tags?
You can specify which parser to use, and then it will work:
d = pq(text, parser='xml')