Iterating over XML and selecting specific element tree content
I have an XML file that looks like this:
<openie>
  <triple confidence="1.000">
    <subject begin="0" end="1">
      <text>PAF</text>
      <lemma>paf</lemma>
    </subject>
    <relation begin="1" end="2">
      <text>gets</text>
      <lemma>get</lemma>
    </relation>
    <object begin="2" end="6">
      <text>name of web site</text>
      <lemma>name of web site</lemma>
    </object>
  </triple>
  <triple confidence="1.000">
    <subject begin="0" end="1">
      <text>PAF</text>
      <lemma>paf</lemma>
    </subject>
    <relation begin="1" end="2">
      <text>gets</text>
      <lemma>get</lemma>
    </relation>
    <object begin="2" end="3">
      <text>name</text>
      <lemma>name</lemma>
    </object>
  </triple>
</openie>
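The openie element is nested here: root>document>sentences>sentence>openie

In my function I am trying to print the triples, each of which contains subject, relation, and object elements. Unfortunately, I can't get it to work because I can't reach those three elements and their text elements. Which part is wrong?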
    import xml.etree.ElementTree as ET

    def get_openie():
        print('OpenIE parser start...')
        tree = ET.parse('./tmp/nlp_output.xml')
        root = tree.getroot()
        for triple in root.findall('./document/sentences/sentence/openie/triple'):
            t_subject = triple.find('subject/text').text
            t_relation = triple.find('relation/text').text
            t_object = triple.get('object/text').text
            print(t_subject, t_relation, t_object)
The output for the two triples should look like this:
PAF gets name of web site
PAF gets name
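To get t_object, you are calling triple.get() instead of triple.find(). get() looks up an attribute on the element, so triple.get('object/text') returns None and the .text access that follows raises an AttributeError. Changing it to find() fixes your problem.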
    import xml.etree.ElementTree as ET

    def get_openie():
        print('OpenIE parser start...')
        tree = ET.parse('./tmp/nlp_output.xml')
        root = tree.getroot()
        for triple in root.findall('./document/sentences/sentence/openie/triple'):
            # find() takes a path to a child element; get() reads an attribute
            t_subject = triple.find('subject/text').text
            t_relation = triple.find('relation/text').text
            t_object = triple.find('object/text').text
            print(t_subject, t_relation, t_object)
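
If it helps to see the difference between get() and find() without the file on disk, here is a minimal sketch that parses an inline string instead. The SAMPLE string, the root tag name `root`, and the helper name extract_triples are assumptions made for illustration; they are not from the original post. The sketch also uses findtext() with a default, which is one possible way to avoid an AttributeError when a child element is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample mirroring the nesting described in the
# question (root>document>sentences>sentence>openie). Not the real file.
SAMPLE = """<root><document><sentences><sentence><openie>
<triple confidence="1.000">
<subject begin="0" end="1"><text>PAF</text><lemma>paf</lemma></subject>
<relation begin="1" end="2"><text>gets</text><lemma>get</lemma></relation>
<object begin="2" end="3"><text>name</text><lemma>name</lemma></object>
</triple>
</openie></sentence></sentences></document></root>"""


def extract_triples(xml_text):
    """Return a list of (subject, relation, object) text tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for triple in root.findall('./document/sentences/sentence/openie/triple'):
        # findtext() returns the default instead of failing if a child is absent
        parts = tuple(
            triple.findtext(f'{name}/text', default='')
            for name in ('subject', 'relation', 'object')
        )
        triples.append(parts)
    return triples


root = ET.fromstring(SAMPLE)
first = root.find('./document/sentences/sentence/openie/triple')

# get() reads attributes, so a path is just an unknown attribute name
print(first.get('object/text'))   # None
print(first.get('confidence'))    # 1.000

for subject, relation, obj in extract_triples(SAMPLE):
    print(subject, relation, obj)  # PAF gets name
```

Returning the tuples rather than printing inside the loop is only a design choice for this sketch; it makes the result easier to check or reuse.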