从具有奇数树结构的 xml 中提取 etree 数据
etree data extraction from xml with odd tree structure
这里是 xml 数据的一部分,然后再继续
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<media>
<video>
<track>
<generatoritem id="Gen Subtitle1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>You're a coward for picking on people who are weaker than you.</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</media>
</sequence>
</xmeml>
现在你可以看到树从 <effect>
开始,里面有多个 <parameters>
但我只从 <parameters>
中删除 <value>
也包含
<parameterid>str</parameterid>
<name>Text</name>
所以我可以获得 "That child is so cute.
And he's smart."
的输出
这是我的代码
lst = tree.findall('xmeml/sequence/media/video/track/generatoritem/effect/parameter/value')
counts = tree.findall('.//value')
for each in counts:
print(each.text)
这就是我得到的
And he's smart.
Arial
见下文
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>That child is so cute. And he's smart</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</sequence>
</xmeml>'''
root = ET.fromstring(xml)
str_params = root.findall('.//parameter/[parameterid="str"]')
for param in str_params:
if param.find('./name').text == 'Text':
print('The text: {}'.format(param.find('./value').text))
break
输出
The text: That child is so cute. And he's smart
这里是 xml 数据的一部分,然后再继续
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<media>
<video>
<track>
<generatoritem id="Gen Subtitle1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>You're a coward for picking on people who are weaker than you.</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</media>
</sequence>
</xmeml>
现在你可以看到树从 <effect>
开始,里面有多个 <parameters>
但我只从 <parameters>
中删除 <value>
也包含
<parameterid>str</parameterid>
<name>Text</name>
所以我可以获得 "That child is so cute. And he's smart."
的输出这是我的代码
lst = tree.findall('xmeml/sequence/media/video/track/generatoritem/effect/parameter/value')
counts = tree.findall('.//value')
for each in counts:
print(each.text)
这就是我得到的
And he's smart.
Arial
见下文
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>That child is so cute. And he's smart</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</sequence>
</xmeml>'''
root = ET.fromstring(xml)
str_params = root.findall('.//parameter/[parameterid="str"]')
for param in str_params:
if param.find('./name').text == 'Text':
print('The text: {}'.format(param.find('./value').text))
break
输出
The text: That child is so cute. And he's smart