Python ElementTree: trouble looping through XML tags in order
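Hoping someone can help with this. I'm just starting out with Python and XML, and I've hit a roadblock I can't seem to figure out.
I'm trying to parse the XML below, but I can't keep the data in order for the two parts in the SEG-LIN tags.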
<DATA>
<LOOP-LIN>
<LOOP-INFO name="LIN Loop" />
<SEG-LIN>
<DE code="0234" name="PRODUCT/SERVICE ID" type="AN">PartA</DE>
</SEG-LIN>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">6400</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201123</DE>
</SEG-FST>
</LOOP-FST>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">8000</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201125</DE>
</SEG-FST>
</LOOP-FST>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">6400</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201130</DE>
</SEG-FST>
</LOOP-FST>
</LOOP-LIN>
<LOOP-LIN>
<LOOP-INFO name="LIN Loop" />
<SEG-LIN>
<DE code="0234" name="PRODUCT/SERVICE ID" type="AN">PartB</DE>
</SEG-LIN>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">600</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201123</DE>
</SEG-FST>
</LOOP-FST>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">700</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201130</DE>
</SEG-FST>
</LOOP-FST>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">900</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201203</DE>
</SEG-FST>
</LOOP-FST>
<LOOP-FST>
<LOOP-INFO name="FST Loop" />
<SEG-FST>
<SEG-INFO code="FST" name="FORECAST SCHEDULE" />
<DE code="0380" name="QUANTITY" type="N">1000</DE>
<DE code="0680" name="FORECAST QUALIFIER" type="ID" desc="Firm">C</DE>
<DE code="0681" name="FORECAST TIMING QUALIFIER" type="ID" desc="Discrete">D</DE>
<DE code="0373" name="DATE" type="DT">20201207</DE>
</SEG-FST>
</LOOP-FST>
</LOOP-LIN>
</DATA>
Here is the Python code I tried:
import xml.etree.ElementTree as ET
import pandas as pd

fst_qty = []
fst_type = []
fst_date = []

tree = ET.parse('File.xml')
root = tree.getroot()
list1 = root.findall('LOOP-LIN')
for y in list1:
    part=(y.find('SEG-LIN/DE[@code="0234"]').text)
    fst_type.append(y.find('LOOP-FST/SEG-FST/DE[@code="0680"]').text)
    fst_date.append(y.find('LOOP-FST/SEG-FST/DE[@code="0373"]').text)
    fst_qty.append(part+'+'+y.find('LOOP-FST/SEG-FST/DE[@code="0380"]').text)

df = pd.DataFrame({'Qty':fst_qty,'Date':fst_date,'Type':fst_type})
print(df)
What I get is:
Qty Date Type
0 PartA+6400 20201123 C
1 PartB+600 20201123 C
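It correctly finds the first instance for each part, but I can't figure out how to pull the rest of the SEG-FST data from each LOOP-LIN while keeping the part numbers in the right order. I've tried a lot of things, but I either end up with a loop that passes over the data twice and adds the full list of all SEG-FST data to PartA and then does the same for PartB, or with one that only picks up a single entry for each part.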
What I am trying to get is
QTY DATE TYPE
PartA+6400 20201123 C
PartA+8000 20201125 C
PartA+6400 20201130 C
PartB+600 20201123 C
PartB+700 20201130 C
PartB+900 20201203 C
PartB+1000 20201207 C
Hoping someone can help.
Thanks
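The problem is that the find method only returns the first matching instance. The findall method instead finds all matching instances and returns them as a list. You can then use a list comprehension to extract the text. You also need to extend your lists rather than append to them: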
import xml.etree.ElementTree as ET
import pandas as pd

fst_qty = []
fst_type = []
fst_date = []

tree = ET.parse('File.xml')
root = tree.getroot()
list1 = root.findall('LOOP-LIN')
for y in list1:
    part = y.find('SEG-LIN/DE[@code="0234"]').text
    fst_type.extend([v.text for v in y.findall('LOOP-FST/SEG-FST/DE[@code="0680"]')])
    fst_date.extend([v.text for v in y.findall('LOOP-FST/SEG-FST/DE[@code="0373"]')])
    fst_qty.extend([part + '+' + v.text for v in y.findall('LOOP-FST/SEG-FST/DE[@code="0380"]')])

df = pd.DataFrame({'Qty': fst_qty, 'Date': fst_date, 'Type': fst_type})
print(df)
This produces:
Qty Date Type
0 PartA+6400 20201123 C
1 PartA+8000 20201125 C
2 PartA+6400 20201130 C
3 PartB+600 20201123 C
4 PartB+700 20201130 C
5 PartB+900 20201203 C
6 PartB+1000 20201207 C
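One caveat with three separate findall calls is that the columns only line up if every SEG-FST contains all three DE codes. As a possible variation (a sketch, not the only way to do it), you could loop over each SEG-FST and build one row at a time, so the quantity, date, and type always come from the same segment. The SAMPLE string, the de_text helper, and the forecast_rows function below are made up for illustration; the sample is a trimmed copy of the question's XML and assumes the same LOOP-LIN / LOOP-FST / SEG-FST layout.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the question's file (assumption:
# each LOOP-LIN holds one SEG-LIN and any number of LOOP-FST children).
SAMPLE = """
<DATA>
  <LOOP-LIN>
    <SEG-LIN><DE code="0234">PartA</DE></SEG-LIN>
    <LOOP-FST><SEG-FST>
      <DE code="0380">6400</DE><DE code="0680">C</DE><DE code="0373">20201123</DE>
    </SEG-FST></LOOP-FST>
    <LOOP-FST><SEG-FST>
      <DE code="0380">8000</DE><DE code="0680">C</DE><DE code="0373">20201125</DE>
    </SEG-FST></LOOP-FST>
  </LOOP-LIN>
  <LOOP-LIN>
    <SEG-LIN><DE code="0234">PartB</DE></SEG-LIN>
    <LOOP-FST><SEG-FST>
      <DE code="0380">600</DE><DE code="0680">C</DE><DE code="0373">20201123</DE>
    </SEG-FST></LOOP-FST>
  </LOOP-LIN>
</DATA>
""".strip()


def de_text(segment, code):
    """Hypothetical helper: text of the DE child with this code, or None if absent."""
    return segment.findtext(f'DE[@code="{code}"]')


def forecast_rows(root):
    """Build one dict per SEG-FST, in document order, tagged with its part number."""
    rows = []
    for lin in root.findall('LOOP-LIN'):
        part = lin.findtext('SEG-LIN/DE[@code="0234"]')
        # findall returns matches in document order, so rows stay grouped by part.
        for seg in lin.findall('LOOP-FST/SEG-FST'):
            rows.append({
                'Qty': f"{part}+{de_text(seg, '0380')}",
                'Date': de_text(seg, '0373'),
                'Type': de_text(seg, '0680'),
            })
    return rows


rows = forecast_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row['Qty'], row['Date'], row['Type'])
```

With the real file you would pass ET.parse('File.xml').getroot() instead of the sample string, and pd.DataFrame(rows) should give the same three columns as above. A missing DE shows up as None in that row instead of shifting the rows below it.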