Trouble parsing multiple XML files and processing them as a DataFrame in Python

Question

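I want to parse multiple XML files into a DataFrame. All of the files share the same XPath.

I used ElementTree and the os Python library. The code parses all the files, but it prints an empty DataFrame. The same code without the multi-file loop works fine.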

from os import listdir, path
import xml.etree.ElementTree as et

import numpy as np
import pandas as pd

mypath = r'C:\Users\testFile'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    xtree = et.parse(file)
    xroot = xtree.getroot()
    df_cols = ['value']
    out_xml = pd.DataFrame(columns=df_cols)
    # Collect the text of every Value element under the matching Field nodes
    for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'):
        name = node.attrib.get('Name')
        value = node.find('Value').text
        out_xml = out_xml.append(pd.Series([value], index=df_cols), ignore_index=True)
    # Reshape the single column so that four consecutive values form one row
    df = pd.DataFrame(np.reshape(out_xml.values, (-1, 4)))

Answer 1

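In your loop, df is reassigned on every iteration, so after the loop it only holds the result for the last file. If you need a single DataFrame containing all the data, you need to concatenate each file's DataFrame onto a main DataFrame: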

from os import listdir, path
import xml.etree.ElementTree as et

import numpy as np
import pandas as pd

mypath = r'C:\testFile'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

mainDF = pd.DataFrame()
for file in files:
    xtree = et.parse(file)
    xroot = xtree.getroot()
    # Collect the values in a list (DataFrame.append was removed in pandas 2.0)
    values = []
    for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'):
        values.append(node.find('Value').text)
    # Four consecutive values form one row
    df = pd.DataFrame(np.reshape(values, (-1, 4)))
    # Add this file's rows to the combined DataFrame
    mainDF = pd.concat([mainDF, df])

# Write the result once, after every file has been processed
mainDF.to_csv("filename.csv")
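
As a variation on the same idea, the per-file DataFrames can be collected in a list and concatenated once at the end. The sketch below is only an illustration under a few assumptions: the helper name `xml_dir_to_frame`, its `xpath` and `width` parameters, and the extra `source_file` column are hypothetical and not part of the original code. It also reports files where the XPath matches nothing, which may help track down an empty result.

```python
import xml.etree.ElementTree as et
from pathlib import Path

import numpy as np
import pandas as pd

# XPath copied from the question; adjust it to your own report layout.
FIELD_XPATH = './Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'


def xml_dir_to_frame(folder, xpath=FIELD_XPATH, width=4):
    """Hypothetical helper: build one DataFrame from every .xml file in folder."""
    frames = []
    for file in sorted(Path(folder).glob('*.xml')):
        root = et.parse(file).getroot()
        # findtext returns None when a Field has no Value child
        values = [node.findtext('Value') for node in root.findall(xpath)]
        if not values:
            # An XPath that matches nothing yields no rows for this file
            print(f'no matching Field elements in {file.name}')
            continue
        # Same reshape as in the answer: `width` consecutive values per row
        df = pd.DataFrame(np.reshape(values, (-1, width)))
        # Assumed extra column so each row can be traced back to its file
        df['source_file'] = file.name
        frames.append(df)
    # Concatenate once at the end instead of growing a DataFrame in the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

It could be called as `xml_dir_to_frame(r'C:\testFile').to_csv("filename.csv")`. Like the original reshape, it raises an error if a file's value count is not a multiple of `width`.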

python

xml

elementtree