pandas & xml - How to show text of tags that are differently nested?

I'm looking for some insight into how to correctly display this XML:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT>
        <SUPPLIER>015</SUPPLIER>
        <PRODUCT_DETAILS>
            <KEYWORD>Paper</KEYWORD>
            <PRODUCT_TYPE>major</PRODUCT_TYPE>
        </PRODUCT_DETAILS>
        <PRODUCT_FEATURES>
            <REFERENCE>Class01</REFERENCE>
            <FEATURE>
                <FNAME>Colour</FNAME>
                <FVALUE>white</FVALUE>
            </FEATURE>
        </PRODUCT_FEATURES>
    </PRODUCT>
</HEADER>

For a simpler structure, which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <KEYWORD>Paper</KEYWORD>
        <PRODUCT_TYPE>major</PRODUCT_TYPE>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>white</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>

I wrote a few lines that look like this:

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('file.xml')
root = tree.getroot()

df = pd.DataFrame()

for i in range(0, len(root), 5):

    details = [(child.tag, child.text) for child in root[i + 0]]
    features = [(child[0].text, child[1].text) for child in root[i + 1]]

    temp_df = pd.DataFrame([[i[1] for i in details + features]], columns=[i[0] for i in details + features])

    df = pd.concat([df, temp_df])

df

# df.to_csv("file_export.csv", index=False)

... which produce this output:

    KEYWORD PRODUCT_TYPE    Colour
0   Paper   major           white

What edits do I need to make so that the first XML produces this output:

    SUPPLIER    KEYWORD PRODUCT_TYPE    REFERENCE   Colour
0   015         Paper   major           Class01     white

Thanks for your help!

Best, ~C

Here is one way to do it:

import xml.etree.ElementTree as ET

import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

data = {
    "SUPPLIER": [],
    "KEYWORD": [],
    "PRODUCT_TYPE": [],
    "REFERENCE": [],
    "Colour": [],
}

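# Each child of <HEADER> is one <PRODUCT>. Its children are read by position:
# [0] SUPPLIER, [1] PRODUCT_DETAILS, [2] PRODUCT_FEATURES.
for product in root:
    data["SUPPLIER"].append(product[0].text)
    data["KEYWORD"].append(product[1][0].text)
    data["PRODUCT_TYPE"].append(product[1][1].text)
    data["REFERENCE"].append(product[2][0].text)
    # PRODUCT_FEATURES -> FEATURE -> FVALUE
    data["Colour"].append(product[2][1][1].text)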

df = pd.DataFrame(data)
print(df)
# Output
  SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0      015   Paper        major   Class01  white
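
Positional indexing like `product[2][1][1]` depends on the elements always appearing in the same order. As a rough alternative sketch, the same values can be looked up by tag name with `findtext` and `iterfind` from `xml.etree.ElementTree`. The helper name `products_to_frame` is made up for illustration, the sample XML is inlined (without the XML declaration) so the snippet runs on its own, and turning each `FNAME` into a column name is an assumption based on the desired output in the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<HEADER>
    <PRODUCT>
        <SUPPLIER>015</SUPPLIER>
        <PRODUCT_DETAILS>
            <KEYWORD>Paper</KEYWORD>
            <PRODUCT_TYPE>major</PRODUCT_TYPE>
        </PRODUCT_DETAILS>
        <PRODUCT_FEATURES>
            <REFERENCE>Class01</REFERENCE>
            <FEATURE>
                <FNAME>Colour</FNAME>
                <FVALUE>white</FVALUE>
            </FEATURE>
        </PRODUCT_FEATURES>
    </PRODUCT>
</HEADER>"""


def products_to_frame(root):
    """Hypothetical helper: build one row per <PRODUCT>, using tag names."""
    rows = []
    for product in root.iter("PRODUCT"):
        # findtext returns None when a tag is missing instead of raising.
        row = {
            "SUPPLIER": product.findtext("SUPPLIER"),
            "KEYWORD": product.findtext("PRODUCT_DETAILS/KEYWORD"),
            "PRODUCT_TYPE": product.findtext("PRODUCT_DETAILS/PRODUCT_TYPE"),
            "REFERENCE": product.findtext("PRODUCT_FEATURES/REFERENCE"),
        }
        # Assumption: each <FEATURE> becomes a column named after its <FNAME>.
        for feature in product.iterfind("PRODUCT_FEATURES/FEATURE"):
            row[feature.findtext("FNAME")] = feature.findtext("FVALUE")
        rows.append(row)
    return pd.DataFrame(rows)


root = ET.fromstring(XML)
df = products_to_frame(root)
print(df)
```

With a real file, `ET.parse("file.xml").getroot()` would take the place of `ET.fromstring(XML)`. A product that lacks one of the tags ends up with a missing value in that column rather than an `IndexError`.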