pandas & xml - How to show text of tags that are differently nested?
I'm looking for some insight on how to correctly display this XML:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT>
<SUPPLIER>015</SUPPLIER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<REFERENCE>Class01</REFERENCE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</PRODUCT>
</HEADER>
For a simpler structure that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
I wrote a few lines, shown below:
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('file.xml')
root = tree.getroot()

df = pd.DataFrame()
for i in range(0, len(root), 5):
    details = [(child.tag, child.text) for child in root[i + 0]]
    features = [(child[0].text, child[1].text) for child in root[i + 1]]
    temp_df = pd.DataFrame([[i[1] for i in details + features]], columns=[i[0] for i in details + features])
    df = pd.concat([df, temp_df])
df
# df.to_csv("file_export.csv", index=False)
... and it produces this output:
KEYWORD PRODUCT_TYPE Colour
0 Paper major white
What do I need to change to get this output for the nested structure:
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white
Thanks for your help!
Best,
~C
Here is one way to do it:
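import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

# One list per output column
data = {
    "SUPPLIER": [],
    "KEYWORD": [],
    "PRODUCT_TYPE": [],
    "REFERENCE": [],
    "Colour": [],
}

# Each child of <HEADER> is a <PRODUCT>; its children are accessed by position
for product in root:
    data["SUPPLIER"].append(product[0].text)  # <SUPPLIER>
    data["KEYWORD"].append(product[1][0].text)  # <PRODUCT_DETAILS>/<KEYWORD>
    data["PRODUCT_TYPE"].append(product[1][1].text)  # <PRODUCT_DETAILS>/<PRODUCT_TYPE>
    data["REFERENCE"].append(product[2][0].text)  # <PRODUCT_FEATURES>/<REFERENCE>
    data["Colour"].append(product[2][1][1].text)  # <PRODUCT_FEATURES>/<FEATURE>/<FVALUE>

df = pd.DataFrame(data)
print(df)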
# Output
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white
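The positional indexing above only works while every PRODUCT lists its children in exactly that order. As a possible alternative, here is a minimal sketch that looks elements up by tag name and takes the feature column name from FNAME instead of hard-coding "Colour". The helper name product_to_row and the inline XML_TEXT sample are assumptions made for illustration (the sample is the XML from the question without the declaration line, so it parses from a plain string); with a real file you would use ET.parse("file.xml").getroot() as before.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample data copied from the question (XML declaration omitted).
XML_TEXT = """<HEADER>
  <PRODUCT>
    <SUPPLIER>015</SUPPLIER>
    <PRODUCT_DETAILS>
      <KEYWORD>Paper</KEYWORD>
      <PRODUCT_TYPE>major</PRODUCT_TYPE>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
      <REFERENCE>Class01</REFERENCE>
      <FEATURE>
        <FNAME>Colour</FNAME>
        <FVALUE>white</FVALUE>
      </FEATURE>
    </PRODUCT_FEATURES>
  </PRODUCT>
</HEADER>"""


def product_to_row(product):
    """Hypothetical helper: flatten one <PRODUCT> element into a dict."""
    row = {"SUPPLIER": product.findtext("SUPPLIER")}

    # Every child of <PRODUCT_DETAILS> becomes a column named after its tag.
    details = product.find("PRODUCT_DETAILS")
    if details is not None:
        for child in details:
            row[child.tag] = child.text

    features = product.find("PRODUCT_FEATURES")
    if features is not None:
        row["REFERENCE"] = features.findtext("REFERENCE")
        # Each <FEATURE> contributes one column: FNAME is the header, FVALUE the cell.
        for feature in features.findall("FEATURE"):
            row[feature.findtext("FNAME")] = feature.findtext("FVALUE")

    return row


root = ET.fromstring(XML_TEXT)
rows = [product_to_row(product) for product in root.findall("PRODUCT")]
df = pd.DataFrame(rows)
print(df)
```

Because each row is built as a dict, products with different sets of features should simply end up with NaN in the columns they lack, and a missing tag yields None rather than an IndexError.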