How do I show multiple parent nodes of an XML file with the help of Python/pandas?
I'm new to Python and am looking for a solution to the following problem:
I have a file.xml that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>green</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Legs</FNAME>
<FVALUE>14</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>blue dog w no tail</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>blue dog w no tail and unlimited zoomies</DESCRIPTION_LONG>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>blue</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Happiness Levels</FNAME>
<FVALUE>11/10</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
Here is my code:
from lxml import etree as et
import pandas as pd
xml_data = et.parse('file2.xml')
products = xml_data.xpath('//HEADER')
headers=[elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(xml_data.xpath('//HEADER[1]//FNAME/text()'))
rows = []
for product in products:
    row = [product.xpath(f'.//{headers[0]}/text()')[0],product.xpath(f'.//{headers[1]}/text()')[0]]
    f_values = product.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
df = pd.DataFrame(rows,columns=headers)
df
# df.to_csv("File2_Export_V1.csv", index=False)
This is my desired output:
   DESCRIPTION_SHORT       DESCRIPTION_LONG                              Colour  Legs  Happiness Levels
0  green cat w short hair  green cat w short hair and unlimited zoomies  green   14
1  blue dog w no tail      blue dog w no tail and unlimited zoomies      blue          11/10
My attempt to solve this was to extend one line like this:
headers=[elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*'),('//HEADER[2]//PRODUCT_DETAILS//*')]
Sadly, I get a syntax error and no solution.
How do I adapt my code to reflect the XML structure?
Thanks in advance! ~C
Probably not the best solution, but I think it is pretty straightforward and clear.
import xml.etree.ElementTree as ET
import pandas as pd
# Get xml object
tree = ET.parse('file2.xml')
root = tree.getroot()
# Create final DataFrame
out = pd.DataFrame()
# Loop over all products (Product = (DETAILS, FEATURES))
for i in range(0, len(root), 2):
    # Get all descriptions as (tag, text) pairs
    descriptions = [(child.tag, child.text) for child in root[i]]
    # Get all features as (FNAME text, FVALUE text) pairs
    features = [(child[0].text, child[1].text) for child in root[i + 1]]
    # Create a one-row DataFrame, where columns are the tags, and values are, well, values
    pairs = descriptions + features
    temp_df = pd.DataFrame([[value for _, value in pairs]], columns=[name for name, _ in pairs])
    # Append to final DataFrame; ignore_index=True numbers the rows 0, 1, ...
    out = pd.concat([out, temp_df], ignore_index=True)
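The loop above relies on the children of HEADER strictly alternating between PRODUCT_DETAILS and PRODUCT_FEATURES. If that is not guaranteed, a variant that pairs the blocks by tag name might be a bit more forgiving. The following is only a sketch under that assumption: `XML_TEXT` is an inline copy of the sample data (without the XML declaration) so the snippet runs without file2.xml, and `products_to_rows` is a hypothetical helper name, not something from the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Inline copy of the sample data so the sketch runs on its own (assumption:
# the real file has the same flat layout under HEADER).
XML_TEXT = """<HEADER>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
    <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
  </PRODUCT_FEATURES>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>blue dog w no tail</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>blue dog w no tail and unlimited zoomies</DESCRIPTION_LONG>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
    <FEATURE><FNAME>Happiness Levels</FNAME><FVALUE>11/10</FVALUE></FEATURE>
  </PRODUCT_FEATURES>
</HEADER>"""


def products_to_rows(root):
    """Return one dict per product, keyed by tag name or feature name."""
    rows = []
    for block in root:
        if block.tag == "PRODUCT_DETAILS":
            # A new product starts here: one column per child tag.
            rows.append({child.tag: child.text for child in block})
        elif block.tag == "PRODUCT_FEATURES" and rows:
            # Features are attached to the most recent PRODUCT_DETAILS block.
            for feature in block.findall("FEATURE"):
                rows[-1][feature.findtext("FNAME")] = feature.findtext("FVALUE")
    return rows


root = ET.fromstring(XML_TEXT)
rows = products_to_rows(root)
# Keys missing from a row become NaN; fillna("") turns them into blank cells.
df = pd.DataFrame(rows).fillna("")
print(df)
```

Building the DataFrame once from a list of dicts lets pandas line up the columns, so a feature that only some products have (Legs, Happiness Levels) simply ends up empty for the others. For the real file, `ET.parse('file2.xml').getroot()` would take the place of `ET.fromstring(XML_TEXT)`.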