Nested xml attributes & text don't show in df using pandas
I'm new to Python and have a file.xml with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Hair</FNAME>
<FVALUE>short</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>blue</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Legs</FNAME>
<FVALUE>4</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
I'm using a very simple snippet (below) to convert it to file_export.csv:
import pandas as pd
df = pd.read_xml("file.xml")
# df
df.to_csv("file_export.csv", index=False)
The problem is that I end up with a table like this:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FEATURE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN
I tried removing the FEATURE elements, but then the last FNAME and FVALUE seem to overwrite (?) the previous ones, presumably because they all share the same names:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FNAME  FVALUE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  None   NaN
None                   None                                          Legs   4.0
What do I need to add to my code so that the nested elements (including their text) show up? Like this:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FEATURE  FNAME   FVALUE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Hair    short
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Colour  blue
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Legs    4
Thanks in advance!!
~ C
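First, the sample XML in your question (and probably your actual XML) isn't really a good fit for read_xml(). In a case like this you're better off using an actual XML parser and handing its output over to pandas.
Also, I don't think your desired output is efficient: in your example, each long and short description is repeated 3 times for no reason.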
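For reference, if the one-row-per-feature layout from the question really is what's needed, a minimal sketch could look like the following. It uses the standard library's xml.etree.ElementTree as the parser; the names SAMPLE and long_rows are made up for illustration, and the sketch assumes every HEADER has a PRODUCT_DETAILS child.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the question's file.xml (declaration omitted).
SAMPLE = """<HEADER>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE><FNAME>Hair</FNAME><FVALUE>short</FVALUE></FEATURE>
    <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
    <FEATURE><FNAME>Legs</FNAME><FVALUE>4</FVALUE></FEATURE>
  </PRODUCT_FEATURES>
</HEADER>"""


def long_rows(xml_text):
    """Return one dict per FEATURE, repeating the product details on each row."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() also yields the root itself, so HEADER may be the root or nested.
    for header in root.iter("HEADER"):
        # Assumption: PRODUCT_DETAILS exists and its children are plain text.
        details = {child.tag: child.text for child in header.find("PRODUCT_DETAILS")}
        for feature in header.iter("FEATURE"):
            rows.append({
                **details,
                "FNAME": feature.findtext("FNAME"),
                "FVALUE": feature.findtext("FVALUE"),
            })
    return rows


rows = long_rows(SAMPLE)
```

The resulting list of dicts can be handed to pd.DataFrame(rows) and written out with to_csv as in the question. All values stay strings here, so Legs comes out as "4" rather than 4.0.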
Having said all that, here's what I'd suggest:
Assume your actual XML has more than one pet, for example:
inventory="""<?xml version="1.0" encoding="UTF-8"?>
<doc>
<HEADER>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>green cat w short hair and unlimitied zoomies</DESCRIPTION_LONG>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Hair</FNAME>
<FVALUE>medium</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>green</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Legs</FNAME>
<FVALUE>14</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
<!-- the HEADER from your question goes here -->
</doc>"""
from lxml import etree
import pandas as pd

doc = etree.XML(inventory.encode())
pets = doc.xpath('//HEADER')

# Column names: the PRODUCT_DETAILS tags of the first HEADER, then its FNAME values
headers = [elem.tag for elem in doc.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(doc.xpath('//HEADER[1]//FNAME/text()'))

rows = []
for pet in pets:
    # The two description texts, followed by the FVALUE texts in document order
    row = [pet.xpath(f'.//{headers[0]}/text()')[0], pet.xpath(f'.//{headers[1]}/text()')[0]]
    f_values = pet.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
If you want to be more adventurous and use XPath 2.0 (which lxml doesn't support) along with more list comprehensions, you can try this:
from elementpath import select
expression1 = '//HEADER[1]/string-join((./PRODUCT_DETAILS//*/name(),./PRODUCT_FEATURES//FNAME),",")'
expression2 = '//HEADER/string-join((./PRODUCT_DETAILS//*,./PRODUCT_FEATURES//FVALUE),",")'
headers = [h.split(',') for h in select(doc, expression1 )]
rows= [r.split(',') for r in select(doc, expression2)]
Either way:
pd.DataFrame(rows,columns=headers)
should output:
   DESCRIPTION_SHORT       DESCRIPTION_LONG                               Hair    Colour  Legs
0  green cat w short hair  green cat w short hair and unlimitied zoomies  medium  green   14
1  blue dog w long hair    blue dog w long hair and limitied zoomies      short   blue    4
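One caveat: both variants take the column names from the first HEADER and rely on every pet listing the same features in the same order. If that can't be guaranteed, a possible alternative is to key each value by its FNAME. The sketch below does that with the standard library's xml.etree.ElementTree; INVENTORY and wide_records are hypothetical names, and the shortened sample data is invented to show a missing and a reordered feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened inventory: the second pet has no Hair feature
# and lists Legs before Colour.
INVENTORY = """<doc>
  <HEADER>
    <PRODUCT_DETAILS>
      <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
      <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
      <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
  </HEADER>
  <HEADER>
    <PRODUCT_DETAILS>
      <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
      <FEATURE><FNAME>Legs</FNAME><FVALUE>4</FVALUE></FEATURE>
      <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
  </HEADER>
</doc>"""


def wide_records(xml_text):
    """Return one dict per HEADER, with each feature stored under its FNAME."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iter("HEADER"):
        record = {}
        # Detail elements become columns named after their tags.
        for detail in header.findall("./PRODUCT_DETAILS/*"):
            record[detail.tag] = detail.text
        # Each FEATURE becomes a column named after its FNAME text.
        for feature in header.iter("FEATURE"):
            record[feature.findtext("FNAME")] = feature.findtext("FVALUE")
        records.append(record)
    return records


records = wide_records(INVENTORY)
```

Passing records to pd.DataFrame(records) lines the columns up by key and fills any feature a pet lacks with NaN, so the order of FEATURE elements no longer matters.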