Find Nested Tags in an XML File and Convert Them to a DataFrame in Python

I'm new to parsing XML data. I'm trying to parse this nested-tag data but I'm running into some problems. Any help would be appreciated.

I have this data:

<Items>
    <Item MaintenanceType="A">
        <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
        <PartNumber>VRX42</PartNumber>
        <BrandAAIAID>JHHK</BrandAAIAID>
        <PartTerminologyID>1896</PartTerminologyID>

        <Descriptions>
            <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
            </Description>
        </Descriptions>

        <ExtendedInformation>
            <ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
        </ExtendedInformation>
        
        <ProductAttributes/>
        
        <Packages>
            <Package MaintenanceType="A">
            <PackageUOM>EA</PackageUOM>
            <QuantityofEaches>1</QuantityofEaches>

            <Dimensions UOM="IN">
                <Length>17.1300</Length>
                <Width>15.7400</Width>
                <Height>3.5400</Height>
            </Dimensions>
        
        
            <Weights UOM="lb">
                <Weight>25.1000</Weight>
            </Weights>
        
            </Package>
        </Packages>
        
        <DigitalAssets>
            <DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
            <FileName>VRX47K.PNG</FileName>
            <AssetType>P04</AssetType>
            <FileSize>500061</FileSize>
            <AssetDimensions UOM="PX">
                <AssetHeight>499</AssetHeight>
                <AssetWidth>512</AssetWidth>
            </AssetDimensions>
        
            <FileDateModified>2021-09-02</FileDateModified>
        
            <URI>https://www.asapnetwork.org</URI>
        
            </DigitalFileInformation>
        
        </DigitalAssets>
    
    </Item>
</Items>

I want to get the text from every tag. I tried the code below, but it throws the error shown after it. Can anyone help?

My code is as follows:

import xml.etree.ElementTree as ETree
import pandas as pd

xmldata = "0H-2021-09-15-Pies.xml"

prstree = ETree.parse(xmldata)
root = prstree.getroot()

store_items = []
all_items = []

cols = ["ItemLevelGTIN", "PartNumber", "BrandAAIAID", "PartTerminologyID", "Description","ExtendedProductInformation", \
        "PackageUOM", "QuantityofEaches", "Length", "Width", "Height", "Weight", "FileName", "AssetType", "FileSize", \
        "AssetHeight", "AssetWidth", "FileDateModified", "URI"]


for child in root.iter('Items'):
    children = child.findall('Item')
    for elem in children:
        ItemLevelGTIN = elem.find("ItemLevelGTIN").text
        PartNumber = elem.find("PartNumber").text
        BrandAAIAID = elem.find("BrandAAIAID").text
        PartTerminologyID = elem.find("PartTerminologyID").text
        Description = elem.find("Description").text
        ExtendedProductInformation = elem.find("ExtendedProductInformation").text
        PackageUOM = elem.find("PackageUOM").text
        QuantityofEaches = elem.find("QuantityofEaches").text
        Length = elem.find("Length").text
        Width = elem.find("Width").text
        Height = elem.find("Height").text
        FileName = elem.find("FileName").text
        AssetType = elem.find("AssetType").text
        FileSize = elem.find("FileSize").text
        AssetHeight = elem.find("AssetHeight").text
        AssetWidth = elem.find("AssetWidth").text
        FileDateModified = elem.find("FileDateModified").text
        URI = elem.find("URI").text
        
        
        store_items = [ItemLevelGTIN, PartNumber, BrandAAIAID, PartTerminologyID,ExtendedProductInformation,Description,\
                      ExtendedProductInformation,PackageUOM,QuantityofEaches, Length, Width, Height, FileName, AssetType,\
                      FileSize, AssetHeight, AssetWidth, FileDateModified,URI ]

        all_items.append(store_items)

xmlToDf = pd.DataFrame(all_items, columns=cols)
print(xmlToDf.to_string(index=True)) 

The error is as follows:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-53b0d91d646f> in <module>
     22         BrandAAIAID = elem.find("BrandAAIAID").text
     23         PartTerminologyID = elem.find("PartTerminologyID").text
---> 24         Description = elem.find("Description").text
     25         ExtendedProductInformation = elem.find("ExtendedProductInformation").text
     26         PackageUOM = elem.find("PackageUOM").text

AttributeError: 'NoneType' object has no attribute 'text'

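Note that you are trying to get the child of a child tag (nested XML). elem.find("Description") with a bare tag name only searches the direct children of <Item>. <Description> sits one level deeper, inside <Descriptions>, so find() returns None, and reading .text on None raises the AttributeError.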

Try using:

Description = elem.find("Descriptions")[0].text

That is, first the parent (Descriptions), then its child (Description).
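A small sketch of the difference, using a trimmed-down <Item> taken from the question's data (the trimming is mine). A bare tag name matches only direct children, while ElementTree's limited XPath syntax accepts either an explicit path such as "Descriptions/Description" or ".//" to search all descendants:

```python
import xml.etree.ElementTree as ET

# Trimmed-down <Item> based on the question's sample data
item = ET.fromstring(
    '<Item MaintenanceType="A">'
    "<PartNumber>VRX42</PartNumber>"
    "<Descriptions>"
    '<Description DescriptionCode="MKT">Bentley Brake Rotors designs,</Description>'
    "</Descriptions>"
    "</Item>"
)

# A bare tag name only matches direct children of <Item>
print(item.find("PartNumber").text)  # VRX42
print(item.find("Description"))      # None, so .text would raise AttributeError

# An explicit path from <Item> down to the nested tag
print(item.find("Descriptions/Description").text)  # Bentley Brake Rotors designs,

# ".//" searches every descendant, at any depth
print(item.find(".//Description").text)            # Bentley Brake Rotors designs,
```

The explicit path is stricter (it documents where the tag must be), while ".//" is convenient but would also match a Description nested anywhere else inside the same Item.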

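Note that the same problem occurs in several places in your code: every field after PartTerminologyID (ExtendedProductInformation, PackageUOM, QuantityofEaches, Length, Width, Height, FileName, AssetType, and so on) is nested inside another tag, so you need to fix those lookups as well.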
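One possible way to fix all of the lookups at once is sketched below. It is not the answerer's code: it maps each DataFrame column to a path relative to <Item> and reads it with Element.findtext(), which returns None instead of raising when a path is missing. It embeds the question's sample as a string so it runs on its own (with the real file you would use ETree.parse("0H-2021-09-15-Pies.xml").getroot() instead). It also reads Weight, which the original loop never extracts; the original store_items list contains ExtendedProductInformation twice and no Weight, so from Description onward its values would land under the wrong column names even once the lookups succeed.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# The question's sample data (whitespace compacted), inlined so the sketch is self-contained
XML_DATA = """<Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
    <PartNumber>VRX42</PartNumber>
    <BrandAAIAID>JHHK</BrandAAIAID>
    <PartTerminologyID>1896</PartTerminologyID>
    <Descriptions>
      <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <ExtendedInformation>
      <ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
    </ExtendedInformation>
    <ProductAttributes/>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <QuantityofEaches>1</QuantityofEaches>
        <Dimensions UOM="IN">
          <Length>17.1300</Length>
          <Width>15.7400</Width>
          <Height>3.5400</Height>
        </Dimensions>
        <Weights UOM="lb">
          <Weight>25.1000</Weight>
        </Weights>
      </Package>
    </Packages>
    <DigitalAssets>
      <DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
        <FileName>VRX47K.PNG</FileName>
        <AssetType>P04</AssetType>
        <FileSize>500061</FileSize>
        <AssetDimensions UOM="PX">
          <AssetHeight>499</AssetHeight>
          <AssetWidth>512</AssetWidth>
        </AssetDimensions>
        <FileDateModified>2021-09-02</FileDateModified>
        <URI>https://www.asapnetwork.org</URI>
      </DigitalFileInformation>
    </DigitalAssets>
  </Item>
</Items>"""

PKG = "Packages/Package"
DFI = "DigitalAssets/DigitalFileInformation"

# Column name -> path relative to <Item>, in the same order as the question's cols list
PATHS = {
    "ItemLevelGTIN": "ItemLevelGTIN",
    "PartNumber": "PartNumber",
    "BrandAAIAID": "BrandAAIAID",
    "PartTerminologyID": "PartTerminologyID",
    "Description": "Descriptions/Description",
    "ExtendedProductInformation": "ExtendedInformation/ExtendedProductInformation",
    "PackageUOM": f"{PKG}/PackageUOM",
    "QuantityofEaches": f"{PKG}/QuantityofEaches",
    "Length": f"{PKG}/Dimensions/Length",
    "Width": f"{PKG}/Dimensions/Width",
    "Height": f"{PKG}/Dimensions/Height",
    "Weight": f"{PKG}/Weights/Weight",
    "FileName": f"{DFI}/FileName",
    "AssetType": f"{DFI}/AssetType",
    "FileSize": f"{DFI}/FileSize",
    "AssetHeight": f"{DFI}/AssetDimensions/AssetHeight",
    "AssetWidth": f"{DFI}/AssetDimensions/AssetWidth",
    "FileDateModified": f"{DFI}/FileDateModified",
    "URI": f"{DFI}/URI",
}

root = ET.fromstring(XML_DATA)  # root is the <Items> element itself

rows = []
for item in root.findall("Item"):
    row = {}
    for col, path in PATHS.items():
        text = item.findtext(path)  # None when the path does not exist
        # strip() removes the newline/indentation trailing the Description text
        row[col] = text.strip() if text is not None else None
    rows.append(row)

xmlToDf = pd.DataFrame(rows, columns=list(PATHS))
print(xmlToDf.to_string(index=True))
```

Building each row as a dict keyed by column name means a value can no longer drift into the wrong column the way it can with a hand-maintained positional list.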

Edit:

Alternatively, you can try this:

Description = elem.find("Descriptions").find("Description").text
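One caveat with chaining find() calls: if an <Item> has no <Descriptions> block, the first find() returns None and the same AttributeError comes back. A minimal sketch, assuming a hypothetical second item that lacks the block (it is not in the question's data), shows findtext() with a default value as a more forgiving option:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second <Item> has no <Descriptions> block
items = ET.fromstring(
    "<Items>"
    "<Item><Descriptions><Description>Bentley Brake Rotors designs,</Description></Descriptions></Item>"
    "<Item><PartNumber>VRX42</PartNumber></Item>"
    "</Items>"
)

descriptions = []
for elem in items.findall("Item"):
    # elem.find("Descriptions").find("Description") would raise AttributeError
    # on the second item, because find("Descriptions") returns None there.
    descriptions.append(elem.findtext("Descriptions/Description", default="N/A"))

print(descriptions)  # ['Bentley Brake Rotors designs,', 'N/A']
```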