Find nested tags in an XML file and convert them to a DataFrame in Python
I am new to parsing XML data. I am trying to parse this nested tag data but am running into a problem. Any help would be appreciated.
I have this data:
<Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
    <PartNumber>VRX42</PartNumber>
    <BrandAAIAID>JHHK</BrandAAIAID>
    <PartTerminologyID>1896</PartTerminologyID>
    <Descriptions>
      <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <ExtendedInformation>
      <ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
    </ExtendedInformation>
    <ProductAttributes/>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <QuantityofEaches>1</QuantityofEaches>
        <Dimensions UOM="IN">
          <Length>17.1300</Length>
          <Width>15.7400</Width>
          <Height>3.5400</Height>
        </Dimensions>
        <Weights UOM="lb">
          <Weight>25.1000</Weight>
        </Weights>
      </Package>
    </Packages>
    <DigitalAssets>
      <DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
        <FileName>VRX47K.PNG</FileName>
        <AssetType>P04</AssetType>
        <FileSize>500061</FileSize>
        <AssetDimensions UOM="PX">
          <AssetHeight>499</AssetHeight>
          <AssetWidth>512</AssetWidth>
        </AssetDimensions>
        <FileDateModified>2021-09-02</FileDateModified>
        <URI>https://www.asapnetwork.org</URI>
      </DigitalFileInformation>
    </DigitalAssets>
  </Item>
</Items>
I want to get the text from each tag. I tried the following code, but it raises the error shown below. Can anyone help?
My code is as follows:
import xml.etree.ElementTree as ETree
import pandas as pd

xmldata = "0H-2021-09-15-Pies.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()

store_items = []
all_items = []
cols = ["ItemLevelGTIN", "PartNumber", "BrandAAIAID", "PartTerminologyID", "Description", "ExtendedProductInformation",
        "PackageUOM", "QuantityofEaches", "Length", "Width", "Height", "Weight", "FileName", "AssetType", "FileSize",
        "AssetHeight", "AssetWidth", "FileDateModified", "URI"]

for child in root.iter('Items'):
    children = child.findall('Item')
    for elem in children:
        ItemLevelGTIN = elem.find("ItemLevelGTIN").text
        PartNumber = elem.find("PartNumber").text
        BrandAAIAID = elem.find("BrandAAIAID").text
        PartTerminologyID = elem.find("PartTerminologyID").text
        Description = elem.find("Description").text
        ExtendedProductInformation = elem.find("ExtendedProductInformation").text
        PackageUOM = elem.find("PackageUOM").text
        QuantityofEaches = elem.find("QuantityofEaches").text
        Length = elem.find("Length").text
        Width = elem.find("Width").text
        Height = elem.find("Height").text
        FileName = elem.find("FileName").text
        AssetType = elem.find("AssetType").text
        FileSize = elem.find("FileSize").text
        AssetHeight = elem.find("AssetHeight").text
        AssetWidth = elem.find("AssetWidth").text
        FileDateModified = elem.find("FileDateModified").text
        URI = elem.find("URI").text
        store_items = [ItemLevelGTIN, PartNumber, BrandAAIAID, PartTerminologyID, ExtendedProductInformation, Description,
                       ExtendedProductInformation, PackageUOM, QuantityofEaches, Length, Width, Height, FileName, AssetType,
                       FileSize, AssetHeight, AssetWidth, FileDateModified, URI]
        all_items.append(store_items)

xmlToDf = pd.DataFrame(all_items, columns=cols)
print(xmlToDf.to_string(index=True))
The error is as follows:
AttributeError Traceback (most recent call last)
<ipython-input-3-53b0d91d646f> in <module>
22 BrandAAIAID = elem.find("BrandAAIAID").text
23 PartTerminologyID = elem.find("PartTerminologyID").text
---> 24 Description = elem.find("Description").text
25 ExtendedProductInformation = elem.find("ExtendedProductInformation").text
26 PackageUOM = elem.find("PackageUOM").text
AttributeError: 'NoneType' object has no attribute 'text'
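Note that you are trying to get a child of a child tag (nested XML). With a plain tag name, find only searches the direct children of an element, so elem.find("Description") returns None because Description sits inside Descriptions.
Try using: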
Description = elem.find("Descriptions")[0].text
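First get the parent (Descriptions), then its child (Description).
Note that this problem occurs in several places in your code, so you will need to fix the other nested tags as well.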
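As a rough illustration of fixing the other nested tags, here is a minimal sketch that uses the path syntax supported by ElementTree's find and findtext. The inline xml_text is a trimmed copy of the sample data, introduced here only so the snippet runs on its own; the variable names are assumptions and not part of the original code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data, inlined so the snippet is self-contained.
xml_text = """<Items>
  <Item MaintenanceType="A">
    <PartNumber>VRX42</PartNumber>
    <Descriptions>
      <Description LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <Dimensions UOM="IN">
          <Length>17.1300</Length>
        </Dimensions>
      </Package>
    </Packages>
  </Item>
</Items>"""

root = ET.fromstring(xml_text)
item = root.find("Item")

# A bare tag name only matches direct children, so this is None.
direct = item.find("Description")

# A slash-separated path walks down through each nesting level.
by_path = item.findtext("Descriptions/Description").strip()
length = item.findtext("Packages/Package/Dimensions/Length")

# ".//" matches the first element with that tag at any depth below item.
any_depth = item.findtext(".//PackageUOM")

print(direct, by_path, length, any_depth)
```

findtext returns None instead of raising when nothing matches, which is why the sketch prefers it over find(...).text for optional tags.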
Edit:
You can also try this:
Description = elem.find("Descriptions").find("Description").text
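Putting this together, the sketch below shows one possible way to build the whole DataFrame. It is not the only approach: it looks up every column with a ".//Tag" search, which returns only the first match at any depth, so it assumes each Item has at most one of each tag (one Description, one Package, one DigitalFileInformation). The inline xml_text and the helper item_to_row are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Sample data inlined for the sketch; with a real file, use ET.parse(path).getroot().
xml_text = """<Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
    <PartNumber>VRX42</PartNumber>
    <BrandAAIAID>JHHK</BrandAAIAID>
    <PartTerminologyID>1896</PartTerminologyID>
    <Descriptions>
      <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <ExtendedInformation>
      <ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
    </ExtendedInformation>
    <ProductAttributes/>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <QuantityofEaches>1</QuantityofEaches>
        <Dimensions UOM="IN">
          <Length>17.1300</Length>
          <Width>15.7400</Width>
          <Height>3.5400</Height>
        </Dimensions>
        <Weights UOM="lb">
          <Weight>25.1000</Weight>
        </Weights>
      </Package>
    </Packages>
    <DigitalAssets>
      <DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
        <FileName>VRX47K.PNG</FileName>
        <AssetType>P04</AssetType>
        <FileSize>500061</FileSize>
        <AssetDimensions UOM="PX">
          <AssetHeight>499</AssetHeight>
          <AssetWidth>512</AssetWidth>
        </AssetDimensions>
        <FileDateModified>2021-09-02</FileDateModified>
        <URI>https://www.asapnetwork.org</URI>
      </DigitalFileInformation>
    </DigitalAssets>
  </Item>
</Items>"""

COLS = ["ItemLevelGTIN", "PartNumber", "BrandAAIAID", "PartTerminologyID",
        "Description", "ExtendedProductInformation", "PackageUOM",
        "QuantityofEaches", "Length", "Width", "Height", "Weight",
        "FileName", "AssetType", "FileSize", "AssetHeight", "AssetWidth",
        "FileDateModified", "URI"]


def item_to_row(item):
    """Hypothetical helper: map each column name to the text of the first
    matching tag anywhere below item, or None if the tag is absent."""
    row = {}
    for col in COLS:
        text = item.findtext(f".//{col}")
        row[col] = text.strip() if text else None
    return row


root = ET.fromstring(xml_text)
rows = [item_to_row(item) for item in root.iter("Item")]
df = pd.DataFrame(rows, columns=COLS)
print(df.to_string(index=False))
```

Keying each row by column name also keeps the values aligned with the column list, which a hand-written list of variables can easily get out of order.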