Parsing xml with multiple children
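I'm trying to parse an XML file with Python, and it raises AttributeError: 'NoneType' object has no attribute 'text' on the s_amount line. The XML file holds the product-id, quantity, amount and price-info for n products (three in the sample file). Ideally I'd like to produce a table with one row per product ID, with the related columns (amount, quantity and price-info) filled in where available. Product 5340958 will have multiple rows because it has multiple quantities.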
XML file extract:
<?xml version="1.0" encoding="UTF-8"?>
<pricebooks xmlns="http:...">
  <pricebook>
    <header pricebook-id="IT21">
      <currency>EUR</currency>
    </header>
    <price-tables>
      <price-table product-id="16780001">
        <amount quantity="1">15.00</amount>
        <price-info>2021</price-info>
      </price-table>
      <price-table product-id="5340958">
        <amount quantity="1">5</amount>
        <amount quantity="2">5</amount>
        <amount quantity="3">50</amount>
      </price-table>
      <price-table product-id="864564543">
        <amount quantity="1">60</amount>
      </price-table>
    </price-tables>
  </pricebook>
</pricebooks>
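Note that the root element declares a default namespace (xmlns="http:..."). ElementTree stores every tag in that document as {namespace-uri}local-name, so an unqualified name such as "amount" matches nothing. The sketch below demonstrates this on a trimmed copy of the sample; http://example.com/pricebook is a hypothetical stand-in for the elided URI.

```python
import xml.etree.ElementTree as et

# Trimmed sample; the namespace URI is a hypothetical placeholder
# for the "http:..." that the question elides.
XML = """<pricebooks xmlns="http://example.com/pricebook">
  <pricebook>
    <price-tables>
      <price-table product-id="864564543">
        <amount quantity="1">60</amount>
      </price-table>
    </price-tables>
  </pricebook>
</pricebooks>"""

root = et.fromstring(XML)

# The tag name carries the namespace URI in braces.
print(root.tag)

# An unqualified name finds nothing, because every tag is namespaced.
print(root.find(".//amount"))

# Mapping a prefix to the URI makes the lookup succeed.
ns = {"pb": "http://example.com/pricebook"}
print(root.find(".//pb:amount", ns).text)
```

The same effect applies to any element lookup by name in this file, so every find/findall path needs either a prefix mapping or the full {uri} form.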
Current Python script:
import pandas as pd
import xml.etree.ElementTree as et
xtree = et.parse("C:...ITfile.xml")
xroot = xtree.getroot()
df_cols = ["product-id", "quantity", "amount", "price-info"]
rows = []
for pricebook in xroot:
    for element in pricebook[1:]:
        for pricetable in element:
            s_product_id = pricetable.attrib.get("product-id")
            rows.append({"product-id": s_product_id})
            for item in pricetable:
                s_quantity = item.attrib.get("quantity")
                rows.append({"quantity": s_quantity})
                s_amount = item.find("amount").text  # AttributeError is raised here
                rows.append({"amount": s_amount})
out_df = pd.DataFrame(rows, columns=df_cols)
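The error comes from the inner loop: item is already an <amount> element, so item.find("amount") searches for an <amount> child inside it, finds none and returns None. Even on the right parent, the unqualified name "amount" would not match because of the default namespace. Separately, appending one dict per field produces fragmented rows rather than one row per amount. Below is a minimal sketch that keeps the question's loop structure but builds one dict per <amount> and reads item.text directly. The namespace URI and the q() helper are hypothetical, and the XML is inlined only so the example is self-contained.

```python
import xml.etree.ElementTree as et
import pandas as pd

# Hypothetical stand-in for the elided xmlns URI in the question.
NS = "http://example.com/pricebook"

XML = f"""<pricebooks xmlns="{NS}">
  <pricebook>
    <header pricebook-id="IT21"><currency>EUR</currency></header>
    <price-tables>
      <price-table product-id="16780001">
        <amount quantity="1">15.00</amount>
        <price-info>2021</price-info>
      </price-table>
      <price-table product-id="5340958">
        <amount quantity="1">5</amount>
        <amount quantity="2">5</amount>
        <amount quantity="3">50</amount>
      </price-table>
      <price-table product-id="864564543">
        <amount quantity="1">60</amount>
      </price-table>
    </price-tables>
  </pricebook>
</pricebooks>"""


def q(tag):
    """Hypothetical helper: qualify a local tag name with the default namespace."""
    return f"{{{NS}}}{tag}"


df_cols = ["product-id", "quantity", "amount", "price-info"]
rows = []
xroot = et.fromstring(XML)
for pricebook in xroot:
    for element in pricebook[1:]:  # skip <header>, keep <price-tables>
        for pricetable in element:
            s_product_id = pricetable.attrib.get("product-id")
            # findtext returns None when the table has no <price-info>
            s_price_info = pricetable.findtext(q("price-info"))
            for item in pricetable:
                if item.tag != q("amount"):
                    continue  # skip non-amount children such as <price-info>
                # One complete dict per <amount>; item *is* the <amount>,
                # so its value is item.text, not item.find("amount").text.
                rows.append({
                    "product-id": s_product_id,
                    "quantity": item.attrib.get("quantity"),
                    "amount": item.text,
                    "price-info": s_price_info,
                })

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)
```

With the real file, xroot would come from et.parse(...).getroot() as in the question, and NS would be set to the actual URI from the xmlns attribute.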
You can use this snippet to parse the data from the XML file into a DataFrame:
import pandas as pd
import xml.etree.ElementTree as et
xtree = et.parse("file.xml")
namespaces = {'myprefix': 'http:...'}  # <-- this is from <pricebooks xmlns="http:...">
df_cols = ["product-id", "quantity", "amount", "price-info"]
rows = []
for row in xtree.findall('.//myprefix:price-table[@product-id]', namespaces):
    price_info = row.find('.//myprefix:price-info', namespaces)
    if price_info is not None:
        price_info = price_info.text
    for amount in row.findall('.//myprefix:amount[@quantity]', namespaces):
        rows.append(dict(zip(df_cols, [
            row.attrib.get('product-id'),
            amount.attrib.get('quantity'),
            amount.text,
            price_info,
        ])))
df = pd.DataFrame(rows)
print(df)
Prints:
product-id quantity amount price-info
0 16780001 1 15.00 2021
1 5340958 1 5 None
2 5340958 2 5 None
3 5340958 3 50 None
4 864564543 1 60 None
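If you would rather not copy the namespace URI into the script, ElementTree on Python 3.8+ also accepts a {*} wildcard in place of the namespace, which matches the local name in any namespace or in none. Below is a sketch of the same approach using that wildcard, assuming the real file uses a single default namespace; the URI in the inline sample is a hypothetical placeholder.

```python
import xml.etree.ElementTree as et
import pandas as pd

# Trimmed sample; the namespace URI is a hypothetical placeholder.
XML = """<pricebooks xmlns="http://example.com/pricebook">
  <pricebook>
    <price-tables>
      <price-table product-id="16780001">
        <amount quantity="1">15.00</amount>
        <price-info>2021</price-info>
      </price-table>
      <price-table product-id="5340958">
        <amount quantity="1">5</amount>
        <amount quantity="2">5</amount>
        <amount quantity="3">50</amount>
      </price-table>
    </price-tables>
  </pricebook>
</pricebooks>"""

df_cols = ["product-id", "quantity", "amount", "price-info"]
root = et.fromstring(XML)
rows = []
# "{*}name" matches the local name regardless of namespace (Python 3.8+),
# so no prefix mapping is needed.
for table in root.iterfind(".//{*}price-table[@product-id]"):
    price_info = table.findtext("{*}price-info")  # None when absent
    for amount in table.iterfind("{*}amount[@quantity]"):
        rows.append({
            "product-id": table.get("product-id"),
            "quantity": amount.get("quantity"),
            "amount": amount.text,
            "price-info": price_info,
        })

df = pd.DataFrame(rows, columns=df_cols)
print(df)
```

The trade-off is that the wildcard would also match same-named tags from other namespaces, so the explicit prefix mapping is the stricter choice if the file mixes namespaces.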