Python: create csv from xml with different nested elements
This is my xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2" xmlns:sdt="urn:oasis:names:specification:ubl:schema:xsd:SpecializedDatatypes-2" xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 UBL-Invoice-2.0.xsd">
<cbc:ID>102165444</cbc:ID>
<cac:InvoiceLine>
<cbc:ID>1.0000</cbc:ID>
<cbc:Note />
<cbc:InvoicedQuantity unitCode="CT">1.0000</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="DKK">142.3900</cbc:LineExtensionAmount>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="DKK">138.24</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
<cbc:TaxAmount currencyID="DKK">7.20</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">3645</cbc:ID>
<cac:TaxScheme>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">140</cbc:ID>
<cbc:Name>Afgift</cbc:Name>
<cbc:TaxTypeCode listAgencyID="320" listID="urn:oioubl:codelist:taxtypecode-1.1">StandardRated</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
</cac:InvoiceLine>
<cac:InvoiceLine>
<cbc:ID>2.0000</cbc:ID>
<cbc:Note />
<cbc:InvoicedQuantity unitCode="CT">1.0000</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="DKK">142.3900</cbc:LineExtensionAmount>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="DKK">138.24</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
<cbc:TaxAmount currencyID="DKK">7.20</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">3645</cbc:ID>
<cac:TaxScheme>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">140</cbc:ID>
<cbc:Name>Afgift</cbc:Name>
<cbc:TaxTypeCode listAgencyID="320" listID="urn:oioubl:codelist:taxtypecode-1.1">StandardRated</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="DKK">35.60</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
<cbc:TaxAmount currencyID="DKK">35.60</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">StandardRated</cbc:ID>
<cbc:Percent>25</cbc:Percent>
<cac:TaxScheme>
<cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">63</cbc:ID>
<cbc:Name>Moms</cbc:Name>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
</cac:InvoiceLine>
</Invoice>
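As you can see, the file has an ID and several 'Invoice lines', each with its own ID and other child elements.
What I want to do is create a csv file with one row per invoice line, containing information from specific nested elements. The challenge is that each invoice line can have several 'TaxTotal' child elements. In that case, I want an additional row, so that the result looks like this: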
ID;/InvoiceLine/ID;/InvoiceLine/InvoicedQuantity;/InvoiceLine/LineExtensionAmount;/InvoiceLine/TaxTotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name
102165444;1;1;142,39;138,24;142,39;7,20;3645,00;;140;Afgift
102165444;2;1;142,39;138,24;142,39;7,20;3646,00;;140;Afgift
102165444;2;1;142,39;35,60;142,39;35,60;StandardRated;25,00;63;Moms
How can I accomplish this?
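Since there will always be at least one TaxTotal element, I would create a new csv row for each TaxTotal and then walk back up the tree to pick up the values that belong to its parent invoice line and to the invoice itself.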
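To show the idea in isolation, here is a minimal sketch that uses only the standard library on a small, namespace-free sample. The sample XML and the `parent_of` lookup are assumptions made for this illustration; they are not part of the lxml solution, which can step up to the parent directly with `../` in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample; the real invoice uses cac:/cbc: prefixes.
SAMPLE = """
<Invoice>
  <ID>102165444</ID>
  <InvoiceLine>
    <ID>1</ID>
    <TaxTotal><TaxAmount>138.24</TaxAmount></TaxTotal>
  </InvoiceLine>
  <InvoiceLine>
    <ID>2</ID>
    <TaxTotal><TaxAmount>138.24</TaxAmount></TaxTotal>
    <TaxTotal><TaxAmount>35.60</TaxAmount></TaxTotal>
  </InvoiceLine>
</Invoice>
"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not store their parent, so build a lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for tax_total in root.iter("TaxTotal"):
    line = parent_of[tax_total]  # step up to the enclosing InvoiceLine
    rows.append([
        root.findtext("ID"),              # invoice-level ID
        line.findtext("ID"),              # invoice line ID
        tax_total.findtext("TaxAmount"),  # value from the TaxTotal itself
    ])

for row in rows:
    print(";".join(row))
```

The second invoice line appears twice in `rows`, once per TaxTotal, which is the row layout asked for in the question.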
Here is an example using lxml. I added a function to make handling missing values easier, but I will leave any additional value formatting to you.
Python 3.6
from lxml import etree
import csv


def get_value(target_tree, xpath, namespaces):
    # Return the text of the first match, or "" when the element is absent.
    try:
        return target_tree.xpath(xpath, namespaces=namespaces)[0].text
    except IndexError:
        return ""


tree = etree.parse("input.xml")
ns = {"cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
      "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
      "i2": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"}

# newline="" lets the csv module control the line endings itself.
with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=";", lineterminator="\n", quoting=csv.QUOTE_MINIMAL)

    # Header
    csvwriter.writerow(["ID", "/InvoiceLine/ID", "/InvoiceLine/InvoicedQuantity", "/InvoiceLine/LineExtensionAmount",
                        "/InvoiceLine/TaxTotal/TaxAmount", "/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name"])

    # One row per TaxTotal; "../" steps up to the enclosing InvoiceLine.
    for tax_total in tree.xpath("//cac:TaxTotal", namespaces=ns):
        csvwriter.writerow([get_value(tax_total, "/i2:Invoice/cbc:ID", ns),
                            get_value(tax_total, "../cbc:ID", ns),
                            get_value(tax_total, "../cbc:InvoicedQuantity", ns),
                            get_value(tax_total, "../cbc:LineExtensionAmount", ns),
                            get_value(tax_total, "cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxableAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:Percent", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:Name", ns)])
Output (output.csv)
ID;/InvoiceLine/ID;/InvoiceLine/InvoicedQuantity;/InvoiceLine/LineExtensionAmount;/InvoiceLine/TaxTotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name
102165444;1.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;35.60;142.39;35.60;StandardRated;25;63;Moms
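The output above keeps the values exactly as they appear in the XML, while the desired output in the question uses a decimal comma. As one possible sketch of the formatting step, assuming the goal is a fixed number of decimal places with a comma separator, a helper along these lines could be used. The names `to_decimal_comma` and `places` are made up for this illustration, and which columns to format is left open.

```python
from decimal import Decimal, InvalidOperation


def to_decimal_comma(text, places=2):
    """Format numeric text with a decimal comma; return other text unchanged."""
    if text is None or text == "":
        return ""
    try:
        number = Decimal(text)
    except InvalidOperation:
        return text  # not a number, e.g. "StandardRated"
    if not number.is_finite():
        return text  # leave NaN/Infinity spellings untouched
    # Decimal(10) ** -2 is Decimal("0.01"), i.e. two decimal places.
    quantized = number.quantize(Decimal(10) ** -places)
    return str(quantized).replace(".", ",")


print(to_decimal_comma("142.3900"))
print(to_decimal_comma("1.0000", places=0))
print(to_decimal_comma("StandardRated"))
```

Wrapping a call such as `to_decimal_comma(get_value(tax_total, "cbc:TaxAmount", ns))` would apply it to a single column; identifier columns such as the invoice ID are probably better written unformatted.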