Add integers in nested XML child elements using Python

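I receive an XML document with many child elements, and I need to extract information from it and export it to a CSV or text document for import into Quickbooks. The XML tree looks like this: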

<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180510</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>15500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>175</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>C</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>7500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>D</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>300</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126349</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>25200</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>16800</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126350</Document>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>14100</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
</MODocuments>

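I need to extract TotalPounds from each MODocument parent so that the output looks like the lines below: the document number, the applicant name, and the pounds of all MOLots in that document added together.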

TX1126348   COMPANY FRUIT & VEGETABLE 23475
TX1126349   COMPANY FRUIT & VEGETABLE 42000
TX1126350   COMPANY FRUIT & VEGETABLE 14100

This is the code I am using:

import xml.etree.ElementTree as ET
tree = ET.parse('TX_959_20180514131311.xml')
root = tree.getroot()

docCert = []
docComp = []
totalPounds=[]

for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        for MOLots in MODocument:
            for MOLot in MOLots:
                totalPounds.append(int(MOLot.find('TotalPounds').text))

for i in range(len(docCert)):
    print(i, docCert[i],' ', docComp[i], totalPounds[i])

This is my output. I can't figure out how to add up the totals for each document. Please help.

0 TX1126348   COMPANY FRUIT & VEGETABLE 15500
1 TX1126349   COMPANY FRUIT & VEGETABLE 175
2 TX1126350   COMPANY FRUIT & VEGETABLE 7500

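It looks like totalPounds has more items than docCert and docComp: every lot is appended separately, so totalPounds[i] is the i-th lot overall rather than the total for the i-th document. I think you need to do something like this: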

for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        # Reset the running total for each document.
        sub_total = 0
        for MOLots in MODocument:
            for MOLot in MOLots:
                sub_total += int(MOLot.find('TotalPounds').text)
        # Append once per document so all three lists stay the same length.
        totalPounds.append(sub_total)
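
For reference, the same per-document sum can be sketched as a self-contained script using only the standard library. This is a rough sketch, not a drop-in replacement: the XML is a shortened copy of the sample from the question, the `summarize` helper name is made up for illustration, and it assumes every TotalPounds element holds a plain integer.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data from the question (two documents only).
SAMPLE_XML = """\
<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>15500</TotalPounds></MOLot>
      <MOLot><LotID>B</LotID><TotalPounds>175</TotalPounds></MOLot>
      <MOLot><LotID>C</LotID><TotalPounds>7500</TotalPounds></MOLot>
      <MOLot><LotID>D</LotID><TotalPounds>300</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126350</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>14100</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
</MODocuments>
"""


def summarize(root):
    """Return one (document, applicant, total_pounds) tuple per MODocument.

    Hypothetical helper; assumes each TotalPounds contains an integer.
    """
    rows = []
    # iter() searches the whole subtree, so this works whether MODocuments
    # is the root element or is wrapped in another element.
    for doc in root.iter("MODocument"):
        total = sum(int(tp.text) for tp in doc.iter("TotalPounds"))
        rows.append((doc.findtext("Document"),
                     doc.findtext("ApplicantName"),
                     total))
    return rows


rows = summarize(ET.fromstring(SAMPLE_XML))
for row in rows:
    print(*row)
```

Keeping one tuple per document, instead of three parallel lists, avoids the lists drifting out of step in the first place.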

If you can use lxml, you can let the XPath sum() function add up all the TotalPounds values for you.

Example...

from lxml import etree
import csv

tree = etree.parse("TX_959_20180514131311.xml")

with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    for mo_doc in tree.xpath("/MODocuments/MODocument"):
        csvwriter.writerow([mo_doc.xpath("Document")[0].text,
                            mo_doc.xpath("ApplicantName")[0].text,
                            int(mo_doc.xpath("sum(MOLots/MOLot/TotalPounds)"))])

Contents of "output.csv"...

TX1126348,COMPANY FRUIT & VEGETABLE,23475
TX1126349,COMPANY FRUIT & VEGETABLE,42000
TX1126350,COMPANY FRUIT & VEGETABLE,14100

Also, by writing the output with csv, you get control over quoting, delimiters, and so on.
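
To illustrate what that control looks like, here is a small sketch that writes the same rows with different `csv.writer` settings. The `to_delimited` helper and the second applicant name (which contains a comma) are made up for illustration; whether Quickbooks prefers commas or tabs is not something this thread establishes, so treat the tab variant as just one possible choice.

```python
import csv
import io


def to_delimited(rows, delimiter=",", quoting=csv.QUOTE_MINIMAL):
    """Hypothetical helper: render rows as delimited text and return it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter, quoting=quoting)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("TX1126348", "COMPANY FRUIT & VEGETABLE", 23475),
    # Invented applicant name, used only to show how a comma is handled.
    ("TX0000000", "EXAMPLE, INC", 100),
]

# QUOTE_MINIMAL quotes only the fields that contain the delimiter.
print(to_delimited(rows))

# With a tab delimiter the comma no longer needs quoting.
print(to_delimited(rows, delimiter="\t"))

# QUOTE_ALL wraps every field in quotes, including the numbers.
print(to_delimited(rows, quoting=csv.QUOTE_ALL))
```

When writing to a real file instead of a string buffer, open it with `newline=""` as in the answer above so the csv module controls the line endings itself.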