Add integers in nested XML child elements using Python
I receive an XML document with many child elements. I need to extract information from it and export it to a CSV or text file for import into Quickbooks. The XML tree looks like this:
<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180510</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>15500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>175</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>C</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>7500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>D</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>300</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126349</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>25200</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>16800</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126350</Document>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>14100</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
</MODocuments>
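I need to extract TotalPounds from each MODocument parent, so that each output line shows the document number, the applicant name, and the total pounds summed over all MOLots in that document: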
TX1126348 COMPANY FRUIT & VEGETABLE 23475
TX1126349 COMPANY FRUIT & VEGETABLE 42000
TX1126350 COMPANY FRUIT & VEGETABLE 14100
Here is the code I am using:
import xml.etree.ElementTree as ET

tree = ET.parse('TX_959_20180514131311.xml')
root = tree.getroot()
docCert = []
docComp = []
totalPounds = []
for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        for MOLots in MODocument:
            for MOLot in MOLots:
                totalPounds.append(int(MOLot.find('TotalPounds').text))
for i in range(len(docCert)):
    print(i, docCert[i], ' ', docComp[i], totalPounds[i])
This is my output. I can't figure out how to add up the totals for each document. Please help.
0 TX1126348 COMPANY FRUIT & VEGETABLE 15500
1 TX1126349 COMPANY FRUIT & VEGETABLE 175
2 TX1126350 COMPANY FRUIT & VEGETABLE 7500
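It looks like totalPounds ends up with more items than docCert or docComp, because one value is appended per MOLot rather than per MODocument. I think you need to do something like this: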
for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        # Start a fresh running total for each document.
        sub_total = 0
        for MOLots in MODocument:
            for MOLot in MOLots:
                sub_total += int(MOLot.find('TotalPounds').text)
        # Append once per document, after all of its lots are summed.
        totalPounds.append(sub_total)
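The same per-document subtotal can also be computed with only the standard library by letting ElementTree's path syntax locate the TotalPounds elements. What follows is a minimal sketch rather than the asker's code: it assumes a trimmed inline copy of the sample (with the ampersand escaped as &amp;) instead of the real file, and the helper name document_totals is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the sample; the real data would come from the file.
SAMPLE = """<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><TotalPounds>15500</TotalPounds></MOLot>
      <MOLot><TotalPounds>175</TotalPounds></MOLot>
      <MOLot><TotalPounds>7500</TotalPounds></MOLot>
      <MOLot><TotalPounds>300</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126349</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><TotalPounds>25200</TotalPounds></MOLot>
      <MOLot><TotalPounds>16800</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
</MODocuments>"""


def document_totals(root):
    """Return one (document, applicant, total pounds) tuple per MODocument."""
    rows = []
    # iter() finds MODocument elements at any depth, so this works whether
    # or not the file wraps MODocuments in another root element.
    for mo_doc in root.iter("MODocument"):
        pounds = mo_doc.findall("MOLots/MOLot/TotalPounds")
        total = sum(int(p.text) for p in pounds)
        rows.append((mo_doc.findtext("Document"),
                     mo_doc.findtext("ApplicantName"),
                     total))
    return rows


rows = document_totals(ET.fromstring(SAMPLE))
for doc, applicant, total in rows:
    print(doc, applicant, total)
```

Because each row is built in one step, the document number, applicant name, and total cannot drift out of alignment the way three parallel lists can.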
If you can use lxml, you can let the XPath sum() function add up all of the TotalPounds values for you.
Example...
from lxml import etree
import csv

tree = etree.parse("TX_959_20180514131311.xml")
with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    for mo_doc in tree.xpath("/MODocuments/MODocument"):
        csvwriter.writerow([mo_doc.xpath("Document")[0].text,
                            mo_doc.xpath("ApplicantName")[0].text,
                            int(mo_doc.xpath("sum(MOLots/MOLot/TotalPounds)"))])
Contents of "output.csv"...
TX1126348,COMPANY FRUIT & VEGETABLE,23475
TX1126349,COMPANY FRUIT & VEGETABLE,42000
TX1126350,COMPANY FRUIT & VEGETABLE,14100
Also, writing the output with the csv module lets you control quoting, delimiters, and so on.
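As a small illustration of that control, the sketch below writes the three rows from the output above as tab-delimited text with the string fields quoted. The delimiter and quoting choices are assumptions made for demonstration only; the format Quickbooks actually expects is not stated here. It writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import csv
import io

# Rows copied from the expected output shown earlier.
rows = [
    ("TX1126348", "COMPANY FRUIT & VEGETABLE", 23475),
    ("TX1126349", "COMPANY FRUIT & VEGETABLE", 42000),
    ("TX1126350", "COMPANY FRUIT & VEGETABLE", 14100),
]

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter="\t",                # tab instead of comma
    quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
    lineterminator="\n",
)
writer.writerows(rows)

text = buffer.getvalue()
print(text)
```

Swapping the buffer for a file opened with newline="" would write the same content to disk.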