Creating a Nested XML document in Python
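Occasional scripter here. I have searched this forum and spent a lot of time on this so far, but I'm still looking for help. I'm trying to create an XML document from a CSV structure, taking something that looks like this: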
ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock
and converting it into something like the following.
<tradeRequests>
  <tradeRequest>
    <id>ID1</id>
    <newDeals size="1">
      <deal>
        <id>ID1</id>
        <terms>
          <id>ID1</id>
          <MaturityDate>2018-06-01</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
  <tradeRequest>
    <id>ID2</id>
    <newDeals size="1">
      <deal>
        <id>ID2</id>
        <terms>
          <id>ID2</id>
          <MaturityDate>2018-06-01</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
</tradeRequests>
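The problem is that my script doesn't seem to format the items correctly: each row should essentially become one tradeRequest, but that is not the structure I get.
Here is a snippet of my code, which pulls a subset of columns out of a much larger set.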
import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom
tradeRequests = ET.Element("tradeRequests")
tradeRequest = ET.SubElement(tradeRequests, "tradeRequest")
newDeals = ET.SubElement(tradeRequest, "newDeals")
deal = ET.SubElement(newDeals, "deal")
dealid = ET.SubElement(deal, "id")
with open('TestCase.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        ET.SubElement(tradeRequest, "id").text = row['ID']
        ET.SubElement(tradeRequest, "newDeals", {'size':"1"} )
        ET.SubElement(dealid, "id").text = row['ID']
        ET.SubElement(dealid, "maturityDate").text = row['Maturity Date']
tree = ET.ElementTree(tradeRequests)
tree.write("Testcase.xml" )
xml = xml.dom.minidom.parse('Testcase.xml')
pretty_xml_as_string = xml.toprettyxml()
print pretty_xml_as_string
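The problem is that I can't seem to nest the items correctly. I have tried creating parent/child combinations, without success. Instead, that code gives me the output shown below.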
<tradeRequests>
  <tradeRequest>
    <newDeals>
      <deal>
        <id>
          <id>ID1</id>
          <maturityDate>2018-06-01</maturityDate>
          <id>ID2</id>
          <maturityDate>2018-03-25</maturityDate>
        </id>
      </deal>
    </newDeals>
    <id>ID1</id>
    <newDeals size="1"/>
    <id>ID2</id>
    <newDeals size="1"/>
  </tradeRequest>
</tradeRequests>
Thanks as always for your help.
I hadn't expected this use case to require looping and creating elements dynamically:
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,110,2018-03-25,Stock
ID2,110,2018-03-26,A
ID2,110,2018-03-26,B
ID2,110,2018-03-26,C
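So what I actually need is to loop over the rows for ID2 and dynamically create a new element for each one, for an unknown number of rows.
So my expected result would look like this: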
<tradeRequests>
<ids>
<id>ID1</id>
<element>
<maturityDate>2018-06-01</maturityDate>
<type>Stock</type>
</element>
</id>
<id>ID2</id>
<element>
<maturityDate>2018-03-25</maturityDate>
<type>A</type>
</element>
<element>
<maturityDate>2018-03-25</maturityDate>
<type>B</type>
</element>
<element>
<maturityDate>2018-03-25</maturityDate>
<type>C</type>
</element>
</id>
</tradeRequests>
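For the follow-up case with several rows per ID, one possible approach is to keep a dictionary of parent elements keyed by ID and append one child per row. The following is only a sketch under assumptions: the expected XML above is not well-formed as posted, so the `<tradeRequest>` wrapper, the simplified three-column header, and the function name `group_rows_by_id` are all hypothetical choices made for illustration, and the sample rows are inlined instead of read from a file.

```python
import csv
import io
import xml.etree.ElementTree as ET


def group_rows_by_id(rows):
    """Return a <tradeRequests> tree with one parent per ID and one <element> per row."""
    root = ET.Element("tradeRequests")
    parents = {}  # ID -> its <tradeRequest> element, created the first time the ID appears
    for row in rows:
        parent = parents.get(row["ID"])
        if parent is None:
            # First row for this ID: create the parent and record it for later rows.
            parent = ET.SubElement(root, "tradeRequest")
            ET.SubElement(parent, "id").text = row["ID"]
            parents[row["ID"]] = parent
        # Every row, however many there are, adds its own <element> child.
        element = ET.SubElement(parent, "element")
        ET.SubElement(element, "maturityDate").text = row["Maturity Date"]
        ET.SubElement(element, "type").text = row["Representation Type"]
    return root


# Assumed sample data with a consistent (hypothetical) header.
sample_csv = io.StringIO(
    "ID,Maturity Date,Representation Type\n"
    "ID1,2018-06-01,Bond\n"
    "ID2,2018-03-25,Stock\n"
    "ID2,2018-03-26,A\n"
    "ID2,2018-03-26,B\n"
    "ID2,2018-03-26,C\n"
)
grouped_root = group_rows_by_id(csv.DictReader(sample_csv))
print(ET.tostring(grouped_root, encoding="unicode"))
```

Because the parents are looked up by ID, the rows for one ID do not have to be adjacent in the file.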
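I strongly recommend the excellent lxml library. It is really fast, because it is a wrapper around the C library libxml2, and it includes the element builder object E, which makes your job really easy: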
import csv
import lxml.etree
from lxml.builder import E
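
with open('TestCase.csv', newline='') as csvfile:
    # Build one <tradeRequest> per CSV row and unpack them all into the root element.
    results = E.tradeRequests(*(
        E.tradeRequest(
            E.id(row['ID']),
            E.newDeals(
                E.deal(
                    E.id(row['ID']),
                    E.terms(
                        E.id(row['ID']),
                        E.MaturityDate(row['Maturity Date']),
                    )
                ),
                size="1",
            )
        ) for row in csv.DictReader(csvfile))
    )

# encoding='unicode' returns a str instead of bytes, so print shows real line breaks.
print(lxml.etree.tostring(results, pretty_print=True, encoding='unicode'))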
Result:
<tradeRequests>
  <tradeRequest>
    <id>ID1</id>
    <newDeals size="1">
      <deal>
        <id>ID1</id>
        <terms>
          <id>ID1</id>
          <MaturityDate>2018-06-01</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
  <tradeRequest>
    <id>ID2</id>
    <newDeals size="1">
      <deal>
        <id>ID2</id>
        <terms>
          <id>ID2</id>
          <MaturityDate>2018-03-25</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
</tradeRequests>
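If installing lxml is not an option, roughly the same structure can be built with the standard library's xml.etree.ElementTree. The key change from the code in the question is that every nested element is created inside the loop, so each row gets its own subtree instead of being appended to parents that were created once up front. This is a minimal sketch: the function name `rows_to_trade_requests` is made up for illustration, and the sample rows are inlined in place of TestCase.csv.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom


def rows_to_trade_requests(rows):
    """Build <tradeRequests> with one complete <tradeRequest> subtree per row."""
    root = ET.Element("tradeRequests")
    for row in rows:
        # Create the whole chain of parents per row, not once before the loop.
        trade_request = ET.SubElement(root, "tradeRequest")
        ET.SubElement(trade_request, "id").text = row["ID"]
        new_deals = ET.SubElement(trade_request, "newDeals", {"size": "1"})
        deal = ET.SubElement(new_deals, "deal")
        ET.SubElement(deal, "id").text = row["ID"]
        terms = ET.SubElement(deal, "terms")
        ET.SubElement(terms, "id").text = row["ID"]
        ET.SubElement(terms, "MaturityDate").text = row["Maturity Date"]
    return root


# Inline sample data standing in for TestCase.csv.
sample = io.StringIO(
    "ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type\n"
    "ID1,COMMIT,EUR,100,,2018-06-01,Bond\n"
    "ID2,COMMIT,AUD,110,,2018-03-25,Stock\n"
)
stdlib_root = rows_to_trade_requests(csv.DictReader(sample))

# Pretty-print in memory rather than writing a file and parsing it back.
print(minidom.parseString(ET.tostring(stdlib_root)).toprettyxml(indent="  "))
```

To read the real file, pass `csv.DictReader(open('TestCase.csv', newline=''))` (ideally inside a `with` block) instead of the inline sample.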