Converting multiple CSV files into single XML
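I am trying to convert multiple CSV files (currently three) into a single XML file using ElementTree, but I am not getting the exact output I want. Please guide me toward a more efficient approach. PS: I am a beginner.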
import csv
import xml.etree.ElementTree as ET
#from bs4 import BeautifulSoup
root = ET.Element('Policy')
with open("policy.csv","r") as p, open("Att.csv","r") as a, open("rider.csv","r") as r:
    csv_p = csv.reader(p)
    header_p = next(csv_p)
    csv_a = csv.reader(a)
    header_a = next(csv_a)
    csv_r = csv.reader(r)
    header_r = next(csv_r)
    for row in csv_p:
        pid = row[0]
        print("\n",pid)
        for col in range(len(header_p)):
            ET.SubElement(root, header_p[col]).text = str(row[col])
            for childrow in csv_a:
                if(pid == childrow[0]):
                    print("Match found")
                    child = ET.SubElement(root,"child")
                    for col_a in range(len(header_a)):
                        ET.SubElement(child, header_a[col_a]).text = str(childrow[col_a])
                        for tailrow in csv_r:
                            if(childrow[1] == tailrow[0]):
                                print("tail found",tailrow[0])
                                tail = ET.SubElement(child,"tail")
                                for col_r in range(len(header_r)):
                                    ET.SubElement(tail, header_r[col_r]).text = str(tailrow[col_r])
                    r.seek(0)
        a.seek(0)
tree = ET.tostring(root, encoding="UTF-8")
#print(BeautifulSoup(tree, "xml").prettify())
with open("Output.xml", "wb") as f:
    f.write(tree)
with open('Output.xml', 'r') as f:
    print("\n\n",f.read())
The output looks like the following, but as you can see, some tags are repeated because they are redundant in the files I'm reading:
Policy.csv:
Pid,Name,Date
101,Life In,3Jan2017
102,Mobile,8Aug2018
Att.csv:
PId,AId,Name
101,9001,Pune
101,9002,Mumbai
102,9003,Delhi
rider.csv:
AId,RID,Name
9001,10001,Ramesh
9001,10002,Suresh
9002,10003,Rahul
9002,10004,Kirti
Output:
<Policy>
  <Pid>101</Pid>
  <child>
    <PId>101</PId>
    <tail>
      <AId>9001</AId>
      <RID>10001</RID>
      <Name>Ramesh</Name>
    </tail>
    <tail>
      <AId>9001</AId>
      <RID>10002</RID>
      <Name>Suresh</Name>
    </tail>
    <AId>9001</AId>
    <Name>Pune</Name>
  </child>
  <child>
    <PId>101</PId>
    <tail>
      <AId>9002</AId>
      <RID>10003</RID>
      <Name>Rahul</Name>
    </tail>
    <tail>
      <AId>9002</AId>
      <RID>10004</RID>
      <Name>Kirti</Name>
    </tail>
    <AId>9002</AId>
    <Name>Mumbai</Name>
  </child>
  <Name>Life In</Name>
  <Date>3Jan2017</Date>
</Policy>
Desired output (example):
<Policy>
  <Pid>101</Pid>
  <child>
    <AId>9001</AId>
    <tail>
      <RID>10001</RID>
      <Name>Ramesh</Name>
    </tail>
    <tail>
      <RID>10002</RID>
      <Name>Suresh</Name>
    </tail>
    <Name>Pune</Name>
  </child>
  <Name>Life In</Name>
  <Date>3Jan2017</Date>
</Policy>
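A detail in the question's code worth spelling out: a csv.reader wrapped around an open file is a one-pass iterator, which is why the nested loops need a.seek(0) and r.seek(0). Rewinding also makes the reader return the header row again, and each policy row triggers a fresh scan of Att.csv (each attachment row a fresh scan of rider.csv). The sketch below demonstrates this, with io.StringIO standing in for the real Att.csv file (an assumption made only so the example is self-contained).

```python
import csv
import io

# io.StringIO stands in for open("Att.csv") so the demo runs without files.
att_file = io.StringIO("PId,AId,Name\n101,9001,Pune\n101,9002,Mumbai\n102,9003,Delhi\n")
reader = csv.reader(att_file)
header = next(reader)              # consumes the header row

first_pass = list(reader)          # reads every remaining data row
second_pass = list(reader)         # the iterator is exhausted: nothing left

att_file.seek(0)                   # rewind, as the question's code does
after_seek = next(reader)          # the header row comes back first

print(len(first_pass), len(second_pass))  # 3 0
print(after_seek)                         # ['PId', 'AId', 'Name']
```

Reading each file once into a list (or a dict keyed by the join column) avoids both the repeated scans and the header coming back.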
If you can use lxml, here is an example of what I was talking about in the comments.
Hopefully I have the logic right:
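- A policy is based on a row in Policy.csv. It is uniquely identified by Pid.
- A child in a policy is based on the rows in Att.csv that have a matching PId.
- A tail in a child is based on the rows in rider.csv that have a matching AId.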
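If you would rather stay with the standard library's xml.etree.ElementTree (as in the question), the three rules above can also be applied directly in Python: read each file once, index the Att.csv and rider.csv rows by their parent key, then build the nesting. This is a sketch of an alternative, not the lxml/XSLT approach of this answer. The helper names index_by and build_policies, the <Policies> wrapper element (mirroring the <policies> wrapper below), and the in-memory sample data are illustrative assumptions; it also assumes the header spellings from the question (Pid, PId, AId). ET.indent requires Python 3.9+.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_by(rows, key):
    """Group a list of dict rows by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups

def build_policies(policy_rows, att_rows, rider_rows):
    """Hypothetical helper: nest rider rows in att rows in policy rows."""
    att_by_pid = index_by(att_rows, "PId")
    rider_by_aid = index_by(rider_rows, "AId")
    root = ET.Element("Policies")
    for p in policy_rows:
        policy = ET.SubElement(root, "Policy")
        ET.SubElement(policy, "Pid").text = p["Pid"]
        for a in att_by_pid.get(p["Pid"], []):
            child = ET.SubElement(policy, "child")
            # The join key PId is not repeated inside <child>.
            ET.SubElement(child, "AId").text = a["AId"]
            for r in rider_by_aid.get(a["AId"], []):
                tail = ET.SubElement(child, "tail")
                # The join key AId is not repeated inside <tail>.
                ET.SubElement(tail, "RID").text = r["RID"]
                ET.SubElement(tail, "Name").text = r["Name"]
            ET.SubElement(child, "Name").text = a["Name"]
        ET.SubElement(policy, "Name").text = p["Name"]
        ET.SubElement(policy, "Date").text = p["Date"]
    return root

def read_rows(text):
    # Real use: with open("policy.csv", newline="") as f: list(csv.DictReader(f))
    return list(csv.DictReader(io.StringIO(text)))

policies = read_rows("Pid,Name,Date\n101,Life In,3Jan2017\n102,Mobile,8Aug2018\n")
atts = read_rows("PId,AId,Name\n101,9001,Pune\n101,9002,Mumbai\n102,9003,Delhi\n")
riders = read_rows("AId,RID,Name\n9001,10001,Ramesh\n9001,10002,Suresh\n"
                   "9002,10003,Rahul\n9002,10004,Kirti\n")

root = build_policies(policies, atts, riders)
ET.indent(root)  # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

The dict lookups replace the nested file scans, so each CSV is read exactly once; the trade-off compared with the XSLT route is that the output layout is hard-coded in Python.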
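The first thing I would do is convert the CSV files into a temporary XML format.
Since the header values in your CSV files are valid element names, I'll go ahead and create elements from those values (lowercased, so Pid and PId both become pid).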
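If your CSV files might contain header values that are not valid element names, you could use a generic element name and store the header value in an attribute instead. (I can change the example if needed.)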
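A minimal sketch of that variant, assuming a generic element name field with the header stored in a name attribute (both names chosen for illustration). It uses the standard library's xml.etree.ElementTree so it runs without lxml; lxml.etree offers the same Element/SubElement calls. With this layout the XSLT keys would presumably have to select by attribute, for example use="field[@name='pid']" instead of use="pid" (an untested assumption).

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv2xml_generic(csvfile, table_name):
    """Hypothetical variant of csv2xml: header text goes into an attribute."""
    result = ET.Element(table_name)
    for row in csv.DictReader(csvfile):
        row_elem = ET.SubElement(result, "row")
        for header, value in row.items():
            # Any header text is allowed here, e.g. "Policy No." or "Start Date".
            field = ET.SubElement(row_elem, "field", name=header.strip().lower())
            field.text = value.strip()
    return result

sample = io.StringIO("Policy No.,Start Date\n101,3Jan2017\n")
print(ET.tostring(csv2xml_generic(sample, "policy"), encoding="unicode"))
# <policy><row><field name="policy no.">101</field><field name="start date">3Jan2017</field></row></policy>
```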
Then I would transform the temporary XML and do all of the grouping there. Since lxml only supports XSLT 1.0, we have to use Muenchian grouping.
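For readers new to Muenchian grouping: the predicate row[count(.|key('policy', pid)[1])=1] in the stylesheet keeps a row only if it is the first row the key returns for its pid value (the union of a node with itself has a count of 1). A rough Python analogue, shown only to illustrate the idea and not how lxml evaluates it, is keeping the first row seen for each key value; the function name first_per_key is hypothetical.

```python
def first_per_key(rows, key):
    """Python analogue of Muenchian grouping: first row per distinct key value."""
    seen = set()
    firsts = []
    for row in rows:                  # document order, like the XSLT key
        if row[key] not in seen:      # like count(.|key(...)[1]) = 1
            seen.add(row[key])
            firsts.append(row)
    return firsts

rows = [{"pid": "101", "aid": "9001"},
        {"pid": "101", "aid": "9002"},
        {"pid": "102", "aid": "9003"}]
print([r["aid"] for r in first_per_key(rows, "pid")])  # ['9001', '9003']
```

Each selected row then stands for its group, and key('att', pid) plays the role of a dictionary lookup that fetches the other members.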
Example...
Python
import csv
from os import path
from lxml import etree

def csv2xml(file):
    # Root element named after the file: "att.csv" -> <att> (the XSLT matches these names)
    result = etree.Element(path.splitext(file)[0])
    with open(file, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            row_elem = etree.SubElement(result, "row")
            for entry in row:
                # Lowercase the header so "Pid" and "PId" both become <pid>
                entry_elem = etree.SubElement(row_elem, entry.strip().lower())
                entry_elem.text = row.get(entry).strip()
    return result

csv_files = ["policy.csv", "att.csv", "rider.csv"]

# Wrap the three converted files in one temporary document
temp_xml = etree.Element("policies")
for csv_file in csv_files:
    xml = csv2xml(csv_file)
    temp_xml.append(xml)

# Do all of the grouping and nesting in XSLT
xslt = etree.parse("transform.xsl")
xml_output = etree.ElementTree(temp_xml).xslt(xslt)
print(etree.tostring(xml_output, pretty_print=True).decode())
XSLT (transform.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="policy" match="policy/row" use="pid"/>
  <xsl:key name="att" match="att/row" use="pid"/>
  <xsl:key name="rider" match="rider/row" use="aid"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="policy"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="policy">
    <xsl:for-each select="row[count(.|key('policy', pid)[1])=1]">
      <policy>
        <xsl:apply-templates select="pid"/>
        <xsl:apply-templates select="key('att', pid)"/>
        <xsl:apply-templates select="name|date"/>
      </policy>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="att/row">
    <child>
      <xsl:apply-templates select="aid"/>
      <xsl:apply-templates select="key('rider', aid)"/>
      <xsl:apply-templates select="name"/>
    </child>
  </xsl:template>
  <xsl:template match="rider/row">
    <tail>
      <xsl:apply-templates select="rid|name"/>
    </tail>
  </xsl:template>
</xsl:stylesheet>
The Python code prints this output:
<policies>
  <policy>
    <pid>101</pid>
    <child>
      <aid>9001</aid>
      <tail>
        <rid>10001</rid>
        <name>Ramesh</name>
      </tail>
      <tail>
        <rid>10002</rid>
        <name>Suresh</name>
      </tail>
      <name>Pune</name>
    </child>
    <child>
      <aid>9002</aid>
      <tail>
        <rid>10003</rid>
        <name>Rahul</name>
      </tail>
      <tail>
        <rid>10004</rid>
        <name>Kirti</name>
      </tail>
      <name>Mumbai</name>
    </child>
    <name>Life In</name>
    <date>3Jan2017</date>
  </policy>
  <policy>
    <pid>102</pid>
    <child>
      <aid>9003</aid>
      <name>Delhi</name>
    </child>
    <name>Mobile</name>
    <date>8Aug2018</date>
  </policy>
</policies>
Hope this helps.