Converting multiple CSV files into a single XML file

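I am trying to convert multiple CSV files (currently two) into XML using ElementTree, but I am not getting the exact output I need. Please guide me towards a more efficient approach. PS: I am a beginner.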

import csv
import xml.etree.ElementTree as ET
#from bs4 import BeautifulSoup

root = ET.Element('Policy')

with open("policy.csv","r") as p, open("Att.csv","r") as a, open("rider.csv","r") as r:
  csv_p = csv.reader(p)
  header_p = next(csv_p)
  csv_a = csv.reader(a)
  header_a = next(csv_a) 
  csv_r = csv.reader(r)
  header_r = next(csv_r)
  for row in csv_p:
    pid = row[0]
    print("\n",pid)
    for col in range(len(header_p)):
      ET.SubElement(root, header_p[col]).text = str(row[col])
      for childrow in csv_a:
        if(pid == childrow[0]):
          print("Match found")
          child = ET.SubElement(root,"child")
          for col_a in range(len(header_a)):
            ET.SubElement(child, header_a[col_a]).text = str(childrow[col_a])
            for tailrow in csv_r:
              if(childrow[1] == tailrow[0]):
                print("tail found",tailrow[0])
                tail = ET.SubElement(child,"tail")
                for col_r in range(len(header_r)):
                  ET.SubElement(tail, header_r[col_r]).text = str(tailrow[col_r])  
          r.seek(0)
    a.seek(0)

tree = ET.tostring(root, encoding="UTF-8")
#print(BeautifulSoup(tree, "xml").prettify())

with open("Output.xml", "wb") as f:
    f.write(tree)

with open('Output.xml', 'r') as f:
    print("\n\n",f.read())

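The input files and the output are shown below. As you can see, some tags are repeated, because the same key columns (PId, AId) appear in more than one of the files I am reading: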

Policy.csv:

Pid,Name,Date 
101,Life In,3Jan2017
102,Mobile,8Aug2018 

Att.csv:

PId,AId,Name  
101,9001,Pune
101,9002,Mumbai  
102,9003,Delhi

rider.csv:

AId,RID,Name
9001,10001,Ramesh 
9001,10002,Suresh 
9002,10003,Rahul 
9002,10004,Kirti

Output:

<Policy>
    <Pid>101</Pid>
    <child>
        <PId>101</PId>
        <tail>
            <AId>9001</AId>
            <RID>10001</RID>
            <Name>Ramesh</Name>
        </tail>
        <tail>
            <AId>9001</AId>
            <RID>10002</RID>
            <Name>Suresh</Name>
        </tail>
        <AId>9001</AId>
        <Name>Pune</Name>
    </child>
    <child>
        <PId>101</PId>
        <tail>
            <AId>9002</AId>
            <RID>10003</RID>
            <Name>Rahul</Name>
        </tail>
        <tail>
            <AId>9002</AId>
            <RID>10004</RID>
            <Name>Kirti</Name>
        </tail>
        <AId>9002</AId>
        <Name>Mumbai</Name>
    </child>
    <Name>Life In</Name>
    <Date>3Jan2017</Date>
</Policy>

Example of the desired output:

<Policy>
    <Pid>101</Pid>
    <child>
        <AId>9001</AId>
        <tail>
            <RID>10001</RID>
            <Name>Ramesh</Name>
        </tail>
        <tail>
            <RID>10002</RID>
            <Name>Suresh</Name>
        </tail>
        <Name>Pune</Name>
    </child>
    <Name>Life In</Name>
    <Date>3Jan2017</Date>
</Policy>

If you can use lxml, here is an example of what I was talking about in the comments.

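Hopefully I have the logic right:

  • A policy is based on a row in Policy.csv and is uniquely identified by Pid.
  • A child within a policy is based on the rows in Att.csv with a matching PId.
  • A tail within a child is based on the rows in rider.csv with a matching AId.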
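If lxml is not available, the three rules above can also be expressed with the standard library alone. The sketch below is an alternative to the XSLT route, not part of this answer's method: it reads each file once, indexes the Att rows by PId and the rider rows by AId in dictionaries, and then nests them, which avoids re-scanning the files with seek() inside nested loops as the question's code does. The inline CSV strings and the helper names read_rows, group_by and build_policies are illustrative assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data copied from the question (assumption: in real use you would
# open policy.csv, Att.csv and rider.csv with open(..., newline="")).
POLICY = "Pid,Name,Date\n101,Life In,3Jan2017\n102,Mobile,8Aug2018\n"
ATT = "PId,AId,Name\n101,9001,Pune\n101,9002,Mumbai\n102,9003,Delhi\n"
RIDER = ("AId,RID,Name\n9001,10001,Ramesh\n9001,10002,Suresh\n"
         "9002,10003,Rahul\n9002,10004,Kirti\n")


def read_rows(text):
    """Parse CSV text into a list of dicts, stripping stray whitespace."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def group_by(rows, key):
    """Index rows by the value in column `key` (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def build_policies(policy_text, att_text, rider_text):
    atts_by_pid = group_by(read_rows(att_text), "PId")
    riders_by_aid = group_by(read_rows(rider_text), "AId")

    root = ET.Element("policies")
    for policy in read_rows(policy_text):
        policy_elem = ET.SubElement(root, "policy")
        ET.SubElement(policy_elem, "Pid").text = policy["Pid"]
        # One <child> per Att row with a matching PId.
        for att in atts_by_pid.get(policy["Pid"], []):
            child = ET.SubElement(policy_elem, "child")
            ET.SubElement(child, "AId").text = att["AId"]
            # One <tail> per rider row with a matching AId.
            for rider in riders_by_aid.get(att["AId"], []):
                tail = ET.SubElement(child, "tail")
                ET.SubElement(tail, "RID").text = rider["RID"]
                ET.SubElement(tail, "Name").text = rider["Name"]
            ET.SubElement(child, "Name").text = att["Name"]
        ET.SubElement(policy_elem, "Name").text = policy["Name"]
        ET.SubElement(policy_elem, "Date").text = policy["Date"]
    return root


root = build_policies(POLICY, ATT, RIDER)
print(ET.tostring(root, encoding="unicode"))
```

Because the key columns are only used for lookups, PId no longer appears inside child and AId no longer appears inside tail.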

The first thing I'd do is convert the CSV files into a temporary XML format.

Since the header values in your CSV files are valid XML element names, I'll go ahead and create elements named after those values.
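To make the temporary format concrete, here is a standard-library sketch of the same idea for one small in-memory CSV string (the answer itself uses lxml and real files; the function name csv_text_to_xml is hypothetical). Each CSV row becomes a row element, and each header, stripped and lower-cased, becomes the name of a child element.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_text_to_xml(name, text):
    """Turn CSV text into <name><row><header>value</header>...</row></name>.

    Element names come straight from the stripped, lower-cased headers,
    so this only works when every header is a valid XML name.
    """
    result = ET.Element(name)
    for row in csv.DictReader(io.StringIO(text)):
        row_elem = ET.SubElement(result, "row")
        for header, value in row.items():
            ET.SubElement(row_elem, header.strip().lower()).text = (value or "").strip()
    return result


att = csv_text_to_xml("att", "PId,AId,Name\n101,9001,Pune\n")
print(ET.tostring(att, encoding="unicode"))
# <att><row><pid>101</pid><aid>9001</aid><name>Pune</name></row></att>
```

Lower-casing is what lets PId (Att.csv) and Pid (Policy.csv) both become pid, so a single key expression can match either file.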

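If your CSV files might contain header values that are not valid element names, you can use a generic element name instead and store the header value in an attribute. (I can change the example if needed.)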
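A minimal sketch of that generic variant, again using only the standard library and an in-memory CSV string: every cell becomes a field element, and the original header is kept in a name attribute, so headers containing spaces or starting with a digit cause no problems. The element name field, the attribute name name, and the function name csv_text_to_generic_xml are illustrative choices, not part of the answer.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_text_to_generic_xml(name, text):
    """Store each cell as <field name="header">value</field>."""
    result = ET.Element(name)
    for row in csv.DictReader(io.StringIO(text)):
        row_elem = ET.SubElement(result, "row")
        for header, value in row.items():
            # The header is an attribute value here, so it need not be a valid tag.
            field = ET.SubElement(row_elem, "field", name=header.strip())
            field.text = (value or "").strip()
    return result


doc = csv_text_to_generic_xml("att", "Policy Id,Area Name\n101,Pune\n")
print(ET.tostring(doc, encoding="unicode"))
# <att><row><field name="Policy Id">101</field><field name="Area Name">Pune</field></row></att>
```

With this layout, the stylesheet's keys would select on the attribute instead, for example use="field[@name='Policy Id']" rather than use="pid".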

Then I'd transform the temporary XML and do all of the grouping there. Since lxml only supports XSLT 1.0, we have to use Muenchian grouping.
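Muenchian grouping defines an xsl:key and then keeps only the node that comes first in its key group; in transform.xsl that is the predicate count(.|key('policy', pid)[1])=1. The union of the current node with the first node of its group has exactly one member only when the two are the same node. The Python sketch below only illustrates that selection logic (lxml does not run it); the function name muenchian_firsts and the sample rows are assumptions.

```python
def muenchian_firsts(rows, key):
    """Emulate row[count(. | key('k', field)[1]) = 1] from XSLT 1.0.

    The dict plays the role of xsl:key: it maps each key value to all rows
    with that value, in document order. A row survives only if it is the
    first row of its group, so one row per distinct key value is kept.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    # `is` compares identity, like node identity in the XPath union.
    return [row for row in rows if groups[row[key]][0] is row]


rows = [
    {"pid": "101", "aid": "9001"},
    {"pid": "101", "aid": "9002"},
    {"pid": "102", "aid": "9003"},
]
firsts = muenchian_firsts(rows, "pid")
print([(r["pid"], r["aid"]) for r in firsts])
# [('101', '9001'), ('102', '9003')]
```

Once a group's first row is selected, key('k', value) returns the whole group again, which is how the stylesheet pulls in all matching child and tail rows.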

Example...

Python

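import csv
from os import path

from lxml import etree


def csv2xml(file):
    # Use the lower-cased base file name ("Att.csv" -> "att") as the
    # element name so it matches the patterns in transform.xsl.
    result = etree.Element(path.splitext(path.basename(file))[0].lower())
    # newline="" is what the csv module documentation recommends.
    with open(file, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            row_elem = etree.SubElement(result, "row")
            for header, value in row.items():
                # Header -> element name (stripped, lower-cased); cell -> text.
                entry_elem = etree.SubElement(row_elem, header.strip().lower())
                entry_elem.text = (value or "").strip()
    return result


# File names as used in the question.
csv_files = ["policy.csv", "Att.csv", "rider.csv"]

# Temporary XML: one child element per CSV file.
temp_xml = etree.Element("policies")

for csv_file in csv_files:
    temp_xml.append(csv2xml(csv_file))

xslt = etree.parse("transform.xsl")

# Apply the XSLT 1.0 stylesheet to the temporary document.
xml_output = etree.ElementTree(temp_xml).xslt(xslt)

print(etree.tostring(xml_output, pretty_print=True).decode())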

XSLT (transform.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="policy" match="policy/row" use="pid"/>
  <xsl:key name="att" match="att/row" use="pid"/>
  <xsl:key name="rider" match="rider/row" use="aid"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="policy"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="policy">
      <xsl:for-each select="row[count(.|key('policy', pid)[1])=1]">
        <policy>
          <xsl:apply-templates select="pid"/>
          <xsl:apply-templates select="key('att', pid)"/>
          <xsl:apply-templates select="name|date"/>
        </policy>
      </xsl:for-each>
  </xsl:template>

  <xsl:template match="att/row">
    <child>
      <xsl:apply-templates select="aid"/>
      <xsl:apply-templates select="key('rider', aid)"/>
      <xsl:apply-templates select="name"/>
    </child>
  </xsl:template>

  <xsl:template match="rider/row">
    <tail>
      <xsl:apply-templates select="rid|name"/>
    </tail>
  </xsl:template>

</xsl:stylesheet>

The Python script prints this output:

<policies>
  <policy>
    <pid>101</pid>
    <child>
      <aid>9001</aid>
      <tail>
        <rid>10001</rid>
        <name>Ramesh</name>
      </tail>
      <tail>
        <rid>10002</rid>
        <name>Suresh</name>
      </tail>
      <name>Pune</name>
    </child>
    <child>
      <aid>9002</aid>
      <tail>
        <rid>10003</rid>
        <name>Rahul</name>
      </tail>
      <tail>
        <rid>10004</rid>
        <name>Kirti</name>
      </tail>
      <name>Mumbai</name>
    </child>
    <name>Life In</name>
    <date>3Jan2017</date>
  </policy>
  <policy>
    <pid>102</pid>
    <child>
      <aid>9003</aid>
      <name>Delhi</name>
    </child>
    <name>Mobile</name>
    <date>8Aug2018</date>
  </policy>
</policies>

Hope this helps.