Convert one-line XML into CSV

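I have XML documents formatted as shown below, but I can't find a working way to convert them to CSV using Python. I'm using the Spyder IDE and am very much an amateur Pythonista. I managed to run one of the files through an online converter, but the rest are too large to upload. I'm looking for output with the columns rowID, PostID, Score, Text.

Can anyone help?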

<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="1" PostId="1" Score="5" Text="Was there something in particular you didn't understand in the wikipedia article? http://en.wikipedia.org/wiki/Spin_%28physics%29" CreationDate="2010-11-02T19:11:07.043" UserId="42" />
  <row Id="2" PostId="3" Score="1" Text="I thought the wikipedia article here was pretty good, but maybe it only makes sense if you have a little quantum mechanics background: http://en.wikipedia.org/wiki/Particle_physics_and_representation_theory Were you able to get anything out of it?" CreationDate="2010-11-02T19:13:34.870" UserId="42" />
  <row Id="3" PostId="3" Score="0" Text="i mostly thought this was a better place for the question than MO." CreationDate="2010-11-02T19:16:09.873" UserId="40" />
  <row Id="6" PostId="4" Score="11" Text="An accurate answer, but if the poster doesn't understand the actual concept of spin (not to mention group theory), this is all but useless." CreationDate="2010-11-02T19:32:15.410" UserId="13" />
  <row Id="7" PostId="2" Score="2" Text="I'm tempted to answer: with much difficulty, in a highly qualitative way, and only by reading a fair-sized book. There are many decent pop-sci books on string theory; I can't remember the names of any I read, but I'm sure someone can recommend one or two." CreationDate="2010-11-02T19:36:53.290" UserId="13" />
  <row Id="8" PostId="8" Score="0" Text="so the fundamental particle is acting on the quantum states?" CreationDate="2010-11-02T19:36:55.263" UserId="40" />

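Secondly, if some rows are missing fields or have extra ones, how can I ignore those and populate only the specified fields? I get the following error message, and I don't want the 3 extra columns.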

  ParserError: Error tokenizing data. C error: Expected 4 fields in line 41, saw 7

The following worked for me:

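import csv
import os
import xml.etree.ElementTree as ET

xml_file = "c:/temp/test.xml"
csv_file_output = '{}_out.csv'.format(os.path.splitext(xml_file)[0])

tree = ET.parse(xml_file)
xml_root = tree.getroot()

# newline='' lets the csv module control line endings; utf-8 matches the XML declaration
with open(csv_file_output, 'w', newline='', encoding='utf-8') as fout:
    # csv.writer quotes commas and escapes double quotes inside Text,
    # so each row keeps exactly four fields when read back
    writer = csv.writer(fout)
    writer.writerow(["Id", "PostId", "Score", "Text"])
    for row in xml_root.iter("row"):
        # Only the four wanted attributes are read: extra attributes are
        # ignored and a missing attribute becomes an empty cell
        writer.writerow([row.get(name, "") for name in ("Id", "PostId", "Score", "Text")])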

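This could also be done with pandas by saving a DataFrame as CSV, but I kept it simple.

The script writes a file in the same folder as the XML file, with the same base name but ending in _out.csv.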
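For the larger files mentioned in the question, a streaming variant may be worth trying. This is only a minimal sketch under stated assumptions: every record is a `row` element as in the sample, and the names `xml_rows_to_csv` and `FIELDS` are hypothetical ones introduced here for illustration. It uses `ET.iterparse` from the standard library, keeps only the four wanted attributes, writes an empty cell where an attribute is missing, and ignores extra attributes.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical constant: the attributes to keep, in output column order.
FIELDS = ("Id", "PostId", "Score", "Text")


def xml_rows_to_csv(xml_path, csv_path, fields=FIELDS):
    """Stream <row> elements from xml_path into csv_path; return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(fields)
        # iterparse yields each element as soon as its end tag is parsed,
        # so rows can be written while the file is still being read.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                # Missing attributes become empty cells; extras are ignored.
                writer.writerow([elem.get(name, "") for name in fields])
                count += 1
                # Free the row's attribute data; the emptied element itself
                # stays attached to the root.
                elem.clear()
    return count
```

With the paths from the script above, it would be called as `xml_rows_to_csv("c:/temp/test.xml", "c:/temp/test_out.csv")`. Because the `csv` module handles the quoting, a `Text` value containing commas or double quotes should still come back as a single field.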