Multiple for loops to parse XML to CSV not working
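I want to write code that can be used on different XML files (all TEI-encoded) to see whether specific elements and attributes occur, how often they occur, and in what context. To that end, I wrote the following code: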
from logging import root
import xml.etree.ElementTree as ET
import csv

f = open('orestes-elements.csv', 'w', encoding="utf-8")
writer = csv.writer(f)
writer.writerow(["Note Attributes", "Note Text", "Responsibility", "Certainty Element", "Certainty Attributes", "Certainty Text"])
tree = ET.parse(r"C:\Users\noahb\OneDrive\Desktop\Humboldt\Semester 2\Daten\Hausarbeit-TEI\edition-euripides\Orestes.xml")
root = tree.getroot()
try:
    for note in root.findall('.//note'):
        noteat = note.attrib
        notetext = note.text
        print(noteat)
        print(notetext)
    # attribute search
    for responsibility in root.findall(".//*[@resp]"):
        responsibilities = str(responsibility.tag, responsibility.attrib, responsibility.text)
    for certainty in root.findall(".//*[@cert]"):
        certaintytag = certainty.tag
        certaintyat = certainty.attrib
        certaintytext = certainty.text
    writer.writerow([noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext])
finally:
    f.close()
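I get the error "NameError: name 'noteat' is not defined". I can indent writer.writerow, but then the information from the other for loops is not added. How can I get the information from the different for loops into my CSV file? Any help would be appreciated. (The print() calls in the for loop give me the correct results. For the responsibilities I tried turning everything into a single string, but that is not required; I was just trying different solutions, and none has worked so far.)
Here is a sample of my XML file. (Some elements and attributes do not occur in some files. Could that be the cause of the error?)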
<?xml version="1.0" encoding="UTF-8"?>
<!--<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="grc">-->
<?oxygen RNGSchema="teiScholiaSchema2021beta.rng" type="xml"?>
<TEI xml:lang="grc">
  <teiHeader>
    <titleStmt>
      <title cert="high">Scholia on Euripides’ Orestes 1–500</title>
      <author><note>Donald J.</note> Mastronarde</author>
    </titleStmt>
  </teiHeader>
  <text>
    <div1 type="subdivisionByPlay" xml:id="Orestes">
      <div2 type="hypotheseis" xml:id="hypOrestes">
        <head type="outer" xml:lang="en">Prefatory material (argumenta/hypotheseis) for Orestes</head>
        <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder of his father, killed Aegisthus and
          Clytemnestra. Having dared to commit matricide he paid the penalty immediately, becoming
          mad. And after Tyndareus, the father of the murdered woman, brought an accusation, the
          Argives were about to issue a public vote about him, concerning what the man who had acted
          impiously should suffer.
        </p>
      </div2>
    </div1>
  </text>
</TEI>
Example of the CSV format:
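If an element is missing, the corresponding values in your writer.writerow() call will never be defined, because a for loop over an empty findall() result never runs its body. You can define some default values to avoid this.
Try adding the following after the try statement:
noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext = [''] * 6
You can of course use 'NA' instead of the empty string if you prefer.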
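Default values stop the NameError, but if writer.writerow() is called once after the loops, as it appears to be in the question, the CSV still gets only the last match from each loop. If the goal is to see every occurrence and how often each one appears, one option is to write one row per match instead. The following is only a minimal sketch under that assumption: the SAMPLE string, the collect_rows helper and the four-column layout are made up for illustration and are not part of the original code.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for a TEI file; note that no element carries @resp.
SAMPLE = """<TEI>
  <teiHeader>
    <title cert="high">Scholia</title>
    <author><note>Donald J.</note> Mastronarde</author>
  </teiHeader>
  <text>
    <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder.</p>
  </text>
</TEI>"""

# (XPath expression, label) pairs taken from the three loops in the question.
SEARCHES = (
    (".//note", "note"),
    (".//*[@resp]", "resp"),
    (".//*[@cert]", "cert"),
)


def collect_rows(root):
    """Return one row per matching element; a search with no matches adds nothing."""
    rows = []
    for xpath, kind in SEARCHES:
        for elem in root.findall(xpath):
            text = (elem.text or "").strip()  # elem.text is None for empty elements
            rows.append([kind, elem.tag, dict(elem.attrib), text])
    return rows


root = ET.fromstring(SAMPLE)
rows = collect_rows(root)

# Frequency of each kind of match; a Counter reports 0 for kinds that never occurred.
counts = Counter(row[0] for row in rows)

# An in-memory buffer keeps the sketch self-contained; a real file works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Match Type", "Element", "Attributes", "Text"])
writer.writerows(rows)
print(buffer.getvalue())
print({kind: counts[kind] for _, kind in SEARCHES})
```

To write to disk, the buffer can be replaced with something like open('orestes-elements.csv', 'w', encoding='utf-8', newline='') inside a with block, which also makes the try/finally unnecessary. Because every row is built inside the loop that found the element, no variable is ever read before it has been assigned, so files without resp or cert attributes simply produce fewer rows.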