How to insert text from file into new XML tags
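I have the following code that tries to parse an XML file so that it reads from external text files (if found), inserts their contents into newly introduced tags, and saves a new XML file with the resulting changes.
The code looks like this: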
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import os

# define our data file
data_file = 'test2_of_2016-09-19.xml'

tree = ET.ElementTree(file=data_file)
root = tree.getroot()

for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text
    if element.find('Introduction') is not None:
        introduction = element.find('Introduction').text
    if element.find('Directions') is not None:
        directions = element.find('Directions').text

for element in root:
    if element.find('File_directory') is not None:
        if element.find('Introduction') is not None:
            intro_tree = directory+introduction
            with open(intro_tree, 'r') as f:
                intro_text = f.read()
            f.closed
            intro_body = ET.SubElement(element,'Introduction_Body')
            intro_body.text = intro_text
        if element.find('Directions') is not None:
            directions_tree = directory+directions
            with open(directions_tree, 'r') as f:
                directions_text = f.read()
            f.closed
            directions_body = ET.SubElement(element,'Directions_Body')
            directions_body.text = directions_text

tree.write('new_' + data_file)
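The problem is that the last instance of File_directory, Introduction, and Directions seems to be saved and spread across every entry. That is not what I want, since each entry has its own separate record, so to speak.
The source XML file looks like this: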
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Row>
    <Entry_No>1</Entry_No>
    <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
    <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
    <Introduction>introduction-bridalveil-fall.html</Introduction>
    <Directions>directions-bridalveil-fall.html</Directions>
  </Row>
  <Row>
    <Entry_No>52</Entry_No>
    <Waterfall_Name>Switzer Falls</Waterfall_Name>
    <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
    <Introduction>introduction-switzer-falls.html</Introduction>
    <Directions>directions-switzer-falls.html</Directions>
  </Row>
</Root>
The desired output XML should look like this:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Row>
    <Entry_No>1</Entry_No>
    <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
    <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
    <Introduction>introduction-bridalveil-fall.html</Introduction>
    <Directions>directions-bridalveil-fall.html</Directions>
    <Introduction_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/introduction-bridalveil-fall.html</Introduction_Body>
    <Directions_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/directions-bridalveil-fall.html</Directions_Body>
  </Row>
  <Row>
    <Entry_No>52</Entry_No>
    <Waterfall_Name>Switzer Falls</Waterfall_Name>
    <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
    <Introduction>introduction-switzer-falls.html</Introduction>
    <Directions>directions-switzer-falls.html</Directions>
    <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
    <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
  </Row>
</Root>
But what I end up with is:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Row>
    <Entry_No>1</Entry_No>
    <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
    <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
    <Introduction>introduction-bridalveil-fall.html</Introduction>
    <Directions>directions-bridalveil-fall.html</Directions>
    <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
    <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
  </Row>
  <Row>
    <Entry_No>52</Entry_No>
    <Waterfall_Name>Switzer Falls</Waterfall_Name>
    <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
    <Introduction>introduction-switzer-falls.html</Introduction>
    <Directions>directions-switzer-falls.html</Directions>
    <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
    <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
  </Row>
</Root>
By the way, is there a way to bring in the contents of the body tags without having everything printed on one line (for readability)?
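The first for loop iterates over the document's Row elements and assigns new values to the directory, introduction, and directions variables on every iteration, so once the loop finishes they hold the values of the last Row element. The second loop then reuses those same values for every Row.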
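To make the overwrite visible, here is a minimal sketch of the same two-loop pattern on a shortened, made-up document (the paths `./a/` and `./b/` and the file names are placeholders, not the real data):

```python
import xml.etree.ElementTree as ET

# Two-row document shaped like the one in the question (values are placeholders).
xml_text = """<Root>
  <Row><File_directory>./a/</File_directory><Introduction>intro-a.html</Introduction></Row>
  <Row><File_directory>./b/</File_directory><Introduction>intro-b.html</Introduction></Row>
</Root>"""
root = ET.fromstring(xml_text)

# First pass: every iteration rebinds the same two names.
for element in root:
    directory = element.find('File_directory').text
    introduction = element.find('Introduction').text

# Second pass: the names still hold what the last Row assigned,
# so every Row is paired with the last Row's path.
paths = []
for element in root:
    paths.append(directory + introduction)

print(paths)  # ['./b/intro-b.html', './b/intro-b.html']
```

Both entries point at the second Row's file, which matches the output shown in the question.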
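What I would do is create a dict that maps tag names to text content, and then use that mapping to add the new sub-elements dynamically. Example (without reading the referenced files):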
for row in root:
    elements = {}
    for node in row:
        elements[node.tag] = node.text

    directory = elements['File_directory']

    intro_tree = directory + elements['Introduction']
    intro_body = ET.SubElement(row, 'Introduction_Body')
    intro_body.text = 'Text from %s' % intro_tree

    directions_tree = directory + elements['Directions']
    directions_body = ET.SubElement(row, 'Directions_Body')
    directions_body.text = 'Text from %s' % directions_tree
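The same per-row idea can be extended to actually read the referenced files. The following is only a sketch under a few assumptions: the helper names `add_body` and `add_bodies` are made up for illustration, a Row whose file is missing is simply left unchanged, and `ET.indent` is used for readable output only where it is available (Python 3.9 and later).

```python
import os
import xml.etree.ElementTree as ET


def add_body(row, source_tag, body_tag):
    """Append <body_tag> holding the text of the file named by <source_tag>.

    Hypothetical helper: returns the new element, or None if nothing was added.
    """
    directory = row.findtext('File_directory')
    filename = row.findtext(source_tag)
    if directory is None or filename is None:
        return None  # this Row does not reference such a file
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return None  # referenced file is missing; leave the Row unchanged
    with open(path, 'r') as f:
        text = f.read()
    body = ET.SubElement(row, body_tag)
    body.text = text
    return body


def add_bodies(in_path, out_path):
    """Read in_path, add the body elements to each Row, write out_path."""
    tree = ET.parse(in_path)
    for row in tree.getroot():
        # Paths are looked up per Row, so nothing leaks between entries.
        add_body(row, 'Introduction', 'Introduction_Body')
        add_body(row, 'Directions', 'Directions_Body')
    if hasattr(ET, 'indent'):  # only present on Python 3.9+
        ET.indent(tree)
    tree.write(out_path, encoding='utf-8', xml_declaration=True)
```

Usage would be along the lines of `add_bodies(data_file, 'new_' + data_file)`. On the readability question: line breaks inside the files are kept as part of the element text, and `ET.indent` only adds whitespace between elements. Since the files are HTML, their markup is escaped (`<` becomes `&lt;`) when it is stored as text.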