Read the text of a file between two words in Python
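I am trying to open and read an .xml file and, using a keyword that I supply, extract a fragment of it (the part between two tags) and write only that fragment to a new .xml file that I generate.
The Python script I have so far lets me open and read the source .xml file, search for the keyword I supply, and write the complete lines in which the keyword is found to the new .xml file that I generate, as follows: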
keyword = 'Georgia'
occurrences = []
with open('test_input.xml') as lines:
    for line in lines:
        if keyword in line:
            occurrences.append(line)
archi1 = open("test_output.xml", "w")
archi1.write(''.join(occurrences))
archi1.close()
The result I get is a "test_output.xml" file with the following content:
<id>Georgia-1</id>
<profile>Georgia-p1</profile>
<id>Georgia-2</id>
<profile>Georgia-p2</profile>
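The problem is that I need it to return not only the complete lines containing the keyword ('Georgia' in this case), but the whole fragment that contains them, delimited by the opening and closing 'profile' tags. That is, I need it to return the following result: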
<profile>
<id>Georgia-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p1</profile>
<showtitle>Georgia_s1</showtitle>
<ip>000.000.0.3</ip>
<port>00003</port>
<persistencePort>00033</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_3</webstart.server.name>
<codebaseProtocolServer>T3</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p2</profile>
<showtitle>Georgia_s2</showtitle>
<ip>000.000.0.4</ip>
<port>00004</port>
<persistencePort>00044</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_4</webstart.server.name>
<codebaseProtocolServer>T4</codebaseProtocolServer>
</properties>
</profile>
The full source .xml I am using is as follows:
<project>
<profile>
<id>Azerbaiyan-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Azerbaiyan-p1</profile>
<showtitle>Azerbaiyan_s1</showtitle>
<ip>000.000.0.1</ip>
<port>00001</port>
<persistencePort>00011</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_1</webstart.server.name>
<codebaseProtocolServer>T1</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Azerbaiyan-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Azerbaiyan-p2</profile>
<showtitle>Azerbaiyan_s2</showtitle>
<ip>000.000.0.2</ip>
<port>00002</port>
<persistencePort>00022</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_2</webstart.server.name>
<codebaseProtocolServer>T2</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p1</profile>
<showtitle>Georgia_s1</showtitle>
<ip>000.000.0.3</ip>
<port>00003</port>
<persistencePort>00033</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_3</webstart.server.name>
<codebaseProtocolServer>T3</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p2</profile>
<showtitle>Georgia_s2</showtitle>
<ip>000.000.0.4</ip>
<port>00004</port>
<persistencePort>00044</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_4</webstart.server.name>
<codebaseProtocolServer>T4</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>USA-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>USA-p1</profile>
<showtitle>USA1_s1</showtitle>
<ip>000.000.0.5</ip>
<port>00005</port>
<persistencePort>00055</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_5</webstart.server.name>
<codebaseProtocolServer>T5</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>USA-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>USA-p2</profile>
<showtitle>USA1_s2</showtitle>
<ip>000.000.0.6</ip>
<port>00006</port>
<persistencePort>00066</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_6</webstart.server.name>
<codebaseProtocolServer>T6</codebaseProtocolServer>
</properties>
</profile>
</project>
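Parse the input as XML and capture the profile elements that have an id child element whose text value contains the string "Georgia".
The following program uses the ElementTree standard library and prints the desired result: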
import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")
# Iterate over all 'profile' elements that are direct children of the root
for profile in tree.findall("profile"):
    # Named profile_id to avoid shadowing the built-in id()
    profile_id = profile.find("id").text
    if "Georgia" in profile_id:
        print(ET.tostring(profile).decode())
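The question asks for the matching fragments to end up in a new .xml file rather than on the console. Below is a minimal sketch of one way to do that with the same ElementTree approach. The function names `filter_profiles` and `extract_to_file` are made up for illustration. Wrapping the matches in a copy of the source root tag is an assumption, made so that the output is itself well-formed XML. Treating a missing id as "no match" is also an assumption.

```python
import xml.etree.ElementTree as ET


def filter_profiles(xml_text, keyword):
    """Return an XML string holding only the top-level <profile> elements
    whose <id> text contains keyword, wrapped in the original root tag."""
    root = ET.fromstring(xml_text)
    # Assumption: reuse the source root tag (<project> here) as the wrapper
    filtered = ET.Element(root.tag)
    # findall("profile") matches direct children only, so the nested
    # <profile> inside <properties> is not treated as a separate match
    for profile in root.findall("profile"):
        # findtext returns None when <id> is missing; fall back to ""
        if keyword in (profile.findtext("id") or ""):
            filtered.append(profile)
    return ET.tostring(filtered, encoding="unicode")


def extract_to_file(source_path, output_path, keyword):
    """Read source_path, filter it, and write the result to output_path."""
    with open(source_path, encoding="utf-8") as src:
        result = filter_profiles(src.read(), keyword)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(result)
```

With the file names from the question, the call would be extract_to_file("test_input.xml", "test_output.xml", "Georgia"). The source file has to be well-formed for the parser to accept it, which includes a closing tag for the root element. If the bare profile fragments are wanted without a wrapping root, the ET.tostring output of each matching profile could be joined and written instead.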