Read the text of a file between two words in Python

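I am trying to open and read a .xml file, extract a fragment of it based on a keyword I enter, and write only that fragment (the part between two tags) into a new .xml file that I generate.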

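The Python script I have so far opens and reads the source .xml file, searches for the keyword I enter, and writes every complete line that contains the keyword to a new .xml file, as shown below: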

keyword = 'Georgia'
occurrences = []
with open('test_input.xml') as lines:
    for line in lines:
        if keyword in line:
            occurrences.append(line)

with open("test_output.xml", "w") as archi1:
    archi1.write(''.join(occurrences))

The result is a "test_output.xml" file with the following content:

     <id>Georgia-1</id>
         <profile>Georgia-p1</profile>
     <id>Georgia-2</id>
         <profile>Georgia-p2</profile>

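The problem is that I need it to return not only the complete lines that contain the keyword (in this case 'Georgia'), but the whole fragment that contains them, delimited by the opening and closing of the word or tag 'profile'. That is, I need it to return the following result: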

<profile>
    <id>Georgia-1</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Georgia-p1</profile>
        <showtitle>Georgia_s1</showtitle>
        <ip>000.000.0.3</ip>
        <port>00003</port>
        <persistencePort>00033</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_3</webstart.server.name>
        <codebaseProtocolServer>T3</codebaseProtocolServer>
    </properties>
</profile>
<profile>
    <id>Georgia-2</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Georgia-p2</profile>
        <showtitle>Georgia_s2</showtitle>
        <ip>000.000.0.4</ip>
        <port>00004</port>
        <persistencePort>00044</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_4</webstart.server.name>
        <codebaseProtocolServer>T4</codebaseProtocolServer>
    </properties>
</profile>

The complete source .xml I am using is as follows:

<project>       

    
<profile>
    <id>Azerbaiyan-1</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Azerbaiyan-p1</profile>
        <showtitle>Azerbaiyan_s1</showtitle>
        <ip>000.000.0.1</ip>
        <port>00001</port>
        <persistencePort>00011</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_1</webstart.server.name>
        <codebaseProtocolServer>T1</codebaseProtocolServer>
    </properties>
</profile>

<profile>
    <id>Azerbaiyan-2</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Azerbaiyan-p2</profile>
        <showtitle>Azerbaiyan_s2</showtitle>
        <ip>000.000.0.2</ip>
        <port>00002</port>
        <persistencePort>00022</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_2</webstart.server.name>
        <codebaseProtocolServer>T2</codebaseProtocolServer>
    </properties>
</profile>

<profile>
    <id>Georgia-1</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Georgia-p1</profile>
        <showtitle>Georgia_s1</showtitle>
        <ip>000.000.0.3</ip>
        <port>00003</port>
        <persistencePort>00033</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_3</webstart.server.name>
        <codebaseProtocolServer>T3</codebaseProtocolServer>
    </properties>
</profile>
<profile>
    <id>Georgia-2</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>Georgia-p2</profile>
        <showtitle>Georgia_s2</showtitle>
        <ip>000.000.0.4</ip>
        <port>00004</port>
        <persistencePort>00044</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_4</webstart.server.name>
        <codebaseProtocolServer>T4</codebaseProtocolServer>
    </properties>
</profile>

<profile>
    <id>USA-1</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>USA-p1</profile>
        <showtitle>USA1_s1</showtitle>
        <ip>000.000.0.5</ip>
        <port>00005</port>
        <persistencePort>00055</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_5</webstart.server.name>
        <codebaseProtocolServer>T5</codebaseProtocolServer>
    </properties>
</profile>

<profile>
    <id>USA-2</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
        <profile>USA-p2</profile>
        <showtitle>USA1_s2</showtitle>
        <ip>000.000.0.6</ip>
        <port>00006</port>
        <persistencePort>00066</persistencePort>
        <defaultLocale>en_GB</defaultLocale>
        <webstart.server.name>host_6</webstart.server.name>
        <codebaseProtocolServer>T6</codebaseProtocolServer>
    </properties>
</profile>

</project>

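Parse the input as XML and select the profile elements that have an id child element whose text contains the string "Georgia".

The following program uses the ElementTree module from the standard library and prints the desired result: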

import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")

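# Iterate over the 'profile' elements directly under the root element
for profile in tree.findall("profile"):
    # findtext returns "" instead of failing when a profile has no <id>
    profile_id = profile.findtext("id", default="")
    if "Georgia" in profile_id:
        print(ET.tostring(profile).decode())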
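
Since the question asks for the fragments to end up in a new file rather than on the console, the same idea can be wrapped in a small function that writes the matches out. This is only a minimal sketch: the function name `extract_profiles` and its parameters are made up for illustration, and it assumes the source file is well-formed XML with the profile elements directly under the root.

```python
import xml.etree.ElementTree as ET


def extract_profiles(source_path, output_path, keyword):
    """Write every top-level <profile> whose <id> text contains keyword.

    Hypothetical helper, not part of any library. Returns the number of
    profiles written.
    """
    root = ET.parse(source_path).getroot()
    matches = []
    for profile in root.findall("profile"):
        # findtext returns the default ("") when there is no <id> child
        profile_id = profile.findtext("id", default="")
        if keyword in profile_id:
            # encoding="unicode" makes tostring return str instead of bytes
            matches.append(ET.tostring(profile, encoding="unicode"))
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("".join(matches))
    return len(matches)
```

With the file names from the question it could be called as `extract_profiles("test_input.xml", "test_output.xml", "Georgia")`. Note that the output is a sequence of profile fragments without a single root element, which matches the result asked for in the question but is not a well-formed XML document on its own.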