Detecting and replacing XML in a string in Python
I have a file that contains text with some XML content dumped into it. It looks like this:
The authentication details : <id>70016683</id><password>password@123</password>
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request>
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>
I am using a Python program to parse this file. I want to replace the XML parts with a placeholder: xml_obj. The output should look like this:
The authentication details : xml_obj
The next step is to send the request.
The request : xml_obj
Additional info includes xml_obj
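I also want to extract the replaced XML text and store it in a list. If a line has no XML object, the list should contain None for that line.
So far I have tried using a regular expression for this: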
xml_tag = re.search(r"<\w*>", line)
if xml_tag:
    start_position = xml_tag.start()
    # Build the closing tag, e.g. '<id>' -> '</id>'
    xml_word = xml_tag.group()[:1] + '/' + xml_tag.group()[1:]
    xml_pattern = r'{}'.format(xml_word)
    # Match objects have .end(), not .stop()
    stop_position = re.search(xml_pattern, line).end()
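But this code retrieves the start and stop positions for only one XML tag and its content on the first line, and for the entire structure on the last line (of the input file). I want to capture all the XML content, regardless of its structure, and replace it with 'xml_obj'.
Any suggestions would be helpful. Thanks in advance.
Edit:
I also want to apply the same logic to a file that looks like this: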
The authentication details : ID <id>70016683</id> Password <password>password@123</password> Authentication details complete
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request> Request successful
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>
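A single line in the file above may contain more than one XML object.
There may also be some plain text after the XML part.
The following is a bit convoluted, but assuming your actual text is correctly represented by the sample in the question, try this: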
txt = """[your sample text above]"""
lines = txt.splitlines()
entries = []
new_txt = ''
for line in lines:
    # Mark the first ' <' on the line, then split on that marker
    entry = (line.replace(' <',' xxx<',1).split('xxx'))
    if len(entry)==2:
        # entry[1] is everything from the first '<' to the end of the line
        entries.append(entry[1])
        entry[1]="xml_obj"
        line=''.join(entry)
    else:
        # No XML found on this line
        entries.append('none')
    new_txt+=line+'\n'
for entry in entries:
    print(entry)
print('---')
print(new_txt)
Output:
<id>70016683</id><password>password@123</password>
none
<request><id>90016133</id><password>password@3212</password></request>
<Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>
---
The authentication details : xml_obj
The next step is to send the request.
The request : xml_obj
Additional info includes xml_obj
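The split-based approach above replaces everything from the first ' <' to the end of the line, so it would not cover the edited case in the question, where a line holds several XML objects or has plain text after the XML. A minimal regex sketch for that case follows. The names XML_RUN and replace_xml are hypothetical, and the pattern assumes simple tags: no attributes, no self-closing tags, and no element nested inside another element of the same name.

```python
import re

# One or more adjacent elements of the form <name>...</name>.
# \1 refers back to the tag name captured for the current element.
XML_RUN = re.compile(r"(?:<(\w+)>.*?</\1>)+")

def replace_xml(line, placeholder="xml_obj"):
    """Return (line with XML replaced, list of XML strings or None)."""
    # group(0) is the whole matched run, not just the captured tag name
    found = [m.group(0) for m in XML_RUN.finditer(line)]
    new_line = XML_RUN.sub(placeholder, line)
    return new_line, (found or None)

if __name__ == "__main__":
    sample = [
        "The authentication details : ID <id>70016683</id> Password "
        "<password>password@123</password> Authentication details complete",
        "The next step is to send the request.",
    ]
    for text in sample:
        print(replace_xml(text))
```

Adjacent elements such as <id>...</id><password>...</password> are treated as one XML object, which matches the expected output in the question. For XML with attributes or deeper structure, a real parser would likely be a safer choice than a regular expression.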