Using lxml in Python, I need to replace "RNA" with <mark>RNA</mark> in an input XML file. My code is below.
My input XML file is:
<?xml version='1.0' encoding='UTF-8'?>
<try>
something somethingRNA and RNA in RNA.
</try>
My Python Code:
import lxml.etree as ET
import openpyxl
import re

url = 'output_15012015_test.xml'
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
string = abstract[0].text
if abstract:
    anotherString = re.sub(r'\b'+lncrna.lower()+r'\b', '<mark>'+lncrna+'</mark>', string.lower())
    abstract[0].text = anotherString
    print(abstract[0].text)
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)
Output
I get the following replaced text instead of <mark>RNA</mark>:
&lt;mark&gt;RNA&lt;/mark&gt;
I think it has to do with tree.write() method. Also I'm new to Python and the community. Please help me with this.
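You are setting XML markup in the .text of the element, so when the XML is written it is interpreted as text, not markup, and the < and > characters are escaped as &lt; and &gt;.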
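A quick way to see this escaping for yourself is the sketch below. It works on an in-memory string instead of your file, and the fallback to the standard library's xml.etree.ElementTree is only my assumption so that it runs without lxml installed; the three calls used here exist in both.

```python
try:
    import lxml.etree as ET
except ImportError:
    # Assumption: the stdlib module is close enough for this small demo
    import xml.etree.ElementTree as ET

root = ET.fromstring("<try>some RNA here</try>")

# Assigning a string that looks like markup to .text stores plain text
root.text = "some <mark>RNA</mark> here"

# On serialization the angle brackets are escaped, and no child element exists
escaped = ET.tostring(root, encoding="unicode")
print(escaped)
print(len(root))  # number of child elements under <try>
```

The element still has zero children after the assignment, which is why the writer has no choice but to escape the brackets.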
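What you want to do is:
- split the .text into three parts: before the new tag, inside the new tag, and after the new tag
- add the new tag and set its text and tail
See the code: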
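# Uses ET, re and url as defined in the question's code
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
# The capturing group makes re.split keep the matches:
# even indexes hold the surrounding text, odd indexes hold the matched term
aList = re.split(r'(\b'+lncrna+r'\b)', abstract[0].text, flags=re.IGNORECASE)
abstract[0].text = aList[0]
for i in range(1, len(aList), 2):
    # SubElement already appends the new <mark> element to its parent,
    # so no separate insert() call is needed
    anElement = ET.SubElement(abstract[0], 'mark')
    anElement.text = aList[i]
    anElement.tail = aList[i+1]
print(abstract[0].text)
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)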
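To check the result without touching any files, here is a self-contained sketch of the same split-and-tail idea. The helper name mark_term and the sample string are made up for illustration, and the fallback import is only there so the snippet also runs without lxml. It assumes the element has no child elements yet; with existing children the new <mark> elements would be appended after them.

```python
import re

try:
    import lxml.etree as ET
except ImportError:
    # Assumption: the stdlib module offers the same calls used below
    import xml.etree.ElementTree as ET


def mark_term(element, term):
    """Wrap whole-word, case-insensitive matches of term in element.text
    in <mark> children. Hypothetical helper, not part of lxml."""
    pattern = r'(\b' + re.escape(term) + r'\b)'
    parts = re.split(pattern, element.text or '', flags=re.IGNORECASE)
    # parts alternates: text, match, text, match, ..., text
    element.text = parts[0]
    for i in range(1, len(parts), 2):
        mark = ET.SubElement(element, 'mark')
        mark.text = parts[i]      # the matched word itself
        mark.tail = parts[i + 1]  # the text that follows the match
    return element


root = ET.fromstring("<try>something somethingRNA and RNA in rna.</try>")
mark_term(root, "RNA")
result = ET.tostring(root, encoding="unicode")
print(result)
# <try>something somethingRNA and <mark>RNA</mark> in <mark>rna</mark>.</try>
```

Note that "somethingRNA" is left alone because \b only matches at word boundaries, and the original casing of each match is kept since the matched text is reused as the element's text.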