Comment out and uncomment an XML element
I have an XML file in which I want to comment out one element and uncomment another.
<my_element>
<blablabla href="docs/MyBlank.htm" />
</my_element>
I want to "close" (comment out) this one, like so:
<!--
<my_element>
<blablabla href="docs/MyBlank.htm" />
</my_element>
-->
Further down in the file I have an element with the same name that is "closed" (commented out), like this:
<!--
<my_element>
<blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
-->
and I want to "open" it (uncomment it) like this:
<my_element>
<blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
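I'm using ElementTree for this. I know how to edit values and attributes in an element, but I'm not at all sure how to remove or add the <!-- --> around one specific element.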
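Since the question is specifically about ElementTree, here is a minimal stdlib-only sketch of the same two operations, assuming Python 3.8+ (needed for `TreeBuilder(insert_comments=True)`). ElementTree discards comments by default, so the parser has to be told to keep them; a kept comment then appears as a child whose `tag` is the `ET.Comment` factory, and its `.text` can be re-parsed with `ET.fromstring`. The helper names `parse_keeping_comments`, `comment_out`, `uncomment` and `href_of` are illustrative, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def parse_keeping_comments(text):
    # ElementTree drops comments by default; insert_comments=True (Python 3.8+)
    # keeps comments that appear inside the root element as Comment nodes.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(text, parser=parser)

def comment_out(parent, elem):
    """Hypothetical helper: replace elem (a child of parent) with a comment holding its XML."""
    index = list(parent).index(elem)
    tail, elem.tail = elem.tail, None  # tostring() would otherwise include the tail
    comment = ET.Comment("\n" + ET.tostring(elem, encoding="unicode") + "\n")
    comment.tail = tail
    parent.remove(elem)
    parent.insert(index, comment)
    return comment

def uncomment(parent, comment):
    """Hypothetical helper: replace a comment node with the element parsed from its text."""
    index = list(parent).index(comment)
    elem = ET.fromstring(comment.text.strip())  # raises ET.ParseError if not XML
    elem.tail = comment.tail
    parent.remove(comment)
    parent.insert(index, elem)
    return elem

def href_of(elem):
    # href of the <blablabla> child, or None if there is no such child
    link = elem.find("blablabla")
    return None if link is None else link.get("href")

xml_text = """<stuff>
<my_element>
<blablabla href="docs/MyBlank.htm" />
</my_element>
<!--
<my_element>
<blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
-->
</stuff>"""

root = parse_keeping_comments(xml_text)
for child in list(root):  # iterate over a snapshot because root is modified
    if child.tag is ET.Comment:
        try:
            inner = ET.fromstring((child.text or "").strip())
        except ET.ParseError:
            continue  # an ordinary text comment, not a commented-out element
        if href_of(inner) == "secretwebhacking/MySecrectBankLogin.htm":
            uncomment(root, child)
    elif child.tag == "my_element" and href_of(child) == "docs/MyBlank.htm":
        comment_out(root, child)

print(ET.tostring(root, encoding="unicode"))
```

To save the result, something like `ET.ElementTree(root).write("out.xml", encoding="utf-8")` should work (the file name is a placeholder). Keep in mind that this parser only keeps comments inside the root element; comments before or after it are still dropped.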
You can use BeautifulSoup to do the parsing. A basic example:
from bs4 import BeautifulSoup, Comment

xmlbody = '<stuff>\
<my_element>\
<blablabla href="docs/MyBlank.htm" />\
</my_element>\
<!--\
<my_element>\
<blablabla href="secretwebhacking/MySecrectBankLogin.htm" />\
</my_element>\
-->\
</stuff>'

# "lxml" is lxml's HTML parser, so the document gets wrapped in <html><body>
soup = BeautifulSoup(xmlbody, "lxml")

# Find all comments
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    # Create a new soup object from the comment's contents
    commentsoup = BeautifulSoup(comment, "lxml")
    # Find the tag we want (None if this comment holds no such tag)
    blatag = commentsoup.find('blablabla')
    # Check if it is the one we need
    if blatag is not None and blatag.get('href') == "secretwebhacking/MySecrectBankLogin.htm":
        # If so, insert the element from inside the comment into the document
        comment.insert_after(commentsoup.find('body').find('my_element'))
        # And remove the comment
        comment.extract()

# Find all my_elements
my_elements = soup.find_all('my_element')
for tag in my_elements:
    # Check if it's the one we want
    blatag = tag.find('blablabla')
    if blatag is not None and blatag.get('href') == "docs/MyBlank.htm":
        # If so, insert a commented-out copy
        tagcomment = soup.new_string(str(tag), Comment)
        tag.insert_after(tagcomment)
        # And remove the tag
        tag.extract()

# Print only the contents of the <body> wrapper that the HTML parser added
print(soup.find('html').find('body').prettify().replace("<body>\n", "").replace("\n</body>", ""))
This should get you started; you can make it as elaborate as you need. The output looks like this:
<stuff>
<!--<my_element> <blablabla href="docs/MyBlank.htm"></blablabla></my_element>-->
<my_element>
<blablabla href="secretwebhacking/MySecrectBankLogin.htm">
</blablabla>
</my_element>
</stuff>
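One thing the code above does not check: XML 1.0 forbids the string `--` inside a comment (and a comment may not end with `-`), so an element whose markup contains `--`, for example in an attribute value, cannot be commented out verbatim. ElementTree's `Comment` writes its text unescaped, so the result would be a file that no longer parses. A small guard, sketched here with ElementTree and a hypothetical helper named `safe_comment_text`, can catch this before anything is written:

```python
import xml.etree.ElementTree as ET

def safe_comment_text(elem):
    """Serialize elem for use inside <!-- -->, refusing text that would be invalid.

    Hypothetical helper: XML 1.0 does not allow "--" inside a comment, or a
    comment ending in "-", and ElementTree writes comment text unescaped.
    """
    tail, elem.tail = elem.tail, None  # tostring() would otherwise include the tail
    try:
        text = ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail  # leave the element unchanged
    if "--" in text or text.endswith("-"):
        raise ValueError("element contains '--'; commenting it out would produce invalid XML")
    return text

elem = ET.fromstring('<my_element><blablabla href="docs/MyBlank.htm" /></my_element>')
comment = ET.Comment(safe_comment_text(elem))
print(ET.tostring(comment, encoding="unicode"))
```

Raising is the conservative choice here; an alternative would be to skip such elements or remove them outright instead of commenting them out.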