How to feed XML tag content modifications back to BeautifulSoup correctly?
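I'm having some trouble editing an XML file with BeautifulSoup. I found plenty of similar topics on Stack Overflow and elsewhere, but none of them address this particular scenario. I'm editing the content of a tag that is essentially a list. The editing part works fine; I just can't figure out how to write the modified content back into the soup correctly.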
Here is what I have so far:
```python
from bs4 import BeautifulSoup

XMLsource = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
<dataTag>
<id>14</id>
<players>var:val : var1:val1 : var2:val2 : testr:testl</players>
<active>0</active>
</dataTag>
<dataTag>
<id>15</id>
<players>1var:1val : 1var1:1val1 : 1var2:1val2 : 1testr:1testl</players>
<active></active>
</dataTag>
<dataTag>
<id>16</id>
<players>2var:2val : 2var1:2val1 : this_var:some_val : 2var2:2val2 : 2testr:2testl</players>
<active>1</active>
</dataTag>
<dataTag>
<id>17</id>
<players>3var:3val : 3var1:3val1 : 3var2:3val2 : 3testr:3testl</players>
<active>1</active>
</dataTag>
</root>
"""

myarray = []
searchList = ['string', 'string_1', 'string_2', 'string_3', 'this_']
delimiter = " : "

def listToString(lst):
    return (delimiter.join(lst))

soup = BeautifulSoup(XMLsource, 'xml')

for a in soup.find_all('players'):
    for each in a:
        # Split the tag text into its " : "-separated items
        tagContents = a.string
        goodContents = list(tagContents.split(" : "))
        length = len(goodContents)
        for element in range(length):
            e = goodContents[element]
            if e != '':
                myarray.append(e)
        # Drop items that contain any of the search strings
        for stuff in searchList:
            for i, elem in enumerate(myarray):
                if stuff in elem:
                    myarray.remove(elem)
        # Write the rebuilt string back into the <players> tag
        a.string = listToString(myarray)
print(soup)
```
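The problem is `a.string = listToString(myarray)`. On every iteration of the main `for` loop it just keeps adding `myarray` to the soup, so the contents of the `<players>` tags stack up. You can run the code to see what I mean. This is a dummy version of the code that doesn't really change the tag contents, so that it is clearer what the problem is.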
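For illustration, here is a minimal sketch in plain Python (no BeautifulSoup) of why the contents stack up: `myarray` is created once before the loop and never emptied, so each tag is rewritten with the items of every tag processed before it. The helper names `filter_items`, `rewrite_all_buggy` and `rewrite_all_fixed` are hypothetical and only mimic the accumulation; the sample values are taken from the question's XML.

```python
# Hypothetical helpers illustrating the stacking bug and a fix.

def filter_items(text, search_list, delimiter=" : "):
    """Split text on delimiter, drop empty items and items containing
    any search term, and join the rest back together."""
    items = [item for item in text.split(delimiter) if item != ""]
    # Building a new list avoids calling list.remove() while iterating,
    # which can skip the element that follows each removed one.
    kept = [item for item in items
            if not any(term in item for term in search_list)]
    return delimiter.join(kept)


def rewrite_all_buggy(values, search_list, delimiter=" : "):
    """Mimics the question's accumulation: one shared list grows across all tags."""
    shared = []  # created once, like myarray, and never emptied
    results = []
    for text in values:
        shared.extend(item for item in text.split(delimiter) if item != "")
        kept = [item for item in shared
                if not any(term in item for term in search_list)]
        results.append(delimiter.join(kept))
    return results


def rewrite_all_fixed(values, search_list, delimiter=" : "):
    """Each tag value is filtered on its own, so nothing carries over."""
    return [filter_items(text, search_list, delimiter) for text in values]


tag_values = [
    "var:val : var1:val1 : var2:val2 : testr:testl",
    "2var:2val : 2var1:2val1 : this_var:some_val : 2var2:2val2 : 2testr:2testl",
]
search_list = ["this_"]

# The second result also contains all of the first tag's items
print(rewrite_all_buggy(tag_values, search_list)[1])
# The second result contains only the second tag's remaining items
print(rewrite_all_fixed(tag_values, search_list)[1])
```

In the question's loop, the equivalent change would be to build the list fresh for each `<players>` tag (for example `a.string = filter_items(a.string, searchList)`) rather than reusing a module-level `myarray`.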
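I've been testing and searching the internet for three days. I've changed the code a million times, but I'm not a professional programmer, so my coding is usually a process of trial and error, and this time I just can't crack it. Could anyone help me fix the code?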
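So after a few sleepless nights I tried a different approach and, lo and behold, it worked! Here it is in case anyone needs a script like this. There may still be some redundant code in there, but it does what it's supposed to do: it removes the strings defined in a list from any specified XML tag, including when the XML tag's content is itself a list.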
```python
from bs4 import BeautifulSoup

def listToString(lst):
    return (delimiter.join(lst))

XMLsource = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
<dataTag>
<id>14</id>
<players>var:val : var1:val1 : var2:val2 : testr:testl</players>
<active>0</active>
</dataTag>
<dataTag>
<id>15</id>
<players>1var:1val : 1var1:1val1 : 1var2:1val2 : 1testr:1testl</players>
<active></active>
</dataTag>
<dataTag>
<id>16</id>
<players>2var:2val : 2var1:2val1 : 2var2:2val2 : 2testr:2testl</players>
<active>1</active>
</dataTag>
<dataTag>
<id>17</id>
<players>3var:3val : this_var:some_val : 3var1:3val1 : 3var2:3val2 : 3testr:3testl</players>
<active>1</active>
</dataTag>
</root>
"""

searchList = ['string', 'string_1', 'string_2', 'testr', 'this_v']
soup = BeautifulSoup(XMLsource, 'xml')
gamers = soup.find_all('players')
loopnum = len(gamers)
delimiter = " : "
newContents = []
badContents = []
goodContents = []
ii = 0

for xx in range(loopnum):
    # Start every <players> tag with empty working lists
    newContents.clear()
    goodContents.clear()
    goodContents = gamers[xx].string
    goodContents = list(goodContents.split(" : "))
    for tagy in gamers[xx]:
        badContents.clear()
        # Collect every item that contains one of the search strings
        for element in goodContents:
            e = element
            for stuff in searchList:
                if e != '':
                    if stuff in e:
                        badContents.append(e)
        # Keep only the items that were not flagged and write them back
        newContents = [x for x in goodContents if x not in badContents]
        gamers[ii].string = listToString(newContents)
    ii += 1
print(soup)
```
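The working script above could be condensed into a reusable function. The sketch below is one possible version, not the author's code: the name `strip_matching_items` and its parameters are hypothetical, and it assumes the same `" : "` delimiter and substring matching as the script. It also skips tags whose `.string` is `None` (for example an empty `<active></active>`), on which the script's `.split(" : ")` call would raise an `AttributeError`. As with the original code, the `'xml'` parser requires the `lxml` package to be installed.

```python
from bs4 import BeautifulSoup  # BeautifulSoup(..., "xml") also needs lxml


def strip_matching_items(soup, tag_name, search_list, delimiter=" : "):
    """Hypothetical helper: for every <tag_name> element, drop the
    delimited items that contain any search term and write the rest
    back with .string."""
    for tag in soup.find_all(tag_name):
        text = tag.string
        # .string is None for empty tags such as <active></active>
        # (and for tags with more than one child), so skip those.
        if text is None:
            continue
        items = [item for item in str(text).split(delimiter) if item != ""]
        kept = [item for item in items
                if not any(term in item for term in search_list)]
        # Assigning .string replaces the tag's children; nothing is appended.
        tag.string = delimiter.join(kept)
    return soup


xml_source = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
<dataTag>
<id>14</id>
<players>var:val : var1:val1 : var2:val2 : testr:testl</players>
<active>0</active>
</dataTag>
<dataTag>
<id>17</id>
<players>3var:3val : this_var:some_val : 3var1:3val1 : 3testr:3testl</players>
<active></active>
</dataTag>
</root>
"""

soup = BeautifulSoup(xml_source, "xml")
strip_matching_items(soup, "players", ["testr", "this_v"])
strip_matching_items(soup, "active", ["this_v"])  # the empty <active> is skipped
print(soup)
```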