How to feed XML tag content modifications back to BeautifulSoup correctly?

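I'm having some trouble editing an XML file with BeautifulSoup. I've found plenty of similar threads on Stack Overflow and elsewhere, but none of them address this particular scenario. I'm editing the contents of a tag that is essentially a list. The editing part works fine. I just can't figure out how to write the modified contents back into the soup correctly.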

Here's what I have so far:

from bs4 import BeautifulSoup

XMLsource = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <dataTag>
    <id>14</id>
    <players>var:val : var1:val1 : var2:val2 : testr:testl</players>
    <active>0</active>
  </dataTag>
  <dataTag>
    <id>15</id>
    <players>1var:1val : 1var1:1val1 : 1var2:1val2 : 1testr:1testl</players>
    <active></active>
  </dataTag>
  <dataTag>
    <id>16</id>
    <players>2var:2val : 2var1:2val1 : this_var:some_val : 2var2:2val2 : 2testr:2testl</players>
    <active>1</active>
  </dataTag>
  <dataTag>
    <id>17</id>
    <players>3var:3val : 3var1:3val1 : 3var2:3val2 : 3testr:3testl</players>
    <active>1</active>
  </dataTag>
</root>
"""

myarray = []
searchList = ['string', 'string_1', 'string_2', 'string_3', 'this_']
delimiter = " : "

def listToString(lst):
    return (delimiter.join(lst))

soup = BeautifulSoup(XMLsource, 'xml')

for a in soup.find_all('players'):
    for each in a:
        tagContents = a.string
        goodContents = list(tagContents.split(" : "))
        length = len(goodContents)
        for element in range(length):
            e = goodContents[element]
            if e != '':
                myarray.append(e)
        for stuff in searchList:
            for i, elem in enumerate(myarray):
                if stuff in elem:
                    myarray.remove(elem)
    a.string = listToString(myarray)

print(soup)

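The problem is a.string = listToString(myarray). On every pass of the main for loop it keeps adding all of myarray to the soup, so the contents of the <players> tags pile up. Run the code and you'll see what I mean. This is a stripped-down dummy version that doesn't really change the tag contents, so the problem is easier to see.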
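The stacking happens because myarray is created once, before the outer loop, and is never emptied, so each <players> tag receives every entry collected from all the earlier tags as well. A second, quieter problem is that myarray.remove(elem) runs while enumerate(myarray) is iterating over the same list, which can skip the element right after each removed one. Below is a minimal sketch of the loop with both issues addressed. It assumes a shortened version of the question's input and keeps the question's searchList and delimiter. It is an illustration, not the asker's code.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's input (assumption: same structure)
XMLsource = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <dataTag>
    <id>14</id>
    <players>var:val : var1:val1 : testr:testl</players>
  </dataTag>
  <dataTag>
    <id>16</id>
    <players>2var:2val : this_var:some_val : 2testr:2testl</players>
  </dataTag>
</root>
"""

searchList = ['string', 'string_1', 'string_2', 'string_3', 'this_']
delimiter = " : "

soup = BeautifulSoup(XMLsource, 'xml')

for a in soup.find_all('players'):
    # Build a fresh list for every tag, so entries from earlier tags never leak in
    entries = [e for e in a.string.split(delimiter) if e != '']
    # Build a filtered copy instead of removing items from the list being iterated
    kept = [e for e in entries if not any(stuff in e for stuff in searchList)]
    # Assigning to .string replaces this tag's contents with the filtered text
    a.string = delimiter.join(kept)

print(soup)
```

Because the list is rebuilt inside the loop, each tag only ever sees its own entries, and the comprehension avoids mutating a list while looping over it.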

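I've spent three days testing and searching the internet. I've changed the code a million times, but I'm not a professional programmer, so my coding is usually a process of trial and error, and this time I just can't crack it. Could someone please help me fix the code?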

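So after a few sleepless nights I tried a different approach and, lo and behold, it worked! Here it is in case anyone needs a script like this. There may still be some redundant code in there, but it does what it's supposed to: it removes the strings defined in a list from any specified XML tag, and it also works when the tag's content is itself a delimited list.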
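The same idea can be packaged as a small reusable function, so it can be pointed at any tag name. This is only a sketch: the name strip_entries and its parameters are hypothetical, not part of BeautifulSoup or the script above. It assumes each target tag holds a single text node (so .string is not None) and simply skips tags that don't.

```python
from bs4 import BeautifulSoup

def strip_entries(soup, tag_name, search_list, delimiter=" : "):
    """Hypothetical helper: drop every delimited entry that contains any
    string from search_list, for all tags named tag_name in soup."""
    for tag in soup.find_all(tag_name):
        if tag.string is None:
            # Empty tag or mixed content: nothing simple to filter, so skip it
            continue
        entries = tag.string.split(delimiter)
        kept = [e for e in entries if not any(s in e for s in search_list)]
        # Assigning to .string replaces the tag's existing contents in place
        tag.string = delimiter.join(kept)
    return soup

xml = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <dataTag>
    <players>3var:3val : this_var:some_val : 3testr:3testl</players>
    <active></active>
  </dataTag>
</root>
"""

soup = BeautifulSoup(xml, 'xml')
strip_entries(soup, 'players', ['testr', 'this_v'])
print(soup.players.string)
```

With this sample input it prints 3var:3val, since this_var:some_val contains this_v and 3testr:3testl contains testr.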

from bs4 import BeautifulSoup

def listToString(lst):
    return (delimiter.join(lst))

XMLsource = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <dataTag>
    <id>14</id>
    <players>var:val : var1:val1 : var2:val2 : testr:testl</players>
    <active>0</active>
  </dataTag>
  <dataTag>
    <id>15</id>
    <players>1var:1val : 1var1:1val1 : 1var2:1val2 : 1testr:1testl</players>
    <active></active>
  </dataTag>
  <dataTag>
    <id>16</id>
    <players>2var:2val : 2var1:2val1 : 2var2:2val2 : 2testr:2testl</players>
    <active>1</active>
  </dataTag>
  <dataTag>
    <id>17</id>
    <players>3var:3val : this_var:some_val : 3var1:3val1 : 3var2:3val2 : 3testr:3testl</players>
    <active>1</active>
  </dataTag>
</root>
"""

searchList = ['string', 'string_1', 'string_2', 'testr', 'this_v']

soup = BeautifulSoup(XMLsource, 'xml')
gamers = soup.find_all('players')
delimiter = " : "

for tag in gamers:
    # Split this tag's text into its " : "-separated entries
    goodContents = tag.string.split(delimiter)

    # Flag every non-empty entry that contains any of the search strings
    badContents = [e for e in goodContents
                   if e != '' and any(stuff in e for stuff in searchList)]

    # Keep only the entries that were not flagged
    newContents = [x for x in goodContents if x not in badContents]

    # Write the filtered list back into this same tag
    tag.string = listToString(newContents)

print(soup)
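Since the goal is editing XML files, the last step is usually writing the modified tree back to disk. A minimal sketch, assuming a hypothetical file data.xml created in a temporary directory for the demo: str(soup) serializes the tree as it stands after the edits, and with the 'xml' parser the output starts with an XML declaration.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file standing in for the real XML file being edited
    path = Path(tmp) / "data.xml"
    path.write_text(
        '<?xml version="1.0" encoding="UTF-8" ?>\n'
        "<root><dataTag><players>a:1 : testr:testl</players></dataTag></root>\n",
        encoding="utf-8",
    )

    soup = BeautifulSoup(path.read_text(encoding="utf-8"), 'xml')

    for tag in soup.find_all('players'):
        entries = tag.string.split(" : ")
        # Drop entries containing 'testr' and write the rest back into the tag
        tag.string = " : ".join(e for e in entries if 'testr' not in e)

    # Serialize the modified tree and overwrite the file with it
    path.write_text(str(soup), encoding="utf-8")
    result = path.read_text(encoding="utf-8")

print(result)
```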