Python Remove Span Tags and Overwrite Txt File

Question

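I want to remove the span tags from a text document before pinging the sites in it, because otherwise the ping fails. However, I can't get it to remove the span tags and then save the file again without them, or to save the new result into an array.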

from bs4 import BeautifulSoup

with open(r'sitelist.txt') as f:
    f = f.read().splitlines()

soup = BeautifulSoup(f,"html.parser")

while len(soup.find_all('span')) > 0:
    soup.span.extract()

f = soup

return f

I've tried decompose and unwrap, but couldn't get the result I want.

Answer 1

Ah... str.splitlines() returns a list, and you can't pass a list to BeautifulSoup(). Instead, just replace f = f.read().splitlines() with f = f.read().

After that your code works, and all that's left is writing the output back to the file:

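from bs4 import BeautifulSoup

with open(r'sitelist.txt') as f:
    f = f.read()  # read() returns one string; splitlines() would return a list

soup = BeautifulSoup(f, "html.parser")

# extract() detaches each <span> together with everything inside it
while len(soup.find_all('span')) > 0:
    soup.span.extract()

# 'w' mode truncates the file, then the cleaned markup is written back
with open(r'sitelist.txt', 'w') as f:
    f.write(str(soup))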
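Since the question also asks about getting the result as an array, here is a minimal sketch that turns the markup into a Python list of host names instead of writing it back. It assumes sitelist.txt holds one site per line (the file's real contents are not shown in the question), and `site_list` is a hypothetical helper name. It uses `get_text()`, which drops every tag, not only `<span>`, while keeping the text that was inside them.

```python
from typing import List

from bs4 import BeautifulSoup


def site_list(markup: str) -> List[str]:
    """Hypothetical helper: strip all tags and return one entry per non-empty line.

    Assumes the input holds one host name per line, possibly wrapped in <span> tags.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # get_text() concatenates only the text nodes, so the tags disappear
    # while the text that was inside them is kept.
    text = soup.get_text()
    # Strip surrounding whitespace from each line and skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "<span>example.com</span>\nhost2.example.org\n"
print(site_list(sample))
```

To use it on the file, read it first (`with open('sitelist.txt') as f: sites = site_list(f.read())`) and then loop over `sites` with whatever ping call you already use.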

Answer 2

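As mentioned above, you don't need splitlines(); just use read(). I'm not sure extract() is what you want, since it removes each span together with the text inside it. Here is my solution, which removes only the span tags themselves (I think that's what you were asking):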

from bs4 import BeautifulSoup

with open('sitelist.txt', 'r') as html:
    soup = BeautifulSoup(html, "html.parser")  # parse the open file object
    # unwrap() replaces each <span> with its children, so the text stays
    for match in soup.find_all('span'):
        match.unwrap()

with open('sitelist.txt', 'w') as html:
    html.write(str(soup))
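To see the difference between the two approaches in this thread, here is a small sketch run on an in-memory string rather than the file (the sample markup is made up): `extract()`, used in Answer 1, removes each `<span>` along with its contents, while `unwrap()` removes only the tags. `strip_spans` and its `keep_text` flag are hypothetical names for illustration.

```python
from bs4 import BeautifulSoup


def strip_spans(markup: str, keep_text: bool) -> str:
    """Hypothetical helper that removes every <span> from `markup`.

    keep_text=True  uses unwrap():  the tags go, the text inside them stays.
    keep_text=False uses extract(): the tags and their contents both go.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # find_all() builds the full list up front, so changing the tree inside the loop is safe.
    for span in soup.find_all("span"):
        if keep_text:
            span.unwrap()
        else:
            span.extract()
    return str(soup)


sample = "<p>a<span>b</span>c</p>"
print(strip_spans(sample, keep_text=False))  # prints <p>ac</p>
print(strip_spans(sample, keep_text=True))   # prints <p>abc</p>
```

Looping once over `find_all()` also avoids re-searching the whole tree on every iteration, which the `while` loop in Answer 1 does.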

I'm sure there's a way to open the file for both reading and writing, but I simply open the file twice.
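For the single-open approach mentioned above, one option is mode 'r+': read the whole file, seek back to the start, write the new text, then call truncate() so no leftover characters remain when the result is shorter than the original. This is a sketch; `remove_spans_in_place` is a hypothetical name, and UTF-8 encoding is an assumption about the file.

```python
from bs4 import BeautifulSoup


def remove_spans_in_place(path: str) -> None:
    """Hypothetical helper: unwrap every <span> in `path` using a single file handle.

    Assumes the file is UTF-8 encoded.
    """
    with open(path, "r+", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
        # Drop the <span> tags but keep the text inside them.
        for span in soup.find_all("span"):
            span.unwrap()
        f.seek(0)           # rewind to the start before overwriting
        f.write(str(soup))
        f.truncate()        # cut off any old content past the new end
```

Calling `remove_spans_in_place('sitelist.txt')` rewrites the file in place. Without the `truncate()` call, the tail of the old contents would survive whenever the cleaned text is shorter, which it always is once tags are removed.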


Tags: html, python, beautifulsoup, bs4