How to replace a tag in HTML with bs4

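I have an HTML file with two tags, each containing a link and some text. I managed to replace the link, but I don't know how to replace the text inside the tag. I don't fully understand how tags are modified and would like to learn.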

My code:

import requests
from bs4 import BeautifulSoup

link = 'http://127.0.0.1:5500/dat.html'
response = requests.get(link).text

with open('parse.html', 'w', encoding='utf-8') as file:
    file.write(response)

soup = BeautifulSoup(response, 'lxml')

res = response.replace("https://www.google.com/", "https://reddit.com/")

with open("parse.html", "w") as outf:
    outf.write(res)

HTML:

<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>

What I need:

<body>
<h1>
    <a href="https://www.reddit.com/" target="_blank">reddit</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>

You can find all the relevant <a> tags and change their attributes and their .string:

from bs4 import BeautifulSoup

html_doc = """
<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for a in soup.select('a[href*="google.com"]'):
    a["href"] = "https://reddit.com/"
    a.string = "reddit"

print(soup.prettify())

Prints:

<body>
 <h1>
  <a href="https://reddit.com/" target="_blank">
   reddit
  </a>
 </h1>
 <h1>
  <a href="https://ru.wikipedia.org/wiki/" target="_blank">
   wiki
  </a>
 </h1>
</body>
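
To tie this back to the file-writing step in the question, here is a minimal sketch that wraps the same idea in a function and saves the result. The helper name replace_links and its parameters are made up for illustration; the output file name parse.html is taken from the question. Note that the str.replace call in the question searches for "https://www.google.com/", while the sample HTML contains "https://google.com/" (no "www"), so that plain string replacement would likely not match anything in this sample. Matching on a substring of the href avoids that problem.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def replace_links(html, match, new_href, new_text):
    """Rewrite every <a> whose href contains `match` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.select("a[href]"):
        if match in a["href"]:
            a["href"] = new_href   # attributes behave like dict entries
            a.string = new_text    # replaces the tag's text content
    # str(soup) keeps the original whitespace, unlike prettify()
    return str(soup)


html_doc = """<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>"""

result = replace_links(html_doc, "google.com", "https://reddit.com/", "reddit")

# Save the modified document, as in the question (file name assumed from there)
Path("parse.html").write_text(result, encoding="utf-8")
```

Using str(soup) instead of soup.prettify() when writing the file keeps each <a> tag on one line, which is closer to the desired output shown in the question. In the question's script, html_doc would be the text returned by requests.get(link).text.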