How to replace a tag in HTML with bs4
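I have an HTML file with two tags, each containing a link and some text. I managed to replace the link, but I don't know how to replace the text inside the tag. I don't really understand how tags are modified and would like to learn.
My code: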
import requests
from bs4 import BeautifulSoup
link = 'http://127.0.0.1:5500/dat.html'
response = requests.get(link).text
with open('parse.html', 'w', encoding='utf-8') as file:
    file.write(response)
soup = BeautifulSoup(response, 'lxml')
res = response.replace("https://www.google.com/", "https://reddit.com/")
with open("parse.html", "w") as outf:
    outf.write(res)
html:
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
I need:
<body>
<h1>
<a href="https://reddit.com/" target="_blank">reddit</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
You can find all the relevant <a> tags and change their attributes and .string:
from bs4 import BeautifulSoup
html_doc = """
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.select('a[href*="google.com"]'):
    a["href"] = "https://reddit.com/"
    a.string = "reddit"
print(soup.prettify())
Prints:
<body>
 <h1>
  <a href="https://reddit.com/" target="_blank">
   reddit
  </a>
 </h1>
 <h1>
  <a href="https://ru.wikipedia.org/wiki/" target="_blank">
   wiki
  </a>
 </h1>
</body>
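To tie this back to the code in the question, here is a minimal sketch that wraps the same idea in a function and returns the modified HTML as a string. The name replace_link and its parameters are hypothetical, not part of bs4. It swaps the CSS selector for find_all plus a plain substring check, which is an assumption made only to avoid quoting issues when the search text is passed in as a variable. Note that the original str.replace call looked for "https://www.google.com/", which does not occur in the sample HTML (the href is "https://google.com/"), and str.replace could not have touched the link text anyway.

```python
from bs4 import BeautifulSoup


def replace_link(html: str, href_fragment: str, new_href: str, new_text: str) -> str:
    """Rewrite every <a> whose href contains href_fragment (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually have an href attribute
    for a in soup.find_all("a", href=True):
        if href_fragment in a["href"]:
            a["href"] = new_href  # tag attributes behave like dict entries
            a.string = new_text   # assigning .string replaces the tag's contents
    # str(soup) serializes the modified tree back to HTML
    return str(soup)


sample = '<h1><a href="https://google.com/" target="_blank">google</a></h1>'
result = replace_link(sample, "google.com", "https://reddit.com/", "reddit")
print(result)
```

The tags are modified in place inside the parsed tree, so nothing changes in the original response string. To save the result, write the returned string (not response) to parse.html, opened with 'w' and encoding='utf-8'.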