Python: How can I save several HTML files under file names taken from their <title> tags?
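I have Python code that parses some data from HTML files, and it works well. At the end of the code, I have to save each HTML file under a name taken from its tag. For example, I have these 3 HTML files, which contain these 3 title tags: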
<title>My name is Prince</title>
<title>I love Madonna</title>
<title>Cars and Candies</title>
Each one must be saved like this:
my-name-is-prince.html
i-love-madonna.html
cars-and-candies.html
So, I already have a few ways to SAVE a file in Python, but I don't know how to save it under a name taken from the tag.
try:
    title = re.search('<title.+/title>', html)[0]
    title_content = re.search('>(.+)<', title)[1]
except:
    pass
with open("my-words.html", "w") as some_file_handle:
    some_file_handle.write(finalString)
or
with open('page_323.txt', 'w') as f:
    f.write(result.text)
or
with open("somefilename.txt", "w") as some_file_handle:
    for line in data:
        some_file_handle.write(line + "\n")
P.S. I have 500 files. The Python code must find the title tag in each HTML file and save each file as a new HTML file named after it.
Update
Is this what you are looking for:
>>> import re
>>> html = """<title>My name is Prince</title>"""
>>> re.search(r'<title>(?P<title>.+)</title>', html).group('title') \
...     .replace(' ', '-').lower()
'my-name-is-prince'
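The one-liner above raises an AttributeError when a file has no <title>, because re.search returns None. A minimal sketch of the same idea wrapped in a function follows; the name title_to_filename and the choice to return None for a missing or empty title are assumptions for illustration, not part of the original answer.

```python
import re

# Case-insensitive, and DOTALL so a title that spans several lines still matches.
TITLE_RE = re.compile(r"<title[^>]*>(?P<title>.+?)</title>", re.IGNORECASE | re.DOTALL)


def title_to_filename(html):
    """Return e.g. 'my-name-is-prince.html', or None if no usable <title> exists.

    Hypothetical helper: the name and the None return value are assumptions.
    """
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace (including newlines) into single spaces.
    title = " ".join(match.group("title").split())
    if not title:
        return None
    return title.lower().replace(" ", "-") + ".html"


print(title_to_filename("<title>My name is Prince</title>"))  # my-name-is-prince.html
```

Returning None lets the caller decide what to do with files that have no title, instead of failing in the middle of a 500-file run.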
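Old answer
If you have already extracted the title from the HTML, you can do this:
title = 'My name is Prince'
filename = f"{title.lower().replace(' ', '-')}.html"
# finalString is the processed HTML string from the question
with open(filename, "w") as some_file_handle:
    some_file_handle.write(finalString)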
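Since the question mentions 500 files, the same step can be put in a loop. The following is only a sketch under stated assumptions: the function name save_by_title, the two directory arguments, and UTF-8 encoding are all hypothetical, and the HTML is written unchanged where the real code would write its processed string (finalString in the question).

```python
import re
from pathlib import Path

TITLE_RE = re.compile(r"<title[^>]*>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def save_by_title(src_dir, dst_dir):
    """Write every *.html file from src_dir into dst_dir, named after its <title>.

    Hypothetical helper; returns the list of file names that were written.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() materialises the listing first, so new files are never re-read.
    for path in sorted(Path(src_dir).glob("*.html")):
        html = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        match = TITLE_RE.search(html)
        if match is None:
            continue  # no <title>: skip rather than guess a name
        words = re.findall(r"\w+", match.group(1).lower())
        if not words:
            continue  # title contained no usable characters
        target = dst / ("-".join(words) + ".html")
        # Replace `html` with the processed string if the content was modified.
        target.write_text(html, encoding="utf-8")
        written.append(target.name)
    return written
```

Note that two files with the same title produce the same file name, so the later one overwrites the earlier one; add a counter or a check if that can happen in your data.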
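If you want to use BeautifulSoup, check:
# UnsortedAttributes (a custom formatter) and title (presumably the <title> tag)
# are defined in the full code, which is not shown here.
soup = soup.encode(formatter=UnsortedAttributes()).decode('utf-8')
new_filename = title.get_text()
new_filename = new_filename.lower()
# Keep only word characters, so punctuation never reaches the file name
words = re.findall(r'\w+', new_filename)
new_filename = '-'.join(words)
new_filename = new_filename + '.html'
print(new_filename)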
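The re.findall(r'\w+', ...) step in this snippet behaves differently from the plain replace(' ', '-') used earlier once a title contains punctuation. A small comparison, with both function names and the sample title made up for illustration:

```python
import re


def slug_by_replace(title):
    # Only spaces are changed; every other character is kept.
    return title.lower().replace(" ", "-")


def slug_by_words(title):
    # Only runs of word characters (letters, digits, underscore) are kept.
    return "-".join(re.findall(r"\w+", title.lower()))


title = "Cars & Candies!"
print(slug_by_replace(title))  # cars-&-candies!
print(slug_by_words(title))    # cars-candies
```

The word-based version is the safer choice for file names, since characters such as / or ? in a title would otherwise end up in the path.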
See the full code here: