How do I properly format code for desired appending output?
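I'm writing new code but can't get the desired output. The code reads an HTML file and looks for <a> tags. It only outputs the URL. I added extra code to complete the link. I'm trying to insert the URL into the string twice.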
####### Parse for <a> tags and save ############
with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
    links = []
    for link in soup2.findAll('a', attrs={'href': re.compile("^https://")}):
        links.append('<a href="'+link.get('href')+'">'"{link}"'</a><br>')
        time.sleep(.1)
with open("page-2.html", 'w') as html:
    html.write('{links}\n'.format(links=links))
I guess this gets me close to what I want, but not quite. I'd rather see it written as "https://whatever.com/text/text/" than "whatever.com/text/text".
####### Parse for <a> tags and save ############
with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
    links = []
    for link in soup2.findAll('a', attrs={'href': re.compile("^https://")}):
        links.append('{0}</a><br>'.format(link,link))
with open("page-2.html", 'w') as html:
    html.write('{links}\n'.format(links=links))
This should give you the desired HTML output file:
from bs4 import BeautifulSoup

with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')

with open("page2.html", 'w') as h:
    # href=True skips <a> tags that have no href attribute
    for link in soup2.find_all('a', href=True):
        # Use the URL as both the link target and the visible text
        h.write("<a href=\"{}\">{}</a><br>".format(link.get('href'), link.get('href')))
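For reference, the two attempts in the question fail for different reasons. In the first, "{link}" is a plain string literal that is never passed through .format(), so the braces are written out as-is. In the second, '{0}' receives the whole tag object rather than its href, and in both cases '{links}'.format(links=links) writes the Python list representation instead of one link per line. Below is a minimal sketch of just the formatting step, assuming you want to keep the https-only filter from the question. The helper name format_links and the sample values are made up for illustration; the hrefs would normally come from link.get('href') in the loop above.

```python
import html


def format_links(hrefs):
    """Build one '<a href="URL">URL</a><br>' line per https URL (hypothetical helper)."""
    lines = []
    for href in hrefs:
        # Skip anchors without an href (link.get('href') returns None) and non-https links.
        if not href or not href.startswith("https://"):
            continue
        url = html.escape(href, quote=True)  # escape &, <, > and quotes
        # {0} is reused, so the URL appears as both the target and the visible text.
        lines.append('<a href="{0}">{0}</a><br>'.format(url))
    return lines


# Sample values standing in for [link.get('href') for link in soup2.find_all('a')]
sample_hrefs = ["https://whatever.com/text/text/", None, "/relative/path"]
output = "\n".join(format_links(sample_hrefs))
print(output)
```

Joining the list with "\n" before writing puts each link on its own line in the output file, rather than dumping the list with brackets and quotes.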