Download entire web pages and save them as HTML files with urllib.request
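I can save multiple web pages with this code. However, once they are saved as HTML, the pages don't display correctly: for example, the text inside tables is misaligned and the images don't show.
I need to download the entire page, the way any web browser saves it, so that it displays correctly.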
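import urllib.request

url = 'https://asd.com/asdID='
for i in range(1, 5):
    print(' --> ID:', i)
    newurl = url + str(i)
    page = urllib.request.urlopen(newurl)
    # page.read() returns bytes; writing str(bytes) would save the "b'...'" repr
    # with escaped newlines, so write the raw bytes in binary mode instead
    with open(str(i) + '.html', 'wb') as f:
        f.write(page.read())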
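One likely reason the saved file looks broken: urllib.request fetches only the HTML document, while images, stylesheets and scripts are separate files that the page references by URL, often relative ones such as "logo.png" that no longer resolve once the file is opened from disk. The minimal sketch below lists what a page references, using only the standard library. The names find_resource_urls and ResourceFinder and the sample HTML are hypothetical, and only img/script src and stylesheet link href are checked (not url() inside CSS, srcset, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collects absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            if attrs.get("href"):
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def find_resource_urls(html, base_url):
    """Return the resource URLs referenced by html, resolved against base_url."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls


# Hypothetical page: none of these files are saved by urllib.request
sample = ('<html><head><link rel="stylesheet" href="/css/site.css"></head>'
          '<body><img src="logo.png"><script src="https://cdn.example.com/app.js">'
          '</script></body></html>')
print(find_resource_urls(sample, "https://asd.com/asdID=1"))
```

Every URL in that list is a separate request that a browser makes when it renders the page, and none of them are fetched by the loop above.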
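You can use Selenium instead. It loads each page in a real browser, so page_source returns the HTML as the browser has it after loading, including content generated by JavaScript. Note that this is still only the HTML: images and stylesheets remain links to the original site.
Just run the code below: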
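from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Download ChromeDriver from the link below and set chromedriver to its path
# https://chromedriver.storage.googleapis.com/index.html?path=2.40/
# (With Selenium 4.6 or later, Selenium Manager can fetch a matching driver
# automatically; then webdriver.Chrome() needs no Service path.)
chromedriver = 'C:/python36/chromedriver.exe'
url = 'https://asd.com/asdID='

browser = webdriver.Chrome(service=Service(chromedriver))  # one browser for all pages
try:
    for i in range(1, 5):
        browser.get(url + str(i))
        data = browser.page_source  # HTML of the page after it has loaded
        with open("webpage%s.html" % i, "w", encoding="utf-8") as f:
            f.write(data)
finally:
    browser.quit()  # close the browser even if a page fails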
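If the original site stays reachable, a lighter option is to make the relative links in the saved file point back to it by inserting a <base href> element into the head. The sketch below works under that assumption. add_base_tag is a hypothetical helper. The browser will still load the images and CSS from the live site, so they appear only while you are online and only if the site does not block such requests.

```python
import re
from html import escape


def add_base_tag(html, base_url):
    """Return html with <base href="base_url"> inserted near the top.

    Relative URLs (images, CSS, links) in the saved file then resolve against
    base_url instead of the local folder. Pages that already declare a <base>
    element are returned unchanged.
    """
    if re.search(r"<base\s", html, flags=re.IGNORECASE):
        return html
    tag = '<base href="%s">' % escape(base_url, quote=True)
    # Insert after "<head>" or "<head ...>" (but not "<header>"); failing that,
    # after the doctype so the page is not pushed into quirks mode; else at the start
    match = (re.search(r"<head(?:\s[^>]*)?>", html, flags=re.IGNORECASE)
             or re.search(r"<!doctype[^>]*>", html, flags=re.IGNORECASE))
    pos = match.end() if match else 0
    return html[:pos] + tag + html[pos:]


print(add_base_tag("<html><head><title>t</title></head></html>", "https://asd.com/asdID=1"))
```

In the Selenium loop this would be used as data = add_base_tag(browser.page_source, browser.current_url) before writing the file.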
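Update: to save the images and stylesheets as well, let the browser itself do "Save As" (Ctrl+S), automated here with AutoHotkey through the ahk module: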
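from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import ahk

# Raw strings: in a normal string, "\f" (in "\firefox.exe", "\file") and "\t" (in "\to")
# are escape characters. Options.binary_location replaces FirefoxBinary /
# firefox_binary, which are deprecated in Selenium 4.
options = Options()
options.binary_location = r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe"
driver = webdriver.Firefox(options=options)
driver.get("http://www.yahoo.com")

ahk.start()
ahk.ready()
ahk.execute("Send,^s")                     # Ctrl+S opens the Save As dialog
ahk.execute("WinWaitActive, Save As,,2")   # wait up to 2 seconds for the dialog
ahk.execute("WinActivate, Save As")
ahk.execute(r"Send, C:\path\to\file.htm")  # type the target file name
ahk.execute("Send, {Enter}")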
You will now get everything, just as when saving the page manually from the browser.
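Without AutoHotkey, the browser's "save complete page" can be roughly approximated in Python: download each image, script and stylesheet the HTML references, store the copies in a folder next to the HTML file, and rewrite the links to point at them. This is a minimal sketch under stated assumptions. save_complete, _AttrCollector and the folder name page1_files are hypothetical. Only img/script src and stylesheet link href are handled, so url() references inside CSS, srcset and lazy-loaded images are missed, and the link rewrite is a plain text replace of double-quoted attribute values.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _AttrCollector(HTMLParser):
    """Records (original attribute value, absolute URL) for page resources."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        value = None
        if tag in ("img", "script"):
            value = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            value = attrs.get("href")
        if value:
            self.found.append((value, urljoin(self.base_url, value)))


def _download(url):
    with urlopen(url) as resp:
        return resp.read()


def save_complete(html, base_url, out_dir, fetch=_download):
    """Download referenced resources into out_dir and point html at the copies.

    Assumes the returned HTML will be saved in the parent directory of out_dir.
    """
    os.makedirs(out_dir, exist_ok=True)
    folder = os.path.basename(os.path.normpath(out_dir))
    collector = _AttrCollector(base_url)
    collector.feed(html)
    collector.close()
    for n, (value, absolute) in enumerate(collector.found):
        # Prefix with a counter so two files named e.g. "logo.png" do not collide
        name = "%d_%s" % (n, os.path.basename(urlparse(absolute).path) or "resource")
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(fetch(absolute))
        # Naive rewrite: only double-quoted values written exactly as parsed
        # (values containing entities such as &amp; are left unchanged)
        html = html.replace('"%s"' % value, '"%s/%s"' % (folder, name))
    return html
```

Combined with the Selenium code, something like html = save_complete(browser.page_source, browser.current_url, "page1_files"), followed by writing html to page1.html in the same directory, gives a local copy that does not depend on the live site. The fetch parameter exists so the download step can be swapped out, for example to add headers or skip failed requests.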