Download entire web pages and save them as HTML files with urllib.request

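I can save multiple web pages with the code below; however, when I open the saved HTML files, the pages do not render correctly. For example, the text inside tables is misaligned and the images do not show. I need to download the whole page, the same way any web browser saves it, so that it displays correctly.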

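import urllib.request

url = 'https://asd.com/asdID='
for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # read() returns bytes; str() on bytes would write the literal b'...' repr,
    # so write the raw bytes in binary mode instead
    with urllib.request.urlopen(newurl) as page:
        pagebytes = page.read()
    with open(str(i) + '.html', 'wb') as f:
        f.write(pagebytes)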

You can use Selenium instead to download the complete page. Just run the code below:

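from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Download the chrome driver from the link below and specify the path of chromedriver
# https://chromedriver.storage.googleapis.com/index.html?path=2.40/
chromedriver = 'C:/python36/chromedriver.exe'
url = 'https://asd.com/asdID='

# Selenium 4 takes the driver path through a Service object;
# Selenium 3 accepted it directly as webdriver.Chrome(chromedriver)
browser = webdriver.Chrome(service=Service(chromedriver))
try:
    for i in range(1, 5):
        browser.get(url + str(i))
        data = browser.page_source
        # page_source is a str, so write it with an explicit encoding
        with open("webpage%s.html" % i, "w", encoding="utf-8") as f:
            f.write(data)
finally:
    browser.quit()  # close the browser even if a page fails to load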
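
Note that page_source keeps relative links as they appear in the DOM, so when the saved file is opened from disk, images and stylesheets referenced by relative URLs may still fail to load. Below is a minimal sketch of one workaround, assuming it is acceptable to load those resources from the original site: insert a <base> tag so that relative URLs resolve against the page's address. The helper name add_base_tag is hypothetical and is not part of Selenium or urllib.

```python
import re
from html import escape


def add_base_tag(html_text, base_url):
    """Return html_text with a <base href=...> tag added for base_url."""
    base_tag = '<base href="%s">' % escape(base_url, quote=True)
    # Find the first opening <head> tag (case-insensitive, attributes allowed).
    # Requiring whitespace or '>' after "head" avoids matching <header>.
    match = re.search(r"<head(\s[^>]*)?>", html_text, re.IGNORECASE)
    if match is None:
        # Assumption: with no <head>, prepending the tag is good enough.
        return base_tag + html_text
    # Insert the tag immediately after the opening <head> tag.
    return html_text[:match.end()] + base_tag + html_text[match.end():]
```

In the loop above, this could be applied as data = add_base_tag(browser.page_source, url + str(i)) before writing the file. The saved page then still needs network access to show its images.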

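Update: the code below drives the browser's own Save As dialog through AutoHotkey (the ahk module), so it works on Windows only.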

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import ahk

# Raw strings keep the backslashes literal (otherwise \f and \t are escape sequences)
firefox = FirefoxBinary(r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe")

driver = webdriver.Firefox(firefox_binary=firefox)
driver.get("http://www.yahoo.com")
ahk.start()
ahk.ready()
ahk.execute("Send,^s")                      # Ctrl+S opens the Save As dialog
ahk.execute("WinWaitActive, Save As,,2")
ahk.execute("WinActivate, Save As")
ahk.execute(r"Send, C:\path\to\file.htm")   # type the target file name
ahk.execute("Send, {Enter}")

You will now get everything, saved the same way the browser itself saves a page.
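
If the AutoHotkey route is not available (for example, outside Windows), a rough approximation of what Save As does is to download every image, stylesheet, and script the page references. The sketch below covers only the first step, listing those URLs with the standard library. AssetCollector and collect_assets are hypothetical names, and resources referenced from inside CSS (fonts, background images) are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative src values against the page URL.
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only keep <link> tags that point at stylesheets.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.assets.append(urljoin(self.base_url, attrs["href"]))


def collect_assets(html_text, base_url):
    """Return the asset URLs found in html_text, in document order."""
    parser = AssetCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.assets
```

Each returned URL could then be fetched with urllib.request.urlopen and written to disk in binary mode. Rewriting the links in the saved HTML to point at the local copies is a further step that this sketch leaves out.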