python: how to save dynamically rendered HTML web page code

Question

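I have a setup in which a web page on a local server (localhost:8080) is changed dynamically by sending sockets that load some scripts (mostly d3 code). In Chrome I can inspect the "rendered HTML status" of the page, that is, the HTML that results from the code loaded by d3/JavaScript. Now I need to save that "full HTML snapshot" of the rendered page so that I can view it later in a "static" way. I have tried many solutions in Python that load the page and save its "on-load" d3/JavaScript-processed content well, but they do not capture the code generated "after" the load. If no Python solution can be found, I could also use JavaScript to achieve this.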

Keep in mind that the HTML is modified "dynamically" over time, and I need to retrieve the full rendered HTML at a moment of my choosing.

Here is a list of related questions found on Stack Overflow that do not answer this one. Not answered: How to save dynamically changed HTML? Answered, but not for dynamically changing HTML: Using PyQt4 to return Javascript generated HTML. Not answered: How to save dynamically added data to update the page (using jQuery). Not dynamic: Python to Save Web Pages.

Answer 1

This can be solved with selenium-python (thanks to @Juca for suggesting selenium). Once it is installed (pip install selenium), this code does the job:

from selenium import webdriver

# Start the browser. It will open the URL, and we can then
# access all of the page's content and perform actions on it.
browser = webdriver.Firefox()
url = 'http://localhost:8080/test.html'
# The page test.html constantly changes its content by receiving sockets, etc.,
# so we need to save its "status" at the moment we choose, for later retrieval.
browser.get(url)
# Wait until we want to save the content (this could be a UI button action, etc.):
input("Press Enter to print the web page")
# Grab the rendered HTML content at that moment:
html_source = browser.page_source
# Display it to check:
print(html_source)
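
The code above only prints the rendered HTML, while the question asks for a snapshot that can be viewed later. A minimal sketch of writing it to disk follows. The helper names save_snapshot and snapshot_from_driver, the timestamped file name, and the UTF-8 encoding are all assumptions made for illustration and are not part of the original answer.

```python
import os
from datetime import datetime


def save_snapshot(html_source, directory, prefix="snapshot"):
    """Write html_source to a timestamped UTF-8 file and return its path.

    The naming scheme (prefix plus timestamp) is a hypothetical choice.
    """
    os.makedirs(directory, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    path = os.path.join(directory, "{}-{}.html".format(prefix, stamp))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html_source)
    return path


def snapshot_from_driver(driver, directory):
    """Save whatever driver.page_source returns at the moment of the call."""
    return save_snapshot(driver.page_source, directory)
```

With the browser object from the answer, calling snapshot_from_driver(browser, "snapshots") at the chosen moment should leave one .html file per call in the snapshots directory. Scripts and stylesheets the page references by URL are not copied, so the saved file may render differently when the local server is not running.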


javascript

python

pyqt4

d3.js