Retrieving a complete webpage including dynamically loaded links/images
Problem
Downloading a complete, working offline copy of a website that dynamically loads links/images.
Research
There are questions (e.g. [1], , [3]) on Stack Overflow addressing this issue, and most of their top answers use wget or httrack. Both fail miserably (please do correct me if I am wrong) on pages that dynamically load links, that use `srcset` instead of `src` on the `img` tag, or that load anything via JS. A rather obvious solution is Selenium. However, if you have ever used Selenium in production, you quickly start seeing the issues that arise from such a decision: it is resource-heavy, a head-full driver is quite complex to use, and it is simply not built for this. That being said, some people claim to have been using it easily in production for years.
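For context on the `srcset` point: the attribute holds a comma-separated list of candidate URLs, each optionally followed by a width (`480w`) or pixel-density (`2x`) descriptor, so a tool that only follows `src` can miss image variants. A minimal sketch of extracting those candidates in Python, assuming simple URLs that contain no commas or whitespace (the full HTML parsing rules handle more cases); the function name `srcset_urls` is hypothetical.

```python
def srcset_urls(srcset: str) -> list[str]:
    """Return the candidate URLs from an img srcset attribute value.

    Simplification (assumption): no URL contains a comma or whitespace.
    """
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()  # "large.jpg 1200w" -> ["large.jpg", "1200w"]
        if parts:  # skip empty candidates, e.g. from an empty attribute
            urls.append(parts[0])
    return urls


print(srcset_urls("small.jpg 480w, medium.jpg 800w, large.jpg 1200w"))
# ['small.jpg', 'medium.jpg', 'large.jpg']
```

This only sees markup present in the served HTML; URLs that JavaScript adds after the page loads remain invisible to it.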
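Expected solution
A script (preferably in Python) that parses the page's links and loads them separately. I can't seem to find any existing script that does this. If your solution is "so implement your own", then there is no point in asking the question in the first place; I am looking for an existing implementation.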
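To make the "parse the page's links" step concrete, here is a rough sketch using only Python's standard library (`html.parser`, `urllib.parse.urljoin`). It is an illustration of that one step, not the existing tool being asked for: the class name `AssetCollector` and the tag/attribute table are hypothetical choices, `srcset` handling is simplified (assumes no commas inside URLs), and like any static parser it cannot discover URLs injected by JavaScript at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical table of which attributes carry URLs on which tags.
URL_ATTRS = {
    "a": ("href",),
    "link": ("href",),
    "script": ("src",),
    "img": ("src", "srcset"),
    "source": ("src", "srcset"),
}


class AssetCollector(HTMLParser):
    """Collect de-duplicated absolute URLs referenced by static HTML markup."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag, ())
        for name, value in attrs:
            if name not in wanted or not value:
                continue
            if name == "srcset":
                # Simplified: take the first token of each comma-separated candidate.
                candidates = [c.split()[0] for c in value.split(",") if c.split()]
            else:
                candidates = [value]
            for url in candidates:
                absolute = urljoin(self.base_url, url)  # resolve relative URLs
                if absolute not in self.urls:
                    self.urls.append(absolute)


html = '<a href="/about">About</a><img src="a.jpg" srcset="a.jpg 1x, a-2x.jpg 2x">'
collector = AssetCollector("https://example.com/shop/")
collector.feed(html)
print(collector.urls)
# ['https://example.com/about', 'https://example.com/shop/a.jpg', 'https://example.com/shop/a-2x.jpg']
```

Each collected URL would then need to be fetched and the page rewritten to point at the local copies, which is the part existing tools are expected to handle.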
Examples
- Shopify.com
- Websites built with Wix