Using Selenium + Java Robot + ScrapBook to download a complete web page

Question

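I am trying to download a complete web page (including images, CSS, scripts, etc.) using Selenium, the Java Robot class, and the ScrapBook Firefox extension. I first tried sending CTRL+S with the Robot, but that did not work because ENTER could not be pressed in the save dialog. So I installed ScrapBook in Firefox (it offers a way to save a page without showing the download dialog) and tried it with the plain Robot class, which worked well. But I need to use it together with Selenium for quick testing. Here is the source code: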

package selenium.inexample;

import java.awt.*;
import java.awt.event.KeyEvent;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
import org.openqa.selenium.firefox.internal.ProfilesIni;

public class SelRobot {

    public static void main(String[] args) throws AWTException {

        // I use my Firefox profile because ScrapBook is installed in it
        ProfilesIni profile = new ProfilesIni();
        FirefoxProfile myprofile = profile.getProfile("default");

        WebDriver driver = new FirefoxDriver(myprofile);
        driver.manage().window().maximize();
        driver.get("https://www.google.com.");

        // This is added only for waiting for the page to load
        driver.getPageSource();

        // Robot uses the keyboard to interact with the browser and drive ScrapBook
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_ALT);

        for (int i = 0; i < 5; i++) {
            robot.keyPress(KeyEvent.VK_RIGHT);
            robot.keyRelease(KeyEvent.VK_RIGHT);
        }
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

}

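The problem is that ScrapBook saves the page into a temporary folder under local\temp\result, and all of its subfolders and files are deleted when the session ends. I need the page to be saved permanently in the Firefox profile folder. Once the page has loaded, the Robot drives ScrapBook directly. I use my own Firefox profile in the code because it has the ScrapBook extension installed. I want to point out that the plain Robot (with ScrapBook) works fine; the problem only appears when I use Selenium (do I need to configure something?). Also, note that the code is just a simple example. If it works, I will use something similar in production.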
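For reference, the menu navigation in the code above can also be written as a list of key codes and replayed with a short pause between keystrokes, which makes the sequence easier to adjust. This is only a sketch: the class name `ScrapBookKeys`, its method names, and the 200 ms delay are assumptions for illustration, and the sequence itself (Alt, five Right arrows, two Enters) is simply copied from the code above.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Selenium or ScrapBook.
final class ScrapBookKeys {

    // Builds the same sequence as the original code: Alt, Right x5, Enter x2.
    static List<Integer> captureSequence() {
        List<Integer> keys = new ArrayList<>();
        keys.add(KeyEvent.VK_ALT);
        for (int i = 0; i < 5; i++) {
            keys.add(KeyEvent.VK_RIGHT);
        }
        keys.add(KeyEvent.VK_ENTER);
        keys.add(KeyEvent.VK_ENTER);
        return keys;
    }

    // Presses and releases each key in order, pausing after every event.
    static void replay(Robot robot, List<Integer> keys, int delayMs) {
        robot.setAutoDelay(delayMs); // assumed delay; tune for your machine
        for (int key : keys) {
            robot.keyPress(key);
            robot.keyRelease(key);
        }
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            // Robot cannot be created without a display.
            System.out.println("No display available; skipping replay.");
            return;
        }
        // Assumes the Firefox window already has keyboard focus.
        replay(new Robot(), captureSequence(), 200);
    }
}
```

Keeping the sequence as data means the number of Right-arrow presses can be changed in one place if the menu layout differs on another machine.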

Answer 1

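I found the solution! The issue is that ScrapBook always saves pages into the profile folder. When I use Selenium, a temporary profile is always created, even if I create a new profile or use an existing one. So I simply configured ScrapBook to save pages to a folder outside the profile. Press Alt+K, then go to Tools > Options > Organize, and change the download folder there. That way Selenium does not delete the pages, because they are no longer part of the temporary profile data.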
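Because the fix depends on the save folder living outside the profile that Selenium copies and later deletes, a small check like the following can guard against a misconfigured path. It is a sketch under assumptions: `SaveFolderCheck` and `isInside` are hypothetical names, and the example paths are placeholders, not ScrapBook or Selenium defaults.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Selenium or ScrapBook.
final class SaveFolderCheck {

    // True when candidate is the same as, or nested under, parent.
    static boolean isInside(Path parent, Path candidate) {
        Path p = parent.toAbsolutePath().normalize();
        Path c = candidate.toAbsolutePath().normalize();
        return c.startsWith(p); // compares whole path elements, not raw strings
    }

    public static void main(String[] args) {
        // Placeholder locations for illustration only.
        Path tempProfile = Paths.get("tmp", "selenium-profile-copy");
        Path insideProfile = tempProfile.resolve("ScrapBook").resolve("data");
        Path outsideProfile = Paths.get("saved-pages");

        System.out.println(isInside(tempProfile, insideProfile));  // prints true
        System.out.println(isInside(tempProfile, outsideProfile)); // prints false
    }
}
```

A folder for which `isInside` returns true would be removed along with the temporary profile, so the ScrapBook save folder should be one for which it returns false.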

java

firefox

selenium

awtrobot