Can Crawler4j be run from another class

I need to call Crawler4j from another class. Instead of using the main method in the Controller class, I used a simple method called setup.

class Controller {

    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 1;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(1);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I tried calling it from another class like this, but I get an error:

Controller c = new Controller();
c.setup(seed);

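Is it possible to run crawler4j without a main method in the Controller class? In short, I want to know how to integrate the crawler into an application that already has a main method. Any help would be appreciated.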

There should be no problem running the crawler like that. The code below has been tested and works as expected:

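import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {

    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 4;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(2);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            // BasicCrawler is the WebCrawler subclass from the crawler4j examples
            // (assumed here to be in the same package); substitute your own crawler.
            // start() blocks until the crawl has finished.
            controller.start(BasicCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Any other class (e.g. your application's existing main) can call setup the same way.
    public static void main(String[] args) throws Exception {
        Controller crawler = new Controller();
        crawler.setup("http://www.ics.uci.edu/");
    }
}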
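Because crawler4j's controller.start(...) blocks until the crawl finishes, an application that already has its own main() may prefer to run setup on a background thread. The following is a minimal, stdlib-only sketch of that pattern, not crawler4j code: CrawlLauncher, runCrawl and startInBackground are hypothetical names, and runCrawl only simulates the blocking call that Controller.setup would make.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CrawlLauncher {

    // Hypothetical stand-in for Controller.setup(seed). It sleeps briefly to
    // simulate a blocking crawl, then returns a short summary string.
    static String runCrawl(String seed) throws InterruptedException {
        Thread.sleep(100); // placeholder for the blocking controller.start(...)
        return "crawled " + seed;
    }

    // Submits the blocking crawl to the given executor so that the caller's
    // own main() can keep doing other work in the meantime.
    static Future<String> startInBackground(ExecutorService pool, String seed) {
        Callable<String> task = () -> runCrawl(seed);
        return pool.submit(task);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> crawl = startInBackground(pool, "http://www.ics.uci.edu/");
            System.out.println("main() continues while the crawl runs");
            // Block only at the point where the crawl result is actually needed.
            System.out.println(crawl.get());
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

If blocking is acceptable, simply calling setup directly from the existing main (as in the answer above) is enough; the executor only matters when main has other work to do while crawling.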

Sorry, I forgot to put the access modifier "public" before the class name; that was the cause of the error. Thanks for your answer.
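The fix works because of Java's access rules: a class declared without an access modifier is package-private, so code in another package cannot reference it (presumably the calling class lived in a different package). The sketch below is illustrative only; VisibilityCheck and its nested classes are hypothetical, and nested classes are used just so everything fits in one file (a top-level class follows the same rule). It uses java.lang.reflect.Modifier to report whether each class is declared public.

```java
import java.lang.reflect.Modifier;

public class VisibilityCheck {

    // Declared public: code in any package may reference this class.
    public static class PublicController {
        public void setup(String seed) { }
    }

    // No access modifier (package-private): only code in the same package
    // may reference it, even though setup itself is public.
    static class PackagePrivateController {
        public void setup(String seed) { }
    }

    // Returns true if the class itself is declared public.
    static boolean isPublic(Class<?> type) {
        return Modifier.isPublic(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isPublic(PublicController.class));         // true
        System.out.println(isPublic(PackagePrivateController.class)); // false
    }
}
```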