Can Crawler4j be run from another class?
I need to call Crawler4j from another class. Instead of using a main method in the Controller class, I used a simple method named setup.
class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 1;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(1);
            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I tried calling it from another class like this, but I get an error.
Controller c = new Controller();
c.setup(seed);
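Is it possible to run crawler4j without a main method in the Controller class? In short, I want to know how to integrate the crawler into an application that already has a main method. Any help would be appreciated.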
There should be no problem running the crawler this way. The code below has been tested and works as expected:
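import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

// Declared public so that classes in other packages can use it.
public class Controller {
    public void setup(String seed) {
        try {
            // Folder where crawler4j stores its intermediate crawl data.
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 4;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);   // milliseconds between requests
            config.setMaxDepthOfCrawling(2);  // follow links at most 2 levels from the seed
            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            controller.addSeed(seed);
            controller.setCustomData(seed);
            // BasicCrawler is a WebCrawler subclass; substitute your own (MyCrawler in the question).
            controller.start(BasicCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Stands in for the main method of the existing application.
    public static void main(String[] args) throws Exception {
        Controller crawler = new Controller();
        crawler.setup("http://www.ics.uci.edu/");
    }
}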
Sorry, I forgot to put the access modifier "public" before the class name. That was the cause of the error. Thanks for your answer.
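For anyone integrating the crawler into an application that already has its own main method: a call like setup above typically does not return until the crawl has finished, so it may be worth running it off the main thread. The following is only a minimal sketch under that assumption. CrawlLauncher and launch are hypothetical names that are not part of crawler4j, and the lambda in main is a stand-in for the real Controller so that the example compiles without the library.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of crawler4j.
public class CrawlLauncher {

    // Runs the given crawl entry point on the executor and returns a Future
    // that completes with the seed once the entry point has returned.
    public static Future<String> launch(Consumer<String> setup, String seed,
                                        ExecutorService executor) {
        return executor.submit(() -> {
            setup.accept(seed);
            return seed;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the real crawler (assumption): in the application
            // this would be the setup method of a public Controller class.
            Consumer<String> setup = s -> System.out.println("crawling " + s);

            Future<String> done = launch(setup, "http://www.ics.uci.edu/", executor);
            System.out.println("main continues while the crawl runs");

            // Block only at the point where the crawl results are needed.
            System.out.println("finished: " + done.get());
        } finally {
            executor.shutdown();
        }
    }
}
```

In the real application the stand-in lambda would be replaced with a method reference such as new Controller()::setup. As far as I know, CrawlController also provides startNonBlocking and waitUntilFinish for the same purpose, but check the version you are using before relying on them.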