Crawler4j runtime error
I have implemented a web crawler using the crawler4j library.
I am getting the following error:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
I searched for the error on Google and found that the slf4j library was missing, so I downloaded it and added it to the project. After that, I got the error shown in the snapshot below:
The code for the class is as follows:
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
    public static void main(String[] args) throws Exception {
        String crawlStorageFolder = "/DA Project/Crawled Data";
        int numberOfCrawlers = 7;

        CrawlConfig config = new CrawlConfig();

        /*
         * You can set the location of the folder where you want your crawled
         * data to be stored
         */
        config.setCrawlStorageFolder(crawlStorageFolder);

        /*
         * Be polite: Make sure that we don't send more than 1 request per
         * second (1000 milliseconds between requests).
         */
        config.setPolitenessDelay(1000);

        /*
         * You can set the maximum crawl depth here. The default value is -1 for
         * unlimited depth
         */
        config.setMaxDepthOfCrawling(-1);

        /*
         * You can set the maximum number of pages to crawl. The default value
         * is -1 for unlimited number of pages
         */
        config.setMaxPagesToFetch(-1);

        /*
         * This config parameter can be used to set your crawl to be resumable
         * (meaning that you can resume the crawl from a previously
         * interrupted/crashed crawl). Note: if you enable the resuming feature and
         * want to start a fresh crawl, you need to delete the contents of
         * rootFolder manually.
         */
        config.setResumableCrawling(false);

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

        try {
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            /*
             * For each crawl, you need to add some seed URLs. These are the
             * first URLs that are fetched and then the crawler starts following
             * links which are found in these pages
             */
            controller.addSeed("http://www.consumercomplaints.in/?search=chevrolet");

            /*
             * Start the crawl. This is a blocking operation, meaning that your
             * code will reach the line after this only when crawling is
             * finished.
             */
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            System.out.println("Caught Exception :" + e.getMessage());
            e.printStackTrace();
        }
    }
}
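The controller references MyCrawler.class, which is not shown in the question. For context, a minimal WebCrawler subclass compatible with crawler4j 4.x might look like the sketch below; the shouldVisit/visit logic here is illustrative, not the asker's actual class:

import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {
    // Skip common binary/static resources that are not worth parsing
    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpe?g|png|ico|pdf|zip|gz))$");

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        String href = url.getURL().toLowerCase();
        // Stay on the seed site and avoid non-HTML resources
        return !FILTERS.matcher(href).matches()
                && href.startsWith("http://www.consumercomplaints.in/");
    }

    @Override
    public void visit(Page page) {
        // Called once for each page that was successfully fetched
        System.out.println("Visited: " + page.getWebURL().getURL());
    }
}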
Any help would be appreciated.
Thanks!
Just as you resolved the org.slf4j.impl.StaticLoggerBinder error by adding the relevant jar to your project, you now need to do the same for ch.qos.logback.core.joran.spi.JoranException.
I removed the SLF4J jar file, then downloaded the logback 1.1.2 jar files and added them to my project.
The download link for logback is: http://logback.qos.ch/download.html
The jars included were:
logback-access-1.1.2
logback-access-1.1.2-sources
logback-classic-1.1.2
logback-classic-1.1.2-sources
logback-core-1.1.2
logback-core-1.1.2-sources
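If the project is built with Maven rather than by adding jars manually, the equivalent is to declare logback-classic as a dependency; it pulls in logback-core and slf4j-api transitively, so the -access and -sources jars are not required for this fix. A minimal sketch, assuming version 1.1.2 as above:

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.1.2</version>
</dependency>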
Hope this helps others.
Thanks.
Adding this dependency resolved the 'Failed to load class org.slf4j.impl.StaticLoggerBinder' error while using (edu.uci.ics) crawler4j version 4.2:
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.21</version>
</dependency>
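A quick way to verify that the binding is actually on the classpath is to log one line through the SLF4J API: with slf4j-simple present, the StaticLoggerBinder warning disappears and the message is printed (slf4j-simple writes INFO and above to System.err by default). A minimal check, with a hypothetical class name:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingCheck {
    private static final Logger LOG = LoggerFactory.getLogger(LoggingCheck.class);

    public static void main(String[] args) {
        // If no binding were found, this call would be silently swallowed
        // by the NOP logger and the StaticLoggerBinder warning would appear.
        LOG.info("SLF4J binding is working");
    }
}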