Java web crawler that searches for a keyword
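I'm trying to make a web crawler return true when a given word is found on a web page. The return true statement never runs, so I can't get it to work correctly. Does anyone have a simple way to do this? Thanks.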
public static boolean keywordSearch(String url, String keyword) {
    String strTemp = "";
    try {
        URL my_url = new URL(url);
        BufferedReader br = new BufferedReader(new InputStreamReader(my_url.openStream()));
        while (null != (strTemp = br.readLine())) {
            if (strTemp.contains(keyword)) {
                return true;
            }
        }
    } catch (Exception ex) {
        System.out.println("Error: " + ex.getMessage());
    }
    return false;
}
First, read in the contents of the URL with a method like this:
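import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getText(String url) throws IOException {
    URLConnection connection = new URL(url).openConnection();
    StringBuilder response = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            // keep the line break so words on adjacent lines are not fused together
            response.append(inputLine).append('\n');
        }
    }
    return response.toString();
}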
Then check whether the contents of the URL contain the keyword, like this:
String content = URLConnectionReader.getText("http://www.someurl.com/page.html");
if (content.contains("someKeyword")) {
    // content of url contains keyword
}
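Note that String.contains is case-sensitive, so a keyword such as "crawler" will not match "Crawler" in the page. Whether that is why return true never runs in the question is only a guess, but it is easy to rule out. Below is a minimal sketch of a case-insensitive check; the class name KeywordSearch and the helper containsKeyword are hypothetical names made up for this example, and the helper takes any Reader so it can be tried on a plain string without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

public class KeywordSearch {

    // Hypothetical helper: returns true if any line read from the source
    // contains the keyword, ignoring upper/lower case.
    public static boolean containsKeyword(Reader source, String keyword) throws IOException {
        String needle = keyword.toLowerCase(Locale.ROOT);
        // try-with-resources closes the reader when the search is done
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup standing in for a downloaded page (made up for this example)
        String html = "<html><body>\n<p>Hello Crawler World</p>\n</body></html>";
        System.out.println(containsKeyword(new StringReader(html), "crawler")); // prints "true"
        System.out.println(containsKeyword(new StringReader(html), "missing")); // prints "false"
    }
}
```

To run it against a real page, pass new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8) as the source. If it still returns false, the keyword may not be in the raw HTML at all, for example when the page builds its text with JavaScript after loading.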