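How to get absolute URL paths, excluding links to files
I need to get the absolute URLs of the links on a page, excluding links that point to files. I have this code that collects the links, but some of them are missing from the output.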
import java.io.IOException;
import java.net.URI;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        URI uri = new URI("http://www.niocchi.com/");
        printURLofPages(uri);
    }

    private static void printURLofPages(URI uri) throws IOException {
        Document doc = Jsoup.connect(uri.toString()).get();
        // Select anchors whose href contains no '#'
        Elements links = doc.select("a[href~=^[^#]+$]");
        for (Element link : links) {
            String href = link.attr("abs:href");
            URL url = new URL(href);
            String path = url.getPath();
            int lastdot = path.lastIndexOf(".");
            if (lastdot > 0) {
                String extension = path.substring(lastdot);
                if (!extension.equalsIgnoreCase(".html") && !extension.equalsIgnoreCase(".htm"))
                    return;
            }
            System.out.println(href);
        }
    }
}
This code gives me the following links:
http://www.enormo.com/
http://www.vitalprix.com/
http://www.niocchi.com/javadoc
http://www.niocchi.com/
I need these links:
http://www.enormo.com/
http://www.vitalprix.com/
http://www.niocchi.com/javadoc
http://www.linkedin.com/in/flmommens
http://www.linkedin.com/in/ivanprado
http://www.linkedin.com/in/marcgracia
http://es.linkedin.com/in/tdibaja
http://www.linkody.com
http://www.niocchi.com/
Thank you very much for any suggestions.
Instead of
String href = link.attr("href");
try
String href = link.attr("abs:href");
Edit: documentation: http://jsoup.org/cookbook/extracting-data/working-with-urls
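Note that the posted code already reads "abs:href", so the missing links appear to have a different cause: the `return` inside the loop ends the whole method at the first link with a non-HTML extension, so every link after it is never printed. Below is a minimal sketch of the filter using `continue` instead. It uses only the standard library so it can run without jsoup; the class name `PageLinkFilter`, the helpers `isPageLink` and `filterPageLinks`, and the `example.com` file URL are hypothetical names chosen for illustration, not part of the original code. It also checks the extension on the last path segment only, which is a small change from the original check on the whole path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: keeps links that look like pages, drops links to files.
class PageLinkFilter {

    // True when the last path segment has no extension, or ends in .html / .htm.
    static boolean isPageLink(String href) {
        String path;
        try {
            path = new URI(href).getPath();
        } catch (URISyntaxException e) {
            return false; // skip malformed links instead of aborting
        }
        if (path == null) {
            return false; // opaque URIs such as mailto: have no path
        }
        // Look only at the last segment so dots in directory names are ignored.
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        int lastDot = lastSegment.lastIndexOf('.');
        if (lastDot < 0) {
            return true; // no extension: treat as a page
        }
        String extension = lastSegment.substring(lastDot);
        return extension.equalsIgnoreCase(".html") || extension.equalsIgnoreCase(".htm");
    }

    static List<String> filterPageLinks(List<String> hrefs) {
        List<String> pages = new ArrayList<>();
        for (String href : hrefs) {
            if (!isPageLink(href)) {
                continue; // skip this link only; 'return' here would end the loop
            }
            pages.add(href);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Sample input; the .zip URL is an assumed example of a file link.
        List<String> hrefs = List.of(
                "http://www.enormo.com/",
                "http://www.example.com/files/archive.zip",
                "http://www.linkedin.com/in/flmommens",
                "http://www.linkody.com");
        filterPageLinks(hrefs).forEach(System.out::println);
    }
}
```

In the original method, the same effect should follow from replacing `return;` with `continue;` inside the `for` loop, with the strings coming from `link.attr("abs:href")` as before.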