Converting window.open(hyperlink) JavaScript code to a pure absolute URL with Java
I am using the Java Jsoup library on a website to extract some hyperlinks:
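// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
Document doc = Jsoup.connect("http://www.saudisale.com/SS_a_mpg.aspx").get();
Elements script = doc.select("script");
for (Element elementary : doc.select("table")) {
    // Print the onClick attribute of the first matching <input> inside each table
    System.out.println(elementary.select("tbody").select("tr").select("td").select("input").attr("onClick"));
}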
Sample output:
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
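Since Jsoup does not execute JavaScript, I have to write some Java code by hand to convert the window.open(hyperlink) JavaScript calls into absolute hyperlinks.
For example, the following JavaScript output must be converted from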
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode=1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')
to:
http://saudisale.com/arPrivatePage.aspx?id=21871638
and
window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
to:
http://www.saudisale.com/SS_a_car.aspx?carid=37149
Can anyone guide me on how to accomplish this in Java?
Use a regular expression. This will do what you want:
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String input = "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');";
// Group 1 captures the first argument of window.open(...), without its quotes or surrounding whitespace
String regex = "window\\.open\\(\\s*['\"]\\s*(.*?)\\s*['\"]\\s*,";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String output = matcher.group(1);
    System.out.println(output);
}
Your last two URLs are relative URLs, so you have to convert them to absolute URLs as described here.
For the relative URLs I used this code. It works fine.
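// Requires: import java.net.URL; import java.util.regex.Matcher; import java.util.regex.Pattern;
String input2 = "window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')";
URL baseURL = new URL("http://saudisale.com/");
// Group 1 captures the first argument of window.open(...), without its quotes or surrounding whitespace
String regex = "window\\.open\\(\\s*['\"]\\s*(.*?)\\s*['\"]\\s*,";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input2);
while (matcher.find()) {
    String output = matcher.group(1);
    // Resolve the relative link against the base URL; new URL(...) may throw MalformedURLException
    URL url = new URL(baseURL, output);
    System.out.println(url);
}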
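The two snippets above can be combined so that one helper handles both the absolute and the relative links. The following is only a minimal sketch under some assumptions: the class name WindowOpenLinks, the method toAbsolute and the parameter pageUrl are hypothetical names made up for illustration, and it uses java.net.URI.resolve in place of new URL(baseURL, output). It also assumes the page URL from the question is the right base for relative links.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jsoup or of the original answer.
class WindowOpenLinks {
    // Captures the first quoted argument of window.open(...), trimmed of surrounding whitespace.
    private static final Pattern WINDOW_OPEN =
            Pattern.compile("window\\.open\\(\\s*['\"]\\s*(.*?)\\s*['\"]\\s*[,)]");

    // Returns the absolute URL for an onClick value, or empty if no window.open call is found.
    static Optional<String> toAbsolute(String onClick, String pageUrl) {
        Matcher m = WINDOW_OPEN.matcher(onClick);
        if (!m.find()) {
            return Optional.empty();
        }
        // URI.resolve returns absolute links unchanged and resolves relative ones
        // against the page URL. URI.create throws IllegalArgumentException if the
        // captured text is not a valid URI (for example, if it contains spaces).
        return Optional.of(URI.create(pageUrl).resolve(m.group(1)).toString());
    }

    public static void main(String[] args) {
        // Assumption: the page the links were scraped from is the base for relative links.
        String pageUrl = "http://www.saudisale.com/SS_a_mpg.aspx";
        String[] samples = {
            "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1');",
            "window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1');",
            "window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1');"
        };
        for (String sample : samples) {
            System.out.println(toAbsolute(sample, pageUrl).orElse("(no link found)"));
        }
    }
}
```

In the Jsoup loop from the question, each onClick string could be passed to toAbsolute together with the URL given to Jsoup.connect. Using the page URL as the base produces the www.saudisale.com form that the question asks for.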