Java: scrape a website that requires login using Jsoup
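I want to print some data from streetinsider.com (div elements with class="news_article"). I created an account, and I need to be logged in to access this data.
Can anyone explain why this code doesn't work? I have tried many things, but nothing has worked.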
public static final String SPLIT_INTERNET_URL = "http://www.streetinsider.com/Special+Dividends?offset=55";
public static final String SPLIT_LOGIN = "https://www.streetinsider.com/login.php";

/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException
 * @throws java.io.UnsupportedEncodingException
 * @throws java.text.ParseException
 * @throws java.lang.ClassNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, IOException, ParseException, ClassNotFoundException {
    // TODO code application logic here
    Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("loginemail", "XXXXX", "password", "XXXX")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();
    Map<String, String> cookies = res.cookies();
    Document pageWhenAlreadyLoggedIn = Jsoup.connect(SPLIT_INTERNET_URL).cookies(cookies).get();
    Elements elems = pageWhenAlreadyLoggedIn.select("div[class=news_article]");
    for (Element elem : elems) {
        System.out.println(elem);
    }
}
Your code does not actually log you in to the website. Try the following code instead.
To log in to the website:
Connection.Response res = Jsoup.connect(SPLIT_LOGIN)
        .data("action", "account",
              "redirect", "account_home.php?",
              "radiobutton", "old",
              "loginemail", "XXXXX",
              "password", "XXXXX",
              "LoginChoice", "Sign In to Secure Area")
        .method(Connection.Method.POST)
        .followRedirects(true)
        .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
        .execute();
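The extra fields in the call above (everything besides the email and password) are presumably the other inputs of the site's login form, which the server expects in the POST body. To illustrate what such a body looks like on the wire, here is a minimal sketch that URL-encodes the same key/value pairs. FormBody and encode are hypothetical names used only for this illustration; they are not part of Jsoup, which performs this encoding itself when the request is sent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Jsoup): builds the form-encoded body of a POST.
final class FormBody {

    // Joins the pairs as key=value&key=value, URL-encoding every key and value.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Same field names as in the login call above; credentials are placeholders.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("action", "account");
        fields.put("redirect", "account_home.php?");
        fields.put("radiobutton", "old");
        fields.put("loginemail", "XXXXX");
        fields.put("password", "XXXXX");
        fields.put("LoginChoice", "Sign In to Secure Area");
        System.out.println(encode(fields));
    }
}
```

Comparing this output with the request body shown in the browser's network tab during a manual login is one way to spot a field that the code is not sending.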
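You are now logged in, but the site appears to check whether you are already logged in from another browser or connection, and asks you to end that session first. The following code ends the prior session: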
Connection.Response res2 = Jsoup.connect("http://www.streetinsider.com/login_duplicate.php")
        .data("ok", "End Prior Session")
        .method(Connection.Method.POST)
        .cookies(res.cookies())
        .followRedirects(true)
        .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
        .execute();
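Now res2 contains your account's home page, and you can go on to any page you want. For more information on how to log in to a website with Jsoup, see the following tutorial: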
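As a sketch of the "go on to any page" step: each later request has to carry the session cookies, and the second response may set or replace some of them. The helper below merges several cookie maps, with later maps overriding earlier ones. SessionCookies and merge are hypothetical names rather than Jsoup API, the cookie names and values in main are made up, and the override order is an assumption about what the site expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup): combines cookies from several responses.
final class SessionCookies {

    // Copies every map into one result; a later map overrides an earlier one on the same name.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieSets) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> set : cookieSets) {
            merged.putAll(set);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up cookie names and values standing in for res.cookies() and res2.cookies().
        Map<String, String> fromLogin = Map.of("session", "abc", "lang", "en");
        Map<String, String> fromEndPriorSession = Map.of("session", "xyz");
        System.out.println(merge(fromLogin, fromEndPriorSession));
    }
}
```

With Jsoup, the merged map could then be passed to the next request, for example Jsoup.connect(SPLIT_INTERNET_URL).cookies(SessionCookies.merge(res.cookies(), res2.cookies())).get(), after which the div[class=news_article] selector from the question can be applied to the returned Document.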