Jsoup login with form (post)
After reading some examples, I'm trying to implement a Helpshift crawler that logs in first, for example at:
https://target.helpshift.com/login/?next=%2Fadmin%2Fissues%2F
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsouptTest {
    public static void main(String[] args) throws Exception {
        int x = 1;

        // Step 1: GET the login page to obtain the session cookies
        Connection.Response loginForm = Jsoup.connect("https://target.helpshift.com/login/?next=%2Fadmin%2Fissues%2F" + x + "%2F")
                .method(Connection.Method.GET)
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0")
                .execute();

        // Step 2: POST the credentials, sending back the cookies from step 1
        Document document = Jsoup.connect("https://target.helpshift.com/login/")
                .data("cookieexists", "false")
                .data("username", "email@example.com")
                .data("password", "123456")
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0")
                .cookies(loginForm.cookies())
                .post();

        System.out.println(document);
    }
}
However, I get this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://target.helpshift.com/login/
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
    at org.jsoup.helper.HttpConnection.post(HttpConnection.java:200)
    at edu.utfpr.helpcrawler.JsouptTest.main(JsouptTest.java:32)
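If you inspect the headers of the browser's login request, you'll see that it sends the cookies just as your code does, but it also includes one of the cookie values, the CSRF token, in the form data. Add this to your second request: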
.data("_csrf_token", loginForm.cookie("_csrf_token"))
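The fix boils down to copying one cookie value into the form fields. As a minimal sketch, assuming the token cookie is named `_csrf_token` as in the line above and the other field names from the question stay the same, a small helper (the class name `CsrfLoginForm` is hypothetical, not part of Jsoup) can assemble the POST data from the cookies returned by the first GET and fail early if the token is missing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup): builds the form fields for the
// POST to /login/, copying the CSRF token from the cookies of the first GET.
class CsrfLoginForm {

    static Map<String, String> build(Map<String, String> cookies,
                                     String username, String password) {
        String token = cookies.get("_csrf_token");
        if (token == null) {
            // The question's POST without this field was answered with a 403,
            // so stop here instead of sending a request that will be rejected.
            throw new IllegalStateException("no _csrf_token cookie in the GET response");
        }
        Map<String, String> form = new LinkedHashMap<>();
        form.put("cookieexists", "false");
        form.put("username", username);
        form.put("password", password);
        // Same value that is also sent back as a cookie.
        form.put("_csrf_token", token);
        return form;
    }

    public static void main(String[] args) {
        // Stand-in for loginForm.cookies(); the token value is made up.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("_csrf_token", "example-token");
        Map<String, String> form = build(cookies, "email@example.com", "123456");
        System.out.println(form);
    }
}
```

In the question's code, the resulting map can replace the individual `.data(...)` calls, since Jsoup's `Connection` also accepts a `Map<String, String>`: `.data(CsrfLoginForm.build(loginForm.cookies(), "email@example.com", "123456"))`, keeping `.cookies(loginForm.cookies())` as before.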