Remote server error while logging into a website using WebRequest

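I'm currently trying to log in to my account on a website using WebRequest. I've read about it enough that I now want to work through an example and learn by trial and error.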

This is the example I'm using: Login to website, via C#

When I run my code, it throws an unhandled exception, namely this one:

System.Net.WebException: 'The remote server returned an error: (404) Not Found.'

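I tried stepping through the code, and I think it may be trying to POST to a location that can't be reached. I'd like to fix this first, before confirming that the login actually succeeds. For the purposes of this question I changed the username and password to dummy text.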

What am I doing wrong here, and what is the most logical way to fix it? Thanks in advance.

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

string formUrl = "https://secure.runescape.com/m=weblogin/login.ws"; // NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag)
string formParams = string.Format("login-username={0}&login-password={1}", "myUsername", "password");
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();

cookieHeader = resp.Headers["Set-cookie"];

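When you scrape a website, you have to make sure you mimic everything that happens in the browser. That includes any client-side state (cookies) that is sent before the form is POSTed. Since most websites don't like being scraped or driven by bots, they are usually quite picky about what the payload looks like. The same is true for the site you are trying to control.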
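To make the cookie part concrete, here is a minimal offline sketch of how a CookieContainer stores a cookie received from one URL and replays it on a later request to the same host. The CookieJarDemo class, the example.com URLs and the session=abc123 cookie are made up for illustration; nothing here talks to the real site.

```csharp
using System;
using System.Net;

public static class CookieJarDemo
{
    // Stores a Set-Cookie value as if it had been returned by `origin`,
    // then returns the Cookie header the container would attach to a
    // request for `target` (an empty string if no cookie applies).
    public static string StoreAndReplay(Uri origin, string setCookie, Uri target)
    {
        var jar = new CookieContainer();
        jar.SetCookies(origin, setCookie);
        return jar.GetCookieHeader(target);
    }
}
```

With HttpWebRequest you normally never call SetCookies yourself: assigning the same container to each request's CookieContainer property lets the framework fill it from the response and replay it on the next request.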

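You missed three important things:

  • You didn't start with an initial GET, so you don't have the required cookies in a CookieContainer.
  • On the POST you missed a header (the Referer) and three hidden fields that are in the form.
  • The form fields are named username and password (as can be seen in the name attribute of the input tags). You used the id instead.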
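If you would rather not read the form markup by eye, a rough sketch like the following could list the hidden fields for you. Treat it as an illustration under assumptions: the HiddenFields class is hypothetical, it only handles quoted attribute values, and regular expressions are a fragile way to read HTML (a real HTML parser is more robust).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HiddenFields
{
    // One regex finds each <input ...> tag; the others inspect its attributes.
    private static readonly Regex InputTag =
        new Regex("<input\\b[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex TypeHidden =
        new Regex("\\btype\\s*=\\s*[\"']hidden[\"']", RegexOptions.IgnoreCase);
    private static readonly Regex NameAttr =
        new Regex("\\bname\\s*=\\s*[\"']([^\"']*)[\"']", RegexOptions.IgnoreCase);
    private static readonly Regex ValueAttr =
        new Regex("\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']", RegexOptions.IgnoreCase);

    // Returns name -> value for every <input type="hidden"> found in the HTML.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in InputTag.Matches(html))
        {
            if (!TypeHidden.IsMatch(tag.Value)) continue;   // skip visible inputs
            Match name = NameAttr.Match(tag.Value);
            if (!name.Success) continue;                    // unnamed inputs are not submitted
            Match value = ValueAttr.Match(tag.Value);
            fields[name.Groups[1].Value] = value.Success ? value.Groups[1].Value : "";
        }
        return fields;
    }
}
```

Calling HiddenFields.Extract on the body of the initial GET response should give you the name/value pairs that have to be appended to the POST body, assuming the page uses plain quoted attributes.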

Fixing those omissions results in the following code:

// requires: using System.IO; using System.Net; using System.Text;
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
string useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36";

// capture cookies, this is important!
var cookies = new CookieContainer();

// do a GET first, so you have the initial cookies needed
string loginUrl = "https://secure.runescape.com/m=weblogin/loginform.ws?mod=www&ssl=0&dest=community";
// HttpWebRequest
var reqLogin = (HttpWebRequest) WebRequest.Create(loginUrl);
// minimal needed settings
reqLogin.UserAgent = useragent;
reqLogin.CookieContainer = cookies;

reqLogin.Method = "GET";
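// dispose the response so its connection is released before the POST;
// the cookies it set are already stored in the container
using (var loginResp = reqLogin.GetResponse())
{
    //loginResp.Dump(); // LinqPad testing
}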

string formUrl = "https://secure.runescape.com/m=weblogin/login.ws"; // NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag)
// in the HTML the form has 3 more hidden fields, those are needed as well
string formParams = string.Format("username={0}&password={1}&mod=www&ssl=0&dest=community", "myUsername", "password");
string cookieHeader;
// notice the cast to HttpWebRequest
var req = (HttpWebRequest) WebRequest.Create(formUrl);

// put the earlier cookies back on the request
req.CookieContainer = cookies;

// the Referer is mandatory, without it a timeout is raised
// (HTTP spells the header "Referer"; HttpWebRequest exposes it as the Referer property)
req.Referer = loginUrl;
req.UserAgent = useragent;

req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();

cookieHeader = resp.Headers["Set-cookie"];

This returns success for me. It is up to you to parse the resulting HTML to plan your next steps.
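One caveat about the form body above: string.Format does not escape anything, so a real username or password containing & or = would corrupt the payload. Below is a small sketch of a safer way to build the body with WebUtility.UrlEncode. The FormBody class name is my own invention for illustration, and I have not verified how the site treats escaped values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormBody
{
    // Builds an application/x-www-form-urlencoded body, percent-encoding
    // every key and value so that '&' and '=' inside them cannot be
    // mistaken for field separators.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

The result can be passed to Encoding.ASCII.GetBytes exactly as formParams is in the code above. Because every non-ASCII character has already been percent-encoded, the ASCII conversion no longer risks replacing characters with question marks.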