Download HTML with website login credentials

I have successfully downloaded an HTML string from a website, but I want to log in first and then download the HTML.

My code:

    Dim client As New HttpClient
    Dim html = Await client.GetStringAsync("http://www.betbrain.com/football/norway/tippeligaen/sandefjord-v-tromso-il/")

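How can I log in first, passing my username and password from code, and then download the page? (The HTML differs depending on whether I am logged in.)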

Here is some C# code (the .NET API is the same in VB.NET) for handling cookies. If you log in this way, keep the cookieContainer and reuse it for the GET request. EDIT: You first need to request their homepage to pick up the cookies. After that first request you can do whatever you want, and the login works :)

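    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Http;

    var baseAddress = new Uri("http://www.betbrain.com");
    // Shared by every handler below so cookies survive between requests
    var cookieContainer = new CookieContainer();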
    
    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer, AllowAutoRedirect = false })
    using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
    {
        client.DefaultRequestHeaders.Clear();
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36");
        var result = client.GetAsync("/").Result;
        Console.WriteLine(result.StatusCode);
    }

    Console.WriteLine("Cookies count after: " + cookieContainer.Count);
    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer, AllowAutoRedirect = false })
    using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
    {
        client.DefaultRequestHeaders.Clear();
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36");
        var content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", "bar"),
            new KeyValuePair<string, string>("password", "bazinga"),
            new KeyValuePair<string, string>("rememberSignIn", "0")
        });
        var result = client.PostAsync("/sign-in", content).Result;
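        if (result.StatusCode == HttpStatusCode.TemporaryRedirect)
        {
            // 307: sign-in rejected (also seen when the homepage cookies are missing)
            Console.WriteLine("Invalid user/login");
        }
        else if (result.StatusCode == HttpStatusCode.Found)
        {
            // 302: sign-in accepted
            Console.WriteLine("Yay, it's working");
        }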
    }

After looking at the HTML of the /sign-in/ page, I found that it is a simple form with username and password fields, so the method above should work.
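
To see what such a form post puts on the wire, here is a minimal offline sketch (written as C# 9 top-level statements, which is an assumption about your toolchain). It builds the same FormUrlEncodedContent as above with the placeholder credentials and reads the body back instead of sending it:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Placeholder credentials, same field names as the sign-in form
var content = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("username", "bar"),
    new KeyValuePair<string, string>("password", "bazinga"),
    new KeyValuePair<string, string>("rememberSignIn", "0")
});

// Read back what would be sent as the POST body; nothing goes over the network
string body = content.ReadAsStringAsync().Result;
string contentType = content.Headers.ContentType.MediaType;

Console.WriteLine(contentType); // application/x-www-form-urlencoded
Console.WriteLine(body);        // username=bar&password=bazinga&rememberSignIn=0
```

Printing the body like this is a quick way to compare your request with the one the browser sends.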

Then get your data in a similar way, but with the SAME cookieContainer:

    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
    using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
    {
        // GetAsync returns the response, so the status can be checked before reading the body
        var response = client.GetAsync("/football/norway/tippeligaen/sandefjord-v-tromso-il/").Result;
        response.EnsureSuccessStatusCode();
        var html = response.Content.ReadAsStringAsync().Result;
    }
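
Why the SAME container matters can be shown without touching the network. The sketch below is only an illustration: the cookie name "session" and value "abc123" are made up, and the cookie is added by hand to stand in for whatever Set-Cookie headers the site really sends.

```csharp
using System;
using System.Net;

var baseAddress = new Uri("http://www.betbrain.com");
var cookieContainer = new CookieContainer();

// Stand-in for a cookie the server would set on an earlier request.
// "session" / "abc123" are hypothetical values for illustration only.
cookieContainer.Add(baseAddress, new Cookie("session", "abc123", "/"));

// A handler that shares this container sends the cookie back for URLs on the same host
var matchUri = new Uri(baseAddress, "/football/norway/tippeligaen/sandefjord-v-tromso-il/");
string cookieHeader = cookieContainer.GetCookieHeader(matchUri);

// A fresh container has nothing to send, so the site would treat the request as anonymous
string freshHeader = new CookieContainer().GetCookieHeader(matchUri);

Console.WriteLine(cookieContainer.Count); // 1
Console.WriteLine(cookieHeader);          // session=abc123
Console.WriteLine(freshHeader.Length);    // 0
```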

Here's the screenshot of the response with a VALID login and password.

EDIT: OK, I did more tests, and this request works:

    POST /sign-in/ HTTP/1.1
    Host: www.betbrain.com
    User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
    Cache-Control: no-cache
    Content-Type: application/x-www-form-urlencoded

    password=mySuperPassword&username=Toumash&rememberSignIn=0

It returns:

    HTTP/1.1 302 Found
    Server: nginx/1.5.11
    Date: Mon, 29 Jun 2015 10:12:32 GMT
    Content-Type: text/html;charset=utf-8
    Content-Length: 0
    Connection: keep-alive
    P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
    Cache-Control: no-cache
    ...Cookies
    Location: http://www.betbrain.com/

So you need to disable automatic redirects (AllowAutoRedirect = false on the handler); then the 302 is visible to your code and everything should work.

EDIT: Everything is working, but the site relies on cookies. That is the only difference between the browser request and the request from my code. Without cookies the response is 307; with cookies it is 302 (working).
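
The status codes observed above can be wrapped in a small helper. This is only a sketch: DescribeSignInStatus is a hypothetical name, and the meanings come from my tests against this one site, not from any documented behaviour.

```csharp
using System;
using System.Net;

// Hypothetical helper: maps the status codes seen in the tests above to a message.
// It assumes AllowAutoRedirect = false, so the raw 302/307 reaches the caller.
static string DescribeSignInStatus(HttpStatusCode status)
{
    switch (status)
    {
        case HttpStatusCode.Found:             // 302
            return "signed in";
        case HttpStatusCode.TemporaryRedirect: // 307
            return "rejected: wrong credentials or missing homepage cookies";
        default:
            return "unexpected status " + (int)status;
    }
}

Console.WriteLine(DescribeSignInStatus(HttpStatusCode.Found));
Console.WriteLine(DescribeSignInStatus(HttpStatusCode.TemporaryRedirect));
```

In the login snippet you would call it as DescribeSignInStatus(result.StatusCode) right after the POST.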