Quick Download HTML Source in C#
I am trying to download the HTML source of a single web page (https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/) in C#.
The problem is that downloading the 30 KB HTML page source takes 10 seconds. The internet connection is not the issue, because this program can download a 10 MB file almost instantly.
Each of the following was run both on a separate thread and on the main thread; the download still takes 10-12 seconds.
1)
using (var httpClient = new HttpClient())
{
    using (var request = new HttpRequestMessage(new HttpMethod("GET"), url))
    {
        var response = await httpClient.SendAsync(request);
    }
}

2)
using (var client = new System.Net.WebClient())
{
    client.Proxy = null;
    response = client.DownloadString(url);
}

3)
using (var client = new System.Net.WebClient())
{
    client.Proxy = GlobalProxySelection.GetEmptyWebProxy();
    response = client.DownloadString(url);
}

4)
WebRequest.DefaultWebProxy = null;
using (var client = new System.Net.WebClient())
{
    response = client.DownloadString(url);
}

5)
var client = new WebClient();
response = client.DownloadString(url);

6)
var client = new WebClient();
client.DownloadFile(url, filepath);

7)
System.Net.WebClient myWebClient = new System.Net.WebClient();
WebProxy myProxy = new WebProxy();
myProxy.IsBypassed(new Uri(url));
myWebClient.Proxy = myProxy;
response = myWebClient.DownloadString(url);

8)
using var client = new HttpClient();
var content = await client.GetStringAsync(url);

9)
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
I would like a faster way to do this in C#.
Any information or help is much appreciated.
This question stumped everyone I asked, but I found a solution I am going to stick with.
On average this solution does what I need in 0.5 seconds. As far as I can tell it only works on Windows; if the user does not have curl, I fall back to the old method that takes 10 seconds to get what I need.
The solution creates a batch file in the temp directory, calls that batch file to curl the website, and then redirects the output of the curl to a .txt file in the temp directory.
private static void CreateBatchFile()
{
    string filePath = $"{tempPath}\\tempBat.bat";
    string writeMe = "cd \"%temp%\\ProgramTempDir\"\n" +
        "curl \"https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/\">FAA_NASR.txt";
    File.WriteAllText(filePath, writeMe);
}

private static void ExecuteCommand()
{
    int exitCode;
    ProcessStartInfo processInfo = new ProcessStartInfo("cmd.exe", "/c " + $"{tempPath}\\tempBat.bat");
    processInfo.CreateNoWindow = true;
    processInfo.UseShellExecute = false;
    Process process = Process.Start(processInfo);
    process.WaitForExit();
    exitCode = process.ExitCode;
    process.Close();
}

private static void GetResponse()
{
    string response;
    string url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";
    CreateBatchFile();
    ExecuteCommand();
    if (File.Exists($"{tempPath}\\FAA_NASR.txt") && File.ReadAllText($"{tempPath}\\FAA_NASR.txt").Length > 10)
    {
        response = File.ReadAllText($"{tempPath}\\FAA_NASR.txt");
    }
    else
    {
        // If we get here the user does not have curl, OR curl returned a file
        // that is not longer than 10 characters.
        using (var client = new System.Net.WebClient())
        {
            client.Proxy = null;
            response = client.DownloadString(url);
        }
    }
}
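A note on this workaround: the batch file and temp .txt file are not strictly necessary. A simpler variant (a sketch, assuming curl.exe is on the PATH, as it is on Windows 10 1803 and later) starts curl directly and reads its standard output:

private static string GetResponseViaCurl(string url)
{
    // -s suppresses curl's progress meter, so only the page source goes to stdout.
    var startInfo = new ProcessStartInfo("curl", $"-s \"{url}\"")
    {
        CreateNoWindow = true,
        UseShellExecute = false,
        RedirectStandardOutput = true
    };
    using (var process = Process.Start(startInfo))
    {
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }
}

This also avoids leaving tempBat.bat and FAA_NASR.txt behind in the temp directory.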
I know this is an old question, but I think I found the reason: I have run into this problem on other sites as well. If you look at the response cookies, you will find one named ak_bmsc. That cookie shows that the site is running Akamai Bot Manager, which provides bot protection and blocks requests that 'look' suspicious.
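To check whether a slow site is behind the same protection, you can dump its Set-Cookie response headers and look for ak_bmsc. A minimal sketch (plain HttpClient; the URL is the FAA page from the question):

using System;
using System.Net.Http;
using System.Threading.Tasks;

class CookieInspector
{
    private static readonly HttpClient _client = new HttpClient();

    static async Task Main()
    {
        var response = await _client.GetAsync(
            "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/");
        // Akamai Bot Manager announces itself through an ak_bmsc response cookie.
        if (response.Headers.TryGetValues("Set-Cookie", out var cookies))
        {
            foreach (var cookie in cookies)
            {
                if (cookie.StartsWith("ak_bmsc"))
                    Console.WriteLine(cookie);
            }
        }
    }
}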
To get a fast response from the host, you need the right request setup. In this case:
- Headers:
Host: (their host data) www.faa.gov
Accept: (something like) */*
- Cookies:
AkamaiEdge = true
Example:
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    private static readonly HttpClient _client = new HttpClient();
    private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

    static async Task Main(string[] args)
    {
        var sw = Stopwatch.StartNew();
        using (var request = new HttpRequestMessage(HttpMethod.Get, _url))
        {
            request.Headers.Add("Host", "www.faa.gov");
            request.Headers.Add("Accept", "*/*");
            request.Headers.Add("Cookie", "AkamaiEdge=true");
            // Prints the status line and response headers; read the response's
            // Content property if you need the HTML body itself.
            Console.WriteLine(await _client.SendAsync(request));
        }
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
    }
}
It takes 896 ms for me.
By the way, you should not put HttpClient in a using block. I know it is disposable, but it is not designed to be disposed per request.
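If you want a single long-lived client that still reacts to DNS changes, one common pattern (a sketch, assuming .NET Core 2.1 or later, where SocketsHttpHandler is available) is to share one HttpClient over a handler with a bounded connection lifetime:

using System;
using System.Net.Http;

static class Http
{
    // One shared client for the whole application. PooledConnectionLifetime
    // recycles pooled connections periodically, so DNS updates are picked up
    // even though the client itself is never disposed.
    public static readonly HttpClient Client = new HttpClient(
        new SocketsHttpHandler
        {
            PooledConnectionLifetime = TimeSpan.FromMinutes(2)
        });
}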