Dynamically change the proxy in HttpClient without heavy CPU usage

I need to create a multithreaded application that makes requests (POST, GET, etc.). I chose HttpClient for this.

By default it does not support SOCKS proxies, so I found Sockshandler (https://github.com/extremecodetv/SocksSharp), which can be used in place of the basic HttpClientHandler. It lets me use SOCKS.

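But I have a problem. All my requests should be sent through different proxies that I parse from the internet, but the HttpClient handler doesn't support changing proxies dynamically. If I don't have a valid proxy, I need to recreate the HttpClient. That is fine in itself, but with 200 threads it takes a lot of CPU. What should I do in this situation?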

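And a second problem. I found this article (https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/), which talks about using HttpClient as a single instance to improve performance, but that is impossible in a multithreaded program. Which approach is better in this case?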

Thanks for the help.

httpclient handler doesn't support changing proxies dynamically.

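I'm not sure that's technically true. Proxy is a read/write property, so I believe you could change it (unless that results in a runtime error... to be honest, I haven't actually tried it).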

UPDATE: I have tried it, and your assertion is technically correct. In the example below, the line that updates UseProxy fails with "System.InvalidOperationException: 'This instance has already started one or more requests. Properties can only be modified before sending the first request.'" Confirmed on .NET Core and full framework.

var hch = new HttpClientHandler { UseProxy = false };
var hc = new HttpClient(hch);
var resp = await hc.GetAsync(someUri);

hch.UseProxy = true; // fail!
hch.Proxy = new WebProxy(someProxy);
resp = await hc.GetAsync(someUri);

What is true is that you can't set a different proxy per request in a thread-safe way, which is unfortunate.

if I have 200 threads, it takes a lot of cpu

Concurrent asynchronous HTTP calls should not consume extra threads, nor extra CPU. Fire them off using await Task.WhenAll or similar, and there is no thread consumed until a response returns.
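As a rough illustration of that point, here is a minimal sketch that starts 200 requests from one caller and awaits them together. FakeDelayHandler, ConcurrentRequests, and the example.invalid URLs are hypothetical names made up for this sketch. The fake handler stands in for the network so the code runs offline, and a real program would use a normal handler instead.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the network: waits asynchronously, then returns 200 OK
// with the requested URL as the body.
public sealed class FakeDelayHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Task.Delay does not block a thread while the "response" is pending.
        await Task.Delay(100, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(request.RequestUri?.AbsoluteUri ?? string.Empty),
        };
    }
}

public static class ConcurrentRequests
{
    // Starts `count` requests without dedicating a thread to each one,
    // then awaits all of them; results come back in request order.
    public static async Task<string[]> FetchAllAsync(HttpClient client, int count)
    {
        var tasks = Enumerable.Range(0, count).Select(async i =>
        {
            using var response = await client.GetAsync($"http://example.invalid/{i}");
            return await response.Content.ReadAsStringAsync();
        });
        return await Task.WhenAll(tasks);
    }
}
```

Calling `await ConcurrentRequests.FetchAllAsync(new HttpClient(new FakeDelayHandler()), 200)` yields 200 bodies without creating 200 threads. The pending requests are just tasks waiting on I/O completion.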

And second problem. I found this article...

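This is definitely something you need to be aware of. However, even if you could set a different proxy per request, the underlying network stack would still need to open a socket for each proxy, so you wouldn't gain anything over one HttpClient instance per proxy as far as the socket exhaustion problem is concerned.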

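The best solution depends on how many proxies you're talking about here. In the article, the author describes running into problems when the server had around 4000-5000 open sockets, and no problems at 400 or fewer. YMMV, but if the number of proxies is no more than a few hundred, you should be safe creating a new HttpClient instance per proxy. If it's more, I would look at throttling your concurrency and test until you find a number your server resources can keep up with. In any case, make sure that if you need to make multiple calls through the same proxy, you reuse the HttpClient instance for them. A ConcurrentDictionary could be useful for lazily creating and reusing those instances.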
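One way the ConcurrentDictionary idea could look is sketched below. ProxyClientCache and GetClient are hypothetical names invented for this example, and the proxy is assumed to be identified by its address string (for example "http://1.2.3.4:8080"). Wrapping the client in Lazy<T> is a design choice so that two threads racing on the same address do not both construct a handler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

// Hypothetical helper: one lazily created HttpClient per proxy address.
public sealed class ProxyClientCache : IDisposable
{
    private readonly ConcurrentDictionary<string, Lazy<HttpClient>> _clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>(StringComparer.OrdinalIgnoreCase);

    // Returns the cached client for this proxy, creating it on first use.
    public HttpClient GetClient(string proxyAddress)
    {
        return _clients.GetOrAdd(proxyAddress, address => new Lazy<HttpClient>(() =>
        {
            var handler = new HttpClientHandler
            {
                Proxy = new WebProxy(address),
                UseProxy = true,
            };
            // disposeHandler: true ties the handler's lifetime to the client.
            return new HttpClient(handler, disposeHandler: true);
        })).Value;
    }

    // Disposes only the clients that were actually created.
    public void Dispose()
    {
        foreach (var entry in _clients.Values)
        {
            if (entry.IsValueCreated)
            {
                entry.Value.Dispose();
            }
        }
    }
}
```

Every call to `GetClient` with the same address returns the same instance, so repeated requests through one proxy reuse its connections. If the proxy list is very large, the cache would also need an eviction policy, which this sketch leaves out.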

I agree with Todd Menier's answer. But if you use .NET Core, I suggest reading this and this article, where Microsoft says:

Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. That issue will result in SocketException errors.

That is unfortunate, but they offer a solution:

To address those mentioned issues and make the management of HttpClient instances easier, .NET Core 2.1 introduced a new HttpClientFactory that can also be used to implement resilient HTTP calls by integrating Polly with it.

I looked at the IHttpClientFactory summary block and saw:

Each call to System.Net.Http.IHttpClientFactory.CreateClient(System.String) is guaranteed to return a new System.Net.Http.HttpClient instance. Callers may cache the returned System.Net.Http.HttpClient instance indefinitely or surround its use in a using block to dispose it when desired. The default System.Net.Http.IHttpClientFactory implementation may cache the underlying System.Net.Http.HttpMessageHandler instances to improve performance. Callers are also free to mutate the returned System.Net.Http.HttpClient instance's public properties as desired.

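As the picture illustrates, an IHttpClientFactory implementation is injected into some service (CatalogueService or whatever you create). An HttpClient is then instantiated through IHttpClientFactory every time you need to make a request (you can even wrap it in a using(...) block), but the HttpMessageHandler is cached in a kind of connection pool.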

So you can use HttpClientFactory to create as many HttpClient instances as you need and set the proxy before making the call. I'd be glad if this helps.

UPDATE: I tried it, and it's not actually what you need. You can implement your own IHttpClientFactory like this:

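using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class Program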
{
    public interface IHttpClientFactory
    {
        HttpClient CreateClientWithProxy(IWebProxy webProxy);
    }

    internal class HttpClientFactory : IHttpClientFactory
    {
        private readonly Func<HttpClientHandler> makeHandler;

        public HttpClientFactory(Func<HttpClientHandler> makeHandler)
        {
            this.makeHandler = makeHandler;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            var handler = this.makeHandler();
            handler.Proxy = webProxy;
            return new HttpClient(handler, true);
        }
    }

    internal class CachedHttpClientFactory : IHttpClientFactory
    {
        private readonly IHttpClientFactory httpClientFactory;

        // Keyed by the proxy object itself rather than its hash code, so two
        // proxies whose hash codes collide never share a client.
        private readonly Dictionary<IWebProxy, HttpClient> cache = new Dictionary<IWebProxy, HttpClient>();

        public CachedHttpClientFactory(IHttpClientFactory httpClientFactory)
        {
            this.httpClientFactory = httpClientFactory;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            lock (this.cache)
            {
                // Reuse the client already created for this proxy, if any.
                if (this.cache.TryGetValue(webProxy, out var cached))
                {
                    return cached;
                }

                var result = this.httpClientFactory.CreateClientWithProxy(webProxy);
                this.cache.Add(webProxy, result);
                return result;
            }
        }
    }

    public static async Task Main(string[] args)
    {
        var httpClientFactory = new HttpClientFactory(() => new HttpClientHandler
        {
            UseCookies = true,
            UseDefaultCredentials = true,
        });

        var cachedhttpClientFactory = new CachedHttpClientFactory(httpClientFactory);
        var proxies = new[] {
            new WebProxy()
            {
                Address = new Uri("https://contoso.com"),
            },
            new WebProxy()
            {
                Address = new Uri("https://microsoft.com"),
            },
        };

        foreach (var item in proxies)
        {
            var client = cachedhttpClientFactory.CreateClientWithProxy(item);

            // Await the call so Main does not exit before the request completes.
            using var response = await client.GetAsync("http://someAddress.com");
        }
    }
}

But be careful with large collections of WebProxy: they can take up all the connections in the pool.

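Through some testing, I confirmed that you can change the proxy via the WebProxy.Address property. The trick is that you have to start an HTTP request before switching to another proxy. Here is the sample code: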

    private static async Task CommonHttpClient(List<string> proxyList)
    {
        var webproxy = new WebProxy("http://8.8.8.8:8080", false);
        var handler = new HttpClientHandler()
        {
            Proxy = webproxy,
            UseProxy = true,
        };
        var client = new HttpClient(handler) {Timeout = NetworkUtils.AcceptableTimeoutTimeSpan};
        var data = new Dictionary<Task<HttpResponseMessage>, string>();
        foreach (var proxy in proxyList)
        {
            webproxy.Address = new Uri($"http://{proxy}");
            var uri = new Uri(
                "https://api.ipify.org");
            data.Add(client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead), proxy);
        }

        while (data.Count > 0)
        {
            var taskFinished = await Task.WhenAny(data.Keys).ConfigureAwait(false);
            var address = data[taskFinished];
            using var resp = await taskFinished.ConfigureAwait(false);
            resp.EnsureSuccessStatusCode();
            var ip = await resp.Content.ReadAsStringAsync().ConfigureAwait(false);
            // Assert.Equals throws in the common test frameworks; AreEqual (NUnit/MSTest) is assumed here.
            Assert.AreEqual(address, ip);
            data.Remove(taskFinished);
        }

        handler.Dispose();
        client.Dispose();
    }
    private static async Task SeperateHttpClient(List<string> proxyList)
    {
        await Task.WhenAll(proxyList.Select(async proxy =>
        {
            var webproxy = new WebProxy($"http://{proxy}", false);
            using var handler = new HttpClientHandler()
            {
                Proxy = webproxy,
                UseProxy = true,
            };
            using var client = new HttpClient(handler) {Timeout = NetworkUtils.AcceptableTimeoutTimeSpan};
            var uri = new Uri("https://api.ipify.org");
            var resp = await client.GetAsync(uri).ConfigureAwait(false);
            resp.EnsureSuccessStatusCode();
            var ip = await resp.Content.ReadAsStringAsync().ConfigureAwait(false);
            // Assert.Equals throws in the common test frameworks; AreEqual (NUnit/MSTest) is assumed here.
            Assert.AreEqual(proxy, ip);

        })).ConfigureAwait(false);
    }

    private static async Task TestAsync1()
    {
        // Your list of proxy
        var proxyList = new List<string>() {"1.2.3.4", "5.6.7.8"};
        
        var start = DateTimeOffset.UtcNow;
        await SeperateHttpClient(proxyList).ConfigureAwait(false);
        Console.WriteLine(start.TotalSecondsSince());

        start = DateTimeOffset.UtcNow;
        await CommonHttpClient(proxyList).ConfigureAwait(false);
        Console.WriteLine(start.TotalSecondsSince());
        
    }

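In my testing, I did not find that sharing a single HttpClient instance improves performance. It actually took longer to complete, even though its code is more optimized (i.e. it uses ResponseHeadersRead (https://www.stevejgordon.co.uk/using-httpcompletionoption-responseheadersread-to-improve-httpclient-performance-dotnet)).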

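The Proxy property of HttpClientHandler takes an object that implements IWebProxy. The IWebProxy interface has a GetProxy method that returns the Uri of the proxy. So you can write your own class that implements this interface and control which Uri GetProxy returns. You can have it wrap another IWebProxy, so that its GetProxy returns the result of the inner IWebProxy's GetProxy. That way you don't have to change the Proxy property of the HttpClientHandler; you only change the inner IWebProxy. My implementation of this solution can be found here: https://github.com/M-Boukhlouf/WebProxyService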
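The wrapper idea can be sketched roughly as follows. This is not the code from the linked repository. SwitchableWebProxy and SwitchTo are hypothetical names chosen for this example, and the sketch assumes that the handler asks the proxy object for a Uri when it sends a request.

```csharp
using System;
using System.Net;

// Hypothetical wrapper: the handler keeps one IWebProxy reference for its
// whole lifetime, while the proxy it delegates to can be swapped at runtime.
public sealed class SwitchableWebProxy : IWebProxy
{
    // volatile so a swap made on one thread is visible to request threads.
    private volatile IWebProxy _inner;

    public SwitchableWebProxy(IWebProxy inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Replaces the inner proxy without touching HttpClientHandler.Proxy.
    public void SwitchTo(IWebProxy next)
    {
        _inner = next ?? throw new ArgumentNullException(nameof(next));
    }

    public ICredentials Credentials
    {
        get => _inner.Credentials;
        set => _inner.Credentials = value;
    }

    // Both lookups are forwarded to whichever proxy is current.
    public Uri GetProxy(Uri destination) => _inner.GetProxy(destination);

    public bool IsBypassed(Uri host) => _inner.IsBypassed(host);
}
```

You would assign one `SwitchableWebProxy` to `handler.Proxy` before the first request and call `SwitchTo(new WebProxy(...))` whenever the current proxy stops working. Note that this switches the proxy for every subsequent request on that handler, so it does not give each concurrent request its own proxy.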