Task.WhenAny - alternative to List avoiding O(N²) issues

I've been trying to improve my understanding and use of async C# code, in particular how to integrate it into existing synchronous code.

I have the following test program, which is basically the example from https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/start-multiple-async-tasks-and-process-them-as-they-complete?pivots=dotnet-6-0 with a synchronous caller and a LINQPad-runnable wrapper.

void Main()
{
    var a = new A();
    
    List<string> urls = new List<string>() 
        {
            "https://docs.microsoft.com/dotnet",
            "https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.whenall?view=net-6.0",
            // (third URL truncated in the source)
        };
        
    a.GetUrlContentLengths(urls).Dump();
}

public class A
{   
    public int GetUrlContentLengths(IEnumerable<string> urls)
    {
        return Task.Run<int>(async() => await GetUrlContentLengthsAsync(urls)).Result;
    }
    
    public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
    {
        System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();

        IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));

        var downloadTasks = downloadTasksQuery.ToList();

        int total = 0;

        while (downloadTasks.Any())
        {
            Task<int> finishedTask = await Task.WhenAny(downloadTasks);
            downloadTasks.Remove(finishedTask);
            total += await finishedTask;
        }

        return total;
    }

    public async Task<int> ProcessUrlAsync(string url, System.Net.Http.HttpClient client)
    {
        byte[] content = await client.GetByteArrayAsync(url);
        Console.WriteLine($"{url,-60} {content.Length,10:#,#}");

        return content.Length;
    }
}

This linked document describes an O(N²) problem as follows:

What we’ve effectively created here is an O(N²) algorithm: for each task, we search the list for the task to remove it, which is an O(N) operation, and we register a continuation with each task, which is also an O(N) operation.

So would this minor change to a Dictionary fix this and leave the whole thing as an O(n) operation?

    public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
    {
        System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();

        IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));

        var downloadTasks = downloadTasksQuery.ToDictionary(xk => xk.GetHashCode(), xv => xv);

        int total = 0;
        
        while (downloadTasks.Any())
        {
            Task<int> finishedTask = await Task.WhenAny(downloadTasks.Values);
            downloadTasks.Remove(finishedTask.GetHashCode());
            total += await finishedTask;
        }
        
        return total;
    }

So would this minor change to a Dictionary fix this and leave the whole thing as an O(n) operation?

No. Searching a List<T> is indeed an O(n) operation, but eliminating this operation does not eliminate all the O(n) operations that happen inside the while loop. There is one more O(n) operation hidden inside the Task.WhenAny method, and it has a far greater impact (overhead) on slowing down your code than searching the list. The hidden operation is the attaching of continuations to all incomplete tasks in the downloadTasks collection, and then the detaching of these continuations when any of the tasks completes. That's a lot of work, because it involves memory allocations and synchronization overhead, and the only way to avoid it is to avoid the WhenAny-in-a-loop antipattern.

Here is an alternative O(n) implementation of your algorithm. It's O(n) because only one continuation is attached to each task, by the Task.WhenAll method:

public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
    HttpClient client = new();

    int total = 0;

    Task<int>[] higherOrderTasks = urls.Select(async url =>
    {
        int result = await ProcessUrlAsync(url, client).ConfigureAwait(false);
        Interlocked.Add(ref total, result);
        return result;
    }).ToArray();

    await Task.WhenAll(higherOrderTasks);

    return total;
}

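A higher-order task is created for each ProcessUrlAsync task; it wraps that task and incorporates the code that should run when the task completes. The continuations after await ProcessUrlAsync might run concurrently with each other, so you might have to synchronize access to any shared state that you mutate, like the total variable in the example above. The exception is when you know for sure that your code will run on a SynchronizationContext that synchronizes the continuations, in which case you should also remove the .ConfigureAwait(false).

In this specific case it is actually possible to get rid of the higher-order tasks and the shared state altogether, like this: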

public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
    HttpClient client = new();

    Task<int>[] tasks = urls
        .Select(url => ProcessUrlAsync(url, client))
        .ToArray();

    int[] results = await Task.WhenAll(tasks);

    return results.Sum();
}
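
If you want to convince yourself that the WhenAny loop and the WhenAll version produce the same total, here is a minimal sketch that runs without network access. FakeProcessUrlAsync, the Demo class and the example URLs are hypothetical stand-ins that are not part of the code above: the fake method just waits a few milliseconds and returns the length of the URL string instead of downloading anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Demo
{
    // Hypothetical stand-in for ProcessUrlAsync: no HTTP request is made,
    // the "content length" is simply the length of the URL string.
    public static async Task<int> FakeProcessUrlAsync(string url)
    {
        await Task.Delay(url.Length); // simulated latency in milliseconds
        return url.Length;
    }

    // The WhenAny-in-a-loop pattern from the question.
    public static async Task<int> SumWithWhenAnyAsync(IEnumerable<string> urls)
    {
        List<Task<int>> pending = urls.Select(url => FakeProcessUrlAsync(url)).ToList();
        int total = 0;
        while (pending.Count > 0)
        {
            Task<int> finished = await Task.WhenAny(pending);
            pending.Remove(finished);
            total += await finished;
        }
        return total;
    }

    // The WhenAll pattern: a single await, then sum the results.
    public static async Task<int> SumWithWhenAllAsync(IEnumerable<string> urls)
    {
        Task<int>[] tasks = urls.Select(url => FakeProcessUrlAsync(url)).ToArray();
        int[] results = await Task.WhenAll(tasks);
        return results.Sum();
    }

    public static async Task Main()
    {
        string[] urls = { "https://a.example", "https://bb.example", "https://ccc.example" };

        int viaWhenAny = await SumWithWhenAnyAsync(urls);
        int viaWhenAll = await SumWithWhenAllAsync(urls);

        // Both totals should be equal; only the amount of bookkeeping differs.
        Console.WriteLine($"WhenAny loop: {viaWhenAny}, WhenAll: {viaWhenAll}");
    }
}
```

With a handful of tasks you will not notice any difference in speed. The overhead of the WhenAny loop only starts to matter when the number of tasks gets large.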