Is LogicalOperationStack incompatible with async in .Net 4.5

Trace.CorrelationManager.LogicalOperationStack enables nested logical operation identifiers, where the most common use case is logging (NDC). Should it still work with async-await?

Here's a simple example using LogicalFlow, which is my simple wrapper over the LogicalOperationStack:

private static void Main() => OuterOperationAsync().GetAwaiter().GetResult();

private static async Task OuterOperationAsync()
{
    Console.WriteLine(LogicalFlow.CurrentOperationId);
    using (LogicalFlow.StartScope())
    {
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
    }
    Console.WriteLine(LogicalFlow.CurrentOperationId);
}

private static async Task InnerOperationAsync()
{
    using (LogicalFlow.StartScope())
    {
        await Task.Delay(100);
    }
}

LogicalFlow:

public static class LogicalFlow
{
    public static Guid CurrentOperationId =>
        Trace.CorrelationManager.LogicalOperationStack.Count > 0
            ? (Guid) Trace.CorrelationManager.LogicalOperationStack.Peek()
            : Guid.Empty;

    public static IDisposable StartScope()
    {
        Trace.CorrelationManager.StartLogicalOperation();
        return new Stopper();
    }

    private static void StopScope() => 
        Trace.CorrelationManager.StopLogicalOperation();

    private class Stopper : IDisposable
    {
        private bool _isDisposed;
        public void Dispose()
        {
            if (!_isDisposed)
            {
                StopScope();
                _isDisposed = true;
            }
        }
    }
}

Output:

00000000-0000-0000-0000-000000000000
    49985135-1e39-404c-834a-9f12026d9b65
    54674452-e1c5-4b1b-91ed-6bd6ea725b98
    c6ec00fd-bff8-4bde-bf70-e073b6714ae5
54674452-e1c5-4b1b-91ed-6bd6ea725b98

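The specific values don't really matter, but as I understand it both of the outer lines should show Guid.Empty (i.e. 00000000-0000-0000-0000-000000000000) and the inner lines should all show the same Guid value.

You might say that LogicalOperationStack uses a Stack, which is not thread-safe, and that's why the output is wrong. But while that's true in general, in this case there's never more than a single thread accessing the LogicalOperationStack at the same time (every async operation is awaited when called and there's no use of combinators such as Task.WhenAll).

The issue is that LogicalOperationStack is stored in the CallContext, which has copy-on-write behavior. That means that as long as you don't explicitly set something in the CallContext (and you don't when you add to an existing stack with StartLogicalOperation), you're using the parent's context and not your own.

This can be shown by simply setting anything into the CallContext before adding to the existing stack. For example, if we change StartScope to this: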

public static IDisposable StartScope()
{
    CallContext.LogicalSetData("Bar", "Arnon");
    Trace.CorrelationManager.StartLogicalOperation();
    return new Stopper();
}

The output is:

00000000-0000-0000-0000-000000000000
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
00000000-0000-0000-0000-000000000000
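
The copy-on-write behavior can also be observed in isolation, without LogicalOperationStack. The following is a minimal sketch, assuming a .NET Framework 4.5+ console application; the class name CopyOnWriteDemo and the slot name "demo-slot" are made up for illustration. A value written to the CallContext inside an awaited async method is not expected to be visible to the caller after the method returns:

```csharp
using System;
using System.Runtime.Remoting.Messaging; // CallContext (.NET Framework only)
using System.Threading.Tasks;

internal static class CopyOnWriteDemo // hypothetical name, for illustration only
{
    private const string Slot = "demo-slot"; // hypothetical slot name

    private static void Main() => ParentAsync().GetAwaiter().GetResult();

    private static async Task ParentAsync()
    {
        CallContext.LogicalSetData(Slot, "parent");

        await ChildAsync();

        // The child's write went into the child's own copy of the context,
        // so the parent is expected to still see "parent" here.
        Console.WriteLine("parent sees: " + CallContext.LogicalGetData(Slot));
    }

    private static async Task ChildAsync()
    {
        // Writing to the logical call context is what triggers the copy
        // for this async method.
        CallContext.LogicalSetData(Slot, "child");

        await Task.Yield();

        // The continuation runs with the context captured at the await,
        // which contains the child's own value.
        Console.WriteLine("child sees: " + CallContext.LogicalGetData(Slot));
    }
}
```

This is why the LogicalSetData call in the modified StartScope changes the result: it forces each async method to get its own context (and with it its own clone of the stack) before StartLogicalOperation pushes onto it.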

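Note: I'm not suggesting anyone actually do this. The real practical solution is to use an ImmutableStack instead of the LogicalOperationStack. It's thread-safe, and because it's immutable, calling Pop gives you back a new ImmutableStack that you then need to set back into the CallContext. A full implementation is available as an answer to this question: Tracking c#/.NET tasks flow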
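
To illustrate the idea, here is a rough sketch of what such an ImmutableStack-based wrapper could look like. It is not the implementation from the linked answer. The class name ImmutableLogicalFlow and the slot name are hypothetical, it assumes the System.Collections.Immutable NuGet package is referenced, and it assumes the data never has to cross an AppDomain boundary:

```csharp
using System;
using System.Collections.Immutable;        // System.Collections.Immutable NuGet package
using System.Runtime.Remoting.Messaging;   // CallContext (.NET Framework only)

public static class ImmutableLogicalFlow // hypothetical name
{
    private const string SlotName = "ImmutableLogicalFlow.Stack"; // hypothetical slot name

    private static ImmutableStack<Guid> CurrentStack
    {
        get
        {
            return (CallContext.LogicalGetData(SlotName) as ImmutableStack<Guid>)
                   ?? ImmutableStack<Guid>.Empty;
        }
        // Every change goes through LogicalSetData, so the copy-on-write
        // logic always sees an explicit write.
        set { CallContext.LogicalSetData(SlotName, value); }
    }

    public static Guid CurrentOperationId
    {
        get { return CurrentStack.IsEmpty ? Guid.Empty : CurrentStack.Peek(); }
    }

    public static IDisposable StartScope()
    {
        // Push returns a new stack; the previous instance is left untouched.
        CurrentStack = CurrentStack.Push(Guid.NewGuid());
        return new Stopper();
    }

    private sealed class Stopper : IDisposable
    {
        private bool _isDisposed;

        public void Dispose()
        {
            if (_isDisposed) return;

            // Pop also returns a new stack, which has to be stored back.
            CurrentStack = CurrentStack.Pop();
            _isDisposed = true;
        }
    }
}
```

Because no stack instance is ever mutated, a parent and a child that hold different contexts cannot corrupt each other's view of the stack; each one only ever replaces the reference in its own context.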

So, should LogicalOperationStack work with async, and is this just a bug? Is LogicalOperationStack simply not meant for the async world? Or am I missing something?


Update: Using Task.Delay is apparently confusing, as it uses System.Threading.Timer, which captures the ExecutionContext internally. Using await Task.Yield(); instead of await Task.Delay(100); makes the example easier to understand.

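If you're still interested in this, I believe it's a bug in the way they flow LogicalOperationStack, and I think it's a good idea to report it.

They give special treatment to LogicalOperationStack's stack in LogicalCallContext.Clone by doing a deep copy (unlike other data stored via CallContext.LogicalSetData/LogicalGetData, for which only a shallow copy is performed).

This LogicalCallContext.Clone is called every time ExecutionContext.CreateCopy or ExecutionContext.CreateMutableCopy is called to flow the ExecutionContext.

Based on your code, I did a little experiment by providing my own mutable stack for the "System.Diagnostics.Trace.CorrelationManagerSlot" slot in LogicalCallContext, to see when and how many times it actually gets cloned.

The code: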

using System;
using System.Collections;
using System.Diagnostics;
using System.Linq;
using System.Runtime.Remoting.Messaging;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static readonly string CorrelationManagerSlot = "System.Diagnostics.Trace.CorrelationManagerSlot";

        public static void ShowCorrelationManagerStack(object where)
        {
            object top = "null";
            var stack = (MyStack)CallContext.LogicalGetData(CorrelationManagerSlot);
            if (stack.Count > 0)
                top = stack.Peek();

            Console.WriteLine("{0}: MyStack Id={1}, Count={2}, on thread {3}, top: {4}",
                where, stack.Id, stack.Count, Environment.CurrentManagedThreadId, top);
        }

        private static void Main()
        {
            CallContext.LogicalSetData(CorrelationManagerSlot, new MyStack());

            OuterOperationAsync().Wait();
            Console.ReadLine();
        }

        private static async Task OuterOperationAsync()
        {
            ShowCorrelationManagerStack(1.1);

            using (LogicalFlow.StartScope())
            {
                ShowCorrelationManagerStack(1.2);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
                await InnerOperationAsync();
                ShowCorrelationManagerStack(1.3);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
                await InnerOperationAsync();
                ShowCorrelationManagerStack(1.4);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
            }

            ShowCorrelationManagerStack(1.5);
        }

        private static async Task InnerOperationAsync()
        {
            ShowCorrelationManagerStack(2.1);
            using (LogicalFlow.StartScope())
            {
                ShowCorrelationManagerStack(2.2);
                await Task.Delay(100);
                ShowCorrelationManagerStack(2.3);
            }
            ShowCorrelationManagerStack(2.4);
        }
    }

    public class MyStack : Stack, ICloneable
    {
        public static int s_Id = 0;

        public int Id { get; private set; }

        object ICloneable.Clone()
        {
            var cloneId = Interlocked.Increment(ref s_Id);
            Console.WriteLine("Cloning MyStack Id={0} into {1} on thread {2}", this.Id, cloneId, Environment.CurrentManagedThreadId);

            var clone = new MyStack();
            clone.Id = cloneId;

            foreach (var item in this.ToArray().Reverse())
                clone.Push(item);

            return clone;
        }
    }

    public static class LogicalFlow
    {
        public static Guid CurrentOperationId
        {
            get
            {
                return Trace.CorrelationManager.LogicalOperationStack.Count > 0
                    ? (Guid)Trace.CorrelationManager.LogicalOperationStack.Peek()
                    : Guid.Empty;
            }
        }

        public static IDisposable StartScope()
        {
            Program.ShowCorrelationManagerStack("Before StartLogicalOperation");
            Trace.CorrelationManager.StartLogicalOperation();
            Program.ShowCorrelationManagerStack("After StartLogicalOperation");
            return new Stopper();
        }

        private static void StopScope()
        {
            Program.ShowCorrelationManagerStack("Before StopLogicalOperation");
            Trace.CorrelationManager.StopLogicalOperation();
            Program.ShowCorrelationManagerStack("After StopLogicalOperation");
        }

        private class Stopper : IDisposable
        {
            private bool _isDisposed;
            public void Dispose()
            {
                if (!_isDisposed)
                {
                    StopScope();
                    _isDisposed = true;
                }
            }
        }
    }
}

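The result is quite surprising. Even though there are only two threads involved in this async workflow, the stack gets cloned as many as 4 times. And the problem is that the matching Stack.Push and Stack.Pop operations (called by StartLogicalOperation/StopLogicalOperation) operate on different, non-matching clones of the stack, thus unbalancing the "logical" stack. That's where the bug lies.

This indeed makes LogicalOperationStack totally unusable across async calls, even when there are no concurrent forks of tasks.

Updated: I also did a bit of research on how it behaves for synchronous calls, to address these comments: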

Agreed, not a dupe. Did you check if it works as expected on the same thread, e.g. if you replace await Task.Delay(100) with Task.Delay(100).Wait()? – Noseratio Feb 27 at 21:00

@Noseratio yes. It works of course, because there's only a single thread (and so a single CallContext). It's as if the method wasn't async to begin with. – i3arnon Feb 27 at 21:01

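A single thread doesn't mean a single CallContext. Even for synchronous continuations on the same single thread, the execution context (and its inner LogicalCallContext) can get cloned. For example, using the above code: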

private static void Main()
{
    CallContext.LogicalSetData(CorrelationManagerSlot, new MyStack());

    ShowCorrelationManagerStack(0.1);

    CallContext.LogicalSetData("slot1", "value1");
    Console.WriteLine(CallContext.LogicalGetData("slot1"));

    Task.FromResult(0).ContinueWith(t =>
        {
            ShowCorrelationManagerStack(0.2);

            CallContext.LogicalSetData("slot1", "value2");
            Console.WriteLine(CallContext.LogicalGetData("slot1"));
        }, 
        CancellationToken.None,
        TaskContinuationOptions.ExecuteSynchronously,
        TaskScheduler.Default);

    ShowCorrelationManagerStack(0.3);
    Console.WriteLine(CallContext.LogicalGetData("slot1"));

    // ...
}

Output (note how we lose "value2"):

0.1: MyStack Id=0, Count=0, on thread 9, top:
value1
Cloning MyStack Id=0 into 1 on thread 9
0.2: MyStack Id=1, Count=0, on thread 9, top:
value2
0.3: MyStack Id=0, Count=0, on thread 9, top:
value1

Yes, LogicalOperationStack should work with async-await, and it is a bug that it doesn't.

I contacted the relevant developer at Microsoft and his response was this:

"I wasn't aware of this, but it does seem broken. The copy-on-write logic is supposed to behave exactly as if we'd really created a copy of the ExecutionContext on entry into the method. However, copying the ExecutionContext would have created a deep copy of the CorrelationManager context, as it's special-cased in CallContext.Clone(). We don't take that into account in the copy-on-write logic."

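Moreover, he recommended using the new System.Threading.AsyncLocal<T> class added in .Net 4.6 instead, which should handle that issue correctly.

So, I went ahead and implemented LogicalFlow on top of an AsyncLocal instead of the LogicalOperationStack, using VS2015 RC and .Net 4.6: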

public static class LogicalFlow
{
    private static AsyncLocal<Stack> _asyncLogicalOperationStack = new AsyncLocal<Stack>();

    private static Stack AsyncLogicalOperationStack
    {
        get
        {
            if (_asyncLogicalOperationStack.Value == null)
            {
                _asyncLogicalOperationStack.Value = new Stack();
            }

            return _asyncLogicalOperationStack.Value;
        }
    }

    public static Guid CurrentOperationId =>
        AsyncLogicalOperationStack.Count > 0
            ? (Guid)AsyncLogicalOperationStack.Peek()
            : Guid.Empty;

    public static IDisposable StartScope()
    {
        AsyncLogicalOperationStack.Push(Guid.NewGuid());
        return new Stopper();
    }

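    private static void StopScope() =>
        AsyncLogicalOperationStack.Pop();

    // Same disposable helper as in the original LogicalFlow above;
    // it ends the scope exactly once.
    private class Stopper : IDisposable
    {
        private bool _isDisposed;
        public void Dispose()
        {
            if (!_isDisposed)
            {
                StopScope();
                _isDisposed = true;
            }
        }
    }
}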

And the output for the same test is indeed as it should be:

00000000-0000-0000-0000-000000000000
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
00000000-0000-0000-0000-000000000000

One of the solutions mentioned here and elsewhere on the web is to call LogicalSetData on the context:

CallContext.LogicalSetData("one", null);
Trace.CorrelationManager.StartLogicalOperation();

But in fact, just reading the current execution context is enough:

var context = Thread.CurrentThread.ExecutionContext;
Trace.CorrelationManager.StartLogicalOperation();