Delay between activities increases when using the Azure Durable Functions chaining pattern

Question

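I run 3,000 activities one after another, as in the code below.

The problem is that the first hundred or so activities run quickly.

For the next hundred, a delay appears before each new activity starts (about 1 second between two activities).

For the last hundred activities, the delay is almost 15 seconds.

It seems that Azure Durable Functions does not support chaining with a large number of activities, and that we should switch to the fan-out pattern instead. But that does not fit my needs.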

        [FunctionName("Trigger")]
        public static async Task<HttpResponseMessage> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequestMessage req,
            [DurableClient] IDurableOrchestrationClient starter,
            ILogger log)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");
            string instanceId = await starter.StartNewAsync("Orchestrator", null);
            log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
            return starter.CreateCheckStatusResponse(req, instanceId);
        }

        [FunctionName("Orchestrator")]
        public static async Task<List<string>> RunOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context,
            ILogger log)
        {
            log.LogInformation($"XXX start Orc");
            var outputs = new List<string>();
            //var tasks = new List<Task<string>>();

            // Run the activities sequentially (function chaining): each call awaits the previous one
            for (int i = 0; i < 3000; i++)
                outputs.Add(await context.CallActivityAsync<string>("Activity", $"Sinh{i + 1}"));

            //outputs.AddRange(await Task.WhenAll(tasks));
            log.LogInformation($"XXX stop Orc");
            return outputs;
        }

        [FunctionName("Activity")]
        public static string SayHello([ActivityTrigger] string name, ILogger log)
        {
            log.LogInformation($"XXX Saying hello to {name}.");
            return $"Hello {name}!";
        }

Any suggestions are greatly appreciated.

Answer 1

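Use multiple worker processes:

By default, each Functions host instance uses a single worker process. To improve performance, use the FUNCTIONS_WORKER_PROCESS_COUNT setting to increase the number of worker processes per host (up to 10).

See more here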
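As a rough sketch, for local development the setting could go in the Values section of local.settings.json (in Azure it would be an application setting with the same name). The value 4 is only an illustrative assumption. As far as I know, this setting controls language worker processes, so it may have little or no effect on an in-process C# app like the one in the question.

```json
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_PROCESS_COUNT": "4"
  }
}
```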

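Orchestration delays:

Orchestration instances are started by putting an ExecutionStarted message into one of the task hub's control queues. Under certain conditions, you may observe multi-second delays between when an orchestration is scheduled to run and when it actually starts running. During this interval, the orchestration instance remains in the Pending state. There are two potential causes of this delay: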

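Backlogged control queues: If the control queue for this instance contains a large number of messages, it may take time before the ExecutionStarted message is received and processed by the runtime. Message backlogs can happen when orchestrations are processing lots of events concurrently. Events that go into the control queue include orchestration start events, activity completions, durable timers, termination, and external events. If this delay happens under normal circumstances, consider creating a new task hub with a larger number of partitions. Configuring more partitions causes the runtime to create more control queues for load distribution. Each partition corresponds 1:1 with a control queue, with a maximum of 16 partitions.

By default, the number of partitions is four. If more partitions are needed, update the task hub configuration in host.json with a new partition count. The host will detect this change after it has been restarted.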
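A minimal host.json sketch of the partition change, assuming Durable Functions 2.x with the default Azure Storage provider. The hub name ChainTaskHubV2 is hypothetical, and 8 is just an example value within the 16-partition limit. Note that a single orchestration instance is served by one control queue, so extra partitions mostly help when many instances run concurrently.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "ChainTaskHubV2",
      "storageProvider": {
        "partitionCount": 8
      }
    }
  }
}
```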

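Back-off polling delays: Another common cause of orchestration delays is the back-off polling behavior for control queues, described here. However, this delay is only expected when an app is scaled out to two or more instances. If there is only one app instance, or if the app instance that starts the orchestration is also the instance that is polling the target control queue, there will be no queue polling delay. Back-off polling delays can be reduced by updating the host.json settings, as described previously.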
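The host.json setting I believe is meant here is maxQueuePollingInterval, which caps how long the runtime backs off between queue polls (30 seconds by default, as far as I know). A sketch, assuming Durable Functions 2.x and the Azure Storage provider; the 5-second value is only an illustration, and a shorter interval trades lower latency for more storage transactions.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "maxQueuePollingInterval": "00:00:05"
      }
    }
  }
}
```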

Reference: orchestration delays

Answer 2

I expect you can significantly speed up your orchestration by setting extendedSessionsEnabled to true in host.json. There is some documentation here: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#extended-sessions

Extended sessions is a setting that keeps orchestrations and entities in memory even after they finish processing messages. The typical effect of enabling extended sessions is reduced I/O against the underlying durable store and overall improved throughput.

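More background: each time you await a particular task for the first time, the orchestration is unloaded from memory. That means your orchestration is unloaded and reloaded 3,000 times. Each time it is loaded back into memory, it needs to re-read its execution history from Azure Storage and then replay the orchestrator code to get back to its previous position. Each replay is more expensive than the last, because it has to step through more code and load more history rows into memory.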
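To put a rough number on that growth, here is a back-of-the-envelope estimate. It assumes each completed activity adds a constant number of history rows, written c (the exact value is not stated in this thread), so the k-th reload reads about c·k rows. Summed over N = 3000 activities:

```latex
\text{rows read in total} \;\approx\; \sum_{k=1}^{N} c\,k
  \;=\; c\,\frac{N(N+1)}{2}
  \;=\; c\,\frac{3000 \cdot 3001}{2}
  \;=\; 4{,}501{,}500\,c
```

So the total replay work grows quadratically with the length of the chain, even though each individual step only gets linearly slower. That matches the steadily increasing delay described in the question.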

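Extended sessions eliminate all of the above replay behavior by preventing the orchestration from unloading its state. This means it never needs to replay, nor does it need to reload the entire orchestration history at each new await. I definitely recommend it for both large fan-out/fan-ins and large sequences like the one in your example.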
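A minimal host.json sketch for turning this on, assuming Durable Functions 2.x. The extendedSessionIdleTimeoutInSeconds line is optional and the value 30 is only illustrative. It controls how long an idle orchestration stays in memory, so an orchestration that waits longer than this between messages would still be unloaded and replayed.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "extendedSessionsEnabled": true,
      "extendedSessionIdleTimeoutInSeconds": 30
    }
  }
}
```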

c#

azure-functions

azure-durable-functions