DeviceClient SetMethodDefaultHandlerAsync not invoked in a timely manner

Describe the bug

The DeviceClient SetMethodDefaultHandlerAsync handler is not triggered promptly when the internet connection drops. It fires 15 to 20 minutes later. Here are the logs:

IoT Hub connection status Changed Status: Connected Reason: Connection_Ok Time: 3:09:15 PM +02
IoT Hub connection status Changed Status: Disconnected_Retrying Reason: Communication_Error Time: 3:26:29 PM +02

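I disconnected the internet at 3:10:00 PM +02, and the communication error appeared 16 minutes later. I created sample code that reproduces the issue.

Steps to reproduce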

using Microsoft.Azure.Amqp.Transport;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Transport.Mqtt;
using Microsoft.Azure.Devices.Shared;
using System;
using System.Threading;
using System.Threading.Tasks;

namespace IOTClientTest
{
    class Program
    {
        private DeviceClient _client;
        static async Task Main(string[] args)
        {
            
            await new Program().RunAsync(collection =>
            {
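                // Return an empty collection rather than a null Task, which would throw when awaited.
                return Task.FromResult(new TwinCollection());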
            }, (new CancellationTokenSource()).Token);
            Console.Read();
        }
        
        public async Task RunAsync(Func<TwinCollection, Task<TwinCollection>> twinUpdateHandler, CancellationToken cancellationToken)
        {
            var settings = new ITransportSettings[]
            {
                new MqttTransportSettings(TransportType.Mqtt_Tcp_Only)
                {
                    KeepAliveInSeconds = 10,
                }
            };

            _client = DeviceClient.CreateFromConnectionString(
                "HostName=XXXX;SharedAccessKey=XXXXX",
                "XXXX", settings);
            
            
            var retryPolicy = new ExponentialBackoff(5,
                TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(120),
                TimeSpan.FromSeconds(5));
            _client.SetRetryPolicy(retryPolicy);

            
            _client.SetConnectionStatusChangesHandler((status, reason) =>
            {
                Console.WriteLine($"IoT Hub connection status Changed Status: {status} Reason: {reason} Time: {DateTime.Now.ToString("h:mm:ss tt zz")}");
            });
            
            await _client.OpenAsync();
            

            await _client.SetMethodHandlerAsync("executeShell", async (req, context) =>
            {
                await Task.Delay(0);
                return new MethodResponse(500);
            },null);

            await _client.SetMethodDefaultHandlerAsync(MethodHandler, null);
            
            await _client.SetDesiredPropertyUpdateCallbackAsync(async (collection, context) =>
                {
                    var updated = await twinUpdateHandler(collection);
                    await _client.UpdateReportedPropertiesAsync(updated);
                }
                , null);

            await ReceiveMessagesAsync(cancellationToken);
        }

        private async Task<MethodResponse> MethodHandler(MethodRequest methodRequest, object parameter)
        {
            await Task.Delay(0);
            return new MethodResponse(500);
        }

        private async Task ReceiveMessagesAsync(CancellationToken cancellationToken)
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                var message = await _client.ReceiveAsync(TimeSpan.FromSeconds(10));
                if (message != null)
                {
                    //Do something with received message...
                }
            }
        }
        
    }
}

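Expected behavior

When the network is disconnected, the device client should change its status within 10 seconds.

Actual behavior

The device client raises the status change after 16 minutes.

Versions used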

Add the following information:

Runtime Environment:
 OS Name:     zorin
 OS Version:  15
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.201/

Host (useful for support):
  Version: 3.1.3
  Commit:  4a9f85e9f8

.NET Core SDKs installed:
  2.2.402 [/usr/share/dotnet/sdk]
  3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

The repository is available here: https://github.com/raza707mirza/iottest

I have also reported a bug here: https://github.com/Azure/azure-iot-sdk-csharp/issues/1409

Based on the outcome of the discussion on GitHub: https://github.com/Azure/azure-iot-sdk-csharp/issues/1409

The SDK is relying on the OS TCP stack to inform that the disconnect has happened, and the OS can take a couple of retries before relaying this information. This might be what is causing the connection status change handler to get invoked with a 15min delay on Linux.
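One plausible way to account for a delay of roughly 15 minutes is the Linux TCP retransmission backoff. The sketch below sums the retransmission waits under assumed kernel defaults that are not stated in the report: `net.ipv4.tcp_retries2 = 15`, a 200 ms minimum retransmission timeout, and a 120 s cap. The class and method names are hypothetical, and the result is only a lower-bound estimate, not a measurement of this system.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure IoT SDK.
public static class TcpRetransmitEstimate
{
    // Assumed Linux defaults (not taken from the report): tcp_retries2 = 15,
    // minimum RTO of 0.2 s, maximum RTO of 120 s.
    public static TimeSpan HypotheticalTimeout(
        int retries = 15, double minRtoSeconds = 0.2, double maxRtoSeconds = 120.0)
    {
        double totalSeconds = 0;
        double rto = minRtoSeconds;

        // One wait before each of the `retries` retransmissions, plus a final
        // wait after the last one before the connection is declared dead.
        for (int i = 0; i <= retries; i++)
        {
            totalSeconds += Math.Min(rto, maxRtoSeconds);
            rto *= 2; // exponential backoff between retransmissions
        }

        return TimeSpan.FromSeconds(totalSeconds);
    }
}
```

With the assumed defaults, `TcpRetransmitEstimate.HypotheticalTimeout().TotalSeconds` comes to about 924.6 seconds (roughly 15.4 minutes), which is in the same range as the delay observed on Linux.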

For Mqtt, the client does send a ping request every 75 seconds, but does not seem to be monitoring the ping response being received from the broker.

PS: This does not happen on Windows 10.

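Update

A fix was added to the MQTT layer: the SDK now monitors ping responses and disconnects when the delay between sending a ping request and receiving its response exceeds 30 seconds (this value is currently not configurable).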
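The idea behind this fix can be illustrated with a small watchdog that tracks the oldest unanswered ping. This is a minimal sketch and not the SDK's actual implementation: the `PingWatchdog` class and its members are hypothetical names, and the 30-second limit is passed in by the caller.

```csharp
using System;

// Hypothetical helper for illustration; not the SDK's actual implementation.
public sealed class PingWatchdog
{
    private readonly TimeSpan _maxResponseDelay;
    private DateTime? _pendingPingSentUtc;

    public PingWatchdog(TimeSpan maxResponseDelay)
    {
        _maxResponseDelay = maxResponseDelay;
    }

    // Record that a ping request was sent. Only the oldest unanswered ping
    // is tracked, so later pings do not reset the clock.
    public void OnPingSent(DateTime nowUtc)
    {
        if (_pendingPingSentUtc == null)
        {
            _pendingPingSentUtc = nowUtc;
        }
    }

    // A ping response clears the outstanding ping.
    public void OnPingResponse()
    {
        _pendingPingSentUtc = null;
    }

    // True when a ping has been unanswered for longer than the allowed delay.
    public bool ShouldDisconnect(DateTime nowUtc)
    {
        return _pendingPingSentUtc.HasValue
            && nowUtc - _pendingPingSentUtc.Value > _maxResponseDelay;
    }
}
```

A transport loop could call `OnPingSent` when it writes a ping request, `OnPingResponse` when the broker answers, and poll `ShouldDisconnect` on a timer to decide when to tear down the connection and raise the status change, instead of waiting for the OS to report the failure.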

For the AMQP implementation, the AMQP library encapsulates the ping request-response logic; all the device SDK does is set the IdleTimeout.
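For reference, setting the idle timeout on the AMQP transport might look like the sketch below. It assumes that `AmqpTransportSettings` exposes a settable `IdleTimeout` property in the SDK version in use, which should be checked against that version. The `AmqpClientFactory` wrapper is a hypothetical name and the 60-second value is an arbitrary placeholder, not a recommendation from the discussion.

```csharp
using System;
using Microsoft.Azure.Devices.Client;

// Hypothetical wrapper for illustration only.
public static class AmqpClientFactory
{
    public static DeviceClient Create(string connectionString)
    {
        var settings = new ITransportSettings[]
        {
            new AmqpTransportSettings(TransportType.Amqp_Tcp_Only)
            {
                // Assumed property; 60 seconds is an arbitrary example value.
                IdleTimeout = TimeSpan.FromSeconds(60),
            }
        };

        return DeviceClient.CreateFromConnectionString(connectionString, settings);
    }
}
```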

This issue is fixed in Microsoft.Azure.Devices.Client 1.28.0: https://github.com/Azure/azure-iot-sdk-csharp/issues/1409#issuecomment-676836635

Release: https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/lts_2020-8-19