Couchbase NodeUnavailableException in .NET SDK

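We frequently hit this exception in production, even though neither the number of requests to Couchbase nor the memory pressure on the server has increased. The node has 30 GB of RAM allocated and peak usage is 3 GB, yet the exception is still thrown every now and then. The bucket is opened only once per application lifetime, and after that we only perform get and upsert operations. The connection is initialized as follows: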

Config = new ClientConfiguration
{
    Servers = serverList,
    UseSsl = false,
    DefaultOperationLifespan = 2500,
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName, new BucketConfiguration
            {
                BucketName = bucketName,
                UseSsl = false,
                DefaultOperationLifespan = 2500,
                PoolConfiguration = new PoolConfiguration
                {
                    MaxSize = 2000,
                    MinSize = 200,
                    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
                }
            }
        }
    }
};

Cluster = new Cluster(Config);
Bucket = Cluster.OpenBucket();

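Could you tell me whether this initialization is correct and, more importantly, what I should check on the Couchbase server to find the cause of this problem? I have already gone through all the server logs but found nothing unusual at the times these errors were thrown.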

Thanks,

Stack trace:

System.Exception: Couchbase exception
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
Caused by: System.Exception : Couchbase.Core.NodeUnavailableException: The node 172.31.34.105:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception.
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()

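NodeUnavailableException can be returned for any number of network-related problems. However, since you mentioned that you are running on AWS, you most likely need to tune the TCP keep-alive settings on the client.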

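Your MinSize of 200 connections is far too large: you are unlikely ever to use all of them, so they sit idle until the AWS LB decides to close them. When that happens, the SDK temporarily marks the failed node as down (for 1000 ms) and then tries to reconnect. During that window, every operation on a key mapped to that node fails with this exception.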
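Because the SDK keeps trying to reconnect on its own, application code only has to ride out the roughly one-second window in which operations on that node fail. A minimal sketch of that idea, assuming your data layer surfaces the failure as an exception (as the stack trace suggests): the helper `RetryOnTransient` is hypothetical and not part of the Couchbase SDK, and a simulated failure stands in for the real `Get` call. With the SDK you would presumably pass a predicate such as `ex => ex is NodeUnavailableException` (type name taken from the stack trace above).

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the Couchbase SDK: retry an operation while it
// throws an exception that the caller classifies as transient.
T RetryOnTransient<T>(Func<T> operation, Func<Exception, bool> isTransient,
                      int maxAttempts, TimeSpan delay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            // Wait out the window in which the SDK is reconnecting to the node.
            Thread.Sleep(delay);
        }
    }
}

// Simulated Get: fails twice, as if its node were briefly marked down, then succeeds.
int calls = 0;
string value = RetryOnTransient(
    () =>
    {
        calls++;
        if (calls < 3)
            throw new InvalidOperationException("simulated node-unavailable failure");
        return "user-document";
    },
    ex => ex is InvalidOperationException, // with the SDK: ex is NodeUnavailableException (assumption)
    5,                                     // maximum number of attempts
    TimeSpan.FromMilliseconds(50));        // in production, something above the 1000 ms reconnect interval

Console.WriteLine($"{value} after {calls} attempts");
```

A retry like this only hides the symptom; the keep-alive and pool-size changes address the idle connections that cause it.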

This blog post describes how to set the TCP keep-alive time and interval: http://blog.couchbase.com/introducing-couchbase-.net-sdk-2.1.0-the-asynchronous-couchbase-.net-client

var config = new ClientConfiguration
{
    EnableTcpKeepAlives = true, // default is true
    TcpKeepAliveTime = 1000*60*60, // 60 minutes, in milliseconds
    TcpKeepAliveInterval = 5000 // once the keep-alive time has elapsed, a probe is sent every 5 seconds
};
var cluster = new Cluster(config);
var bucket = cluster.OpenBucket();

This assumes you are using client version 2.1.0 or later. If you are not, you can set it through ServicePointManager:

using System.Net;

// set the keep-alive time to 200 seconds, then probe every 1 second
ServicePointManager.SetTcpKeepAlive(true, 200000, 1000);

The keep-alive time must be shorter than the idle timeout configured on the AWS LB (60 seconds, I believe).
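The rule above is plain arithmetic, and checking it shows that, under the 60-second assumption, both keep-alive times used in the examples above (60 minutes and 200 seconds) would be too long. A small sketch that makes the comparison explicit; `KeepAliveBeatsIdleTimeout` and the 30-second candidate are hypothetical, for illustration only, and the 60-second idle timeout is the answer's own estimate rather than a verified AWS setting.

```csharp
using System;

// Rule from the answer: the client's TCP keep-alive time must be shorter than the
// load balancer's idle timeout, otherwise the LB closes idle pooled connections first.
bool KeepAliveBeatsIdleTimeout(TimeSpan keepAliveTime, TimeSpan idleTimeout) =>
    keepAliveTime < idleTimeout;

// Assumption taken from the answer: the AWS LB idle timeout is 60 seconds.
var lbIdleTimeout = TimeSpan.FromSeconds(60);

// Keep-alive times from the two examples above, plus a hypothetical 30-second value.
var candidates = new[]
{
    ("TcpKeepAliveTime = 1000*60*60", TimeSpan.FromMilliseconds(1000 * 60 * 60)),
    ("SetTcpKeepAlive(true, 200000, 1000)", TimeSpan.FromMilliseconds(200000)),
    ("hypothetical 30 s", TimeSpan.FromSeconds(30)),
};

foreach (var (label, keepAlive) in candidates)
{
    string verdict = KeepAliveBeatsIdleTimeout(keepAlive, lbIdleTimeout) ? "OK" : "too long";
    Console.WriteLine($"{label}: {keepAlive.TotalSeconds} s -> {verdict}");
}
```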

You should also lower the minimum and maximum connection pool sizes, to something like 5 and 10.

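The problem is not completely solved, since we still get timeouts, but at a lower rate. We improved performance by using the ClusterHelper singleton instance, as follows: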

using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;

ClusterHelper.Initialize(
    new ClientConfiguration
    {
        Servers = serverList,
        UseSsl = false,
        DefaultOperationLifespan = 2500,
        EnableTcpKeepAlives = true,
        TcpKeepAliveTime = 1000*60*60,
        TcpKeepAliveInterval = 5000,
        BucketConfigs = new Dictionary<string, BucketConfiguration>
        {
            {
                "default",
                new BucketConfiguration
                {
                    BucketName = "default",
                    UseSsl = false,
                    Password = "",
                    PoolConfiguration = new PoolConfiguration
                    {
                        MaxSize = 50,
                        MinSize = 10
                    }
                }
            }
        }
    });

// Later, anywhere in the application, reuse the shared bucket instance:
var bucket = ClusterHelper.GetBucket("default");
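What ClusterHelper buys you is one shared, lazily created cluster object for the whole application instead of one per caller. The sketch below shows that general pattern with `Lazy<T>` in plain C#; it illustrates the idea only and is not how `ClusterHelper` is implemented. The plain `object` stands in for a real cluster, and `factoryCalls` is a hypothetical counter added to show that initialization happens once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Counts how many times the expensive initialization runs.
int factoryCalls = 0;

// ExecutionAndPublication guarantees the factory runs at most once,
// even when many request threads ask for the instance at the same time.
var shared = new Lazy<object>(() =>
{
    Interlocked.Increment(ref factoryCalls);
    return new object(); // in real code: build the configuration and open the cluster here
}, LazyThreadSafetyMode.ExecutionAndPublication);

// Simulate 100 concurrent requests, all fetching the shared instance.
var instances = new object[100];
Parallel.For(0, instances.Length, i => instances[i] = shared.Value);

Console.WriteLine($"factory ran {factoryCalls} time(s)");
```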