Azure DocumentDB 偶尔抛出 SocketException / GoneException

Azure DocumentDB sporadically throws SocketException / GoneException

更新(2/8/17)

请参阅下面的答案。

更新(2/7/17)

我发现重启可以让我成功地 运行 来自 Visual Studio 2015 年的 Web 应用程序,并多次点击端点。但是,当我停止应用程序并重新启动它时,它很可能会失败。然后它会反复失败,直到我重新启动计算机。重启 VS'15 还不够好。

一旦它开始失败,运行从 VS Code 或使用 dotnet.exe 从命令行启动应用程序会表现出相同的行为。

原版POST:

我们设置了一个微服务系统,可以从一对 APIs 和 Azure Functions 调用 DocumentDB 集合。它 间歇性地 失败,SocketExceptionGoneException 嵌套在 API 一侧。据我们所知,考虑到它偶尔会起作用,调用它的代码大部分是正确的。 Azure Functions 可以正常工作。

编辑澄清:"intermittently," 我的意思是它可能一天短暂地工作一次或两次,然后在当天剩下的时间里进入失败状态,没有呼叫通过。这不像每 100 次调用中就有 1 次失败。这更像是在 1 或 2 次成功调用后不断失败。

我能够通过编写一个简单的控制台应用程序从 DocumentDB 读取并将结果打印到调试输出来重新创建相同的异常。这 运行s 一两次没有任何问题,然后每次都开始抛出以下异常。它有时会这样做几个小时,然后再允许几个调用通过,然后再次抛出。

虽然下面的测试器很原始,但主要 API 充分利用了 vNext 项目结构。它使用单例 DocumentClient 进行连接(通过本机 DI 注入),并且从控制器到调用数据库的服务层几乎是完全异步的。我们使用单独的库来管理对 DocumentDB 的访问(如果集合不存在则创建集合、添加扩展方法、简单的 CRUD 操作等),但如下所示直接调用会产生相同的结果。

我注意到的一件事是,与 net46 版本相比,它在 DocumentDB 客户端 ("Microsoft.Azure.DocumentDB.Core": "1.0.0") 核心版本上的成功率要高得多。由于其他库,我们的 API 需要 4.6。

我可以在多台机器、多个网络、多种连接类型上重新创建它。

问题:为什么会出现这个异常,我们该如何解决?

Azure 信息:

测试class

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Newtonsoft.Json;

namespace TestConnection
{
    public class Program
    {
        public static void Main(string[] args)
        {
            try
            {
                using (var client = new DocumentClient(
                    new Uri("https://<our-docdb-name>.documents.azure.com:443/"),
                    "our access key",
                    new ConnectionPolicy
                    {
                        ConnectionMode = ConnectionMode.Direct,
                        ConnectionProtocol = Protocol.Tcp
                    }))
                {
                    var query = client.CreateDocumentQuery(UriFactory.CreateCollectionUri("Imports", "User"),
                        "SELECT * FROM c where c.importId = \"816d8e92-bd08-4705-9989-09a0ece5892a\"");
                    var docQuery = query.AsDocumentQuery();
                    GetResults(docQuery).Wait();
                    Debug.WriteLine("done");
                }
            }
            catch (Exception e)
            {
                Debug.WriteLine(e);
            }
        }

        private static async Task GetResults(IDocumentQuery<dynamic> docQuery)
        {
            Debug.WriteLine("getting");
            var results = await docQuery.ExecuteNextAsync();
            Debug.WriteLine(JsonConvert.SerializeObject(results));
        }
    }
}

project.json

{
    "version": "1.0.0-*",
    "buildOptions": {
        "debugType": "portable",
        "emitEntryPoint": true
    },
    "dependencies": {
        "Microsoft.Azure.DocumentDB": "1.11.3"
    },
    "frameworks": {
        "net46": {}
    }
}

异常

'TestConnection.exe' (CLR v4.0.30319: DefaultDomain): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: DefaultDomain): Loaded 'C:\projects\TestConnection\src\TestConnection\bin\Debug\net46\win7-x64\TestConnection.exe'. Symbols loaded.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\projects\TestConnection\src\TestConnection\bin\Debug\net46\win7-x64\Microsoft.Azure.Documents.Client.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Net.Http\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Net.Http.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Xml\v4.0_4.0.0.0__b77a5c561934e089\System.Xml.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\projects\TestConnection\src\TestConnection\bin\Debug\net46\win7-x64\Newtonsoft.Json.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
DocDBTrace Information: 0 : DocumentClient with id 1 initialized at endpoint: https://<our-docdb-name>.documents.azure.com/ with ConnectionMode: Direct, connection Protocol: Tcp, and consistency level: null
getting
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\Microsoft.CSharp\v4.0_4.0.0.0__b03f5f7f11d50a3a\Microsoft.CSharp.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Numerics\v4.0_4.0.0.0__b77a5c561934e089\System.Numerics.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Runtime.Serialization\v4.0_4.0.0.0__b77a5c561934e089\System.Runtime.Serialization.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Xml.Linq\v4.0_4.0.0.0__b77a5c561934e089\System.Xml.Linq.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'TestConnection.exe' (CLR v4.0.30319: TestConnection.exe): Loaded 'C:\WINDOWS\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
DocDBTrace Information: 0 : Set WriteEndpoint https://<our-docdb-name>-eastus2.documents.azure.com/ ReadEndpoint https://<our-docdb-name>-eastus2.documents.azure.com/
DocDBTrace Information: 0 : Mapped resourceName dbs/Imports/colls/User to resourceId u81pAO1OFwA=. '00000000-0000-0000-0000-000000000000'
DocDBTrace Information: 0 : Mapped resourceName dbs/Imports/colls/User to resourceId u81pAO1OFwA=. '00000000-0000-0000-0000-000000000000'
The thread 0x3888 has exited with code 0 (0x0).
The thread 0x2c20 has exited with code 0 (0x0).
The thread 0x39fc has exited with code 0 (0x0).
The thread 0x3610 has exited with code 0 (0x0).
The thread 0x3824 has exited with code 0 (0x0).
The thread 0x33d8 has exited with code 0 (0x0).
The thread 0x38d0 has exited with code 0 (0x0).
DocDBTrace Information: 0 : GetOpenConnection failed: RID: dbs/Imports/colls/User, ResourceType Document, Op: (operationType: Query, resourceType: Document), Address: rntbd://bn6prdddc05-docdb-1.documents.azure.com:18817/apps/d54f0cf3-23d7-4050-9810-99d319d441a8/services/d77a45f3-5611-4c1d-a08e-0f3ef60a31d9/partitions/wkjhgkwj-c85a-4b08-b026-6bc8010b1bb5/replicas/131287308072454308s/, Exception: Microsoft.Azure.Documents.GoneException: Message: The requested resource is no longer available at the server.
ActivityId: d71bc76d-1411-414d-a844-9f76a46ebcfd, Request URI: rntbd://bn6prdddc05-docdb-1.documents.azure.com:18817/apps/d54f0cf3-23d7-4050-9810-99d319d441a8/services/d77a45f3-5611-4c1d-a08e-0f3ef60a31d9/partitions/wkjhgkwj-c85a-4b08-b026-6bc8010b1bb5/replicas/131287308072454308s/ ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 13.68.28.135:18817
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.Sockets.TcpClient.EndConnect(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnection.<OpenSocket>d__1c.MoveNext()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Documents.RntbdConnection.<OpenSocket>d__1c.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnection.<Open>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnectionDispenser.<OpenNewConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ConnectionPool.<GetOpenConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ConnectionPoolManager.<GetOpenConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdTransportClient.<InvokeStoreAsync>d__0.MoveNext()
DocDBTrace Information: 0 : Exception Microsoft.Azure.Documents.GoneException: Message: The requested resource is no longer available at the server.
ActivityId: d71bc76d-1411-414d-a844-9f76a46ebcfd, Request URI: rntbd://bn6prdddc05-docdb-1.documents.azure.com:18817/apps/d54f0cf3-23d7-4050-9810-99d319d441a8/services/d77a45f3-5611-4c1d-a08e-0f3ef60a31d9/partitions/wkjhgkwj-c85a-4b08-b026-6bc8010b1bb5/replicas/131287308072454308s/ ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 13.68.28.135:18817
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.Sockets.TcpClient.EndConnect(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnection.<OpenSocket>d__1c.MoveNext()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Documents.RntbdConnection.<OpenSocket>d__1c.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnection.<Open>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnectionDispenser.<OpenNewConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ConnectionPool.<GetOpenConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ConnectionPoolManager.<GetOpenConnection>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdTransportClient.<InvokeStoreAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.StoreReader.<CompleteActivity>d__1f.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.StoreReader.<ReadMultipleReplicasInternalAsync>d__a.MoveNext() is thrown while doing readMany
DocDBTrace Warning: 0 : Received gone exception, will retry, Microsoft.Azure.Documents.GoneException: Message: The requested resource is no longer available at the server.
ActivityId: d71bc76d-1411-414d-a844-9f76a46ebcfd, Request URI: rntbd://bn6prdddc05-docdb-1.documents.azure.com:18817/apps/d54f0cf3-23d7-4050-9810-99d319d441a8/services/d77a45f3-5611-4c1d-a08e-0f3ef60a31d9/partitions/wkjhgkwj-c85a-4b08-b026-6bc8010b1bb5/replicas/131287308072454308s/ ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 13.68.28.135:18817
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.Sockets.TcpClient.EndConnect(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.RntbdConnection.<OpenSocket>d__1c.MoveNext()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Documents.StoreReader.<ReadMultipleReplicasInternalAsync>d__a.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.StoreReader.<ReadMultipleReplicaAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ConsistencyReader.<ReadSessionAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ReplicatedResourceClient.<InvokeAsync>d__b.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.ReplicatedResourceClient.<>c__DisplayClass1.<<InvokeAsync>b__0>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.BackoffRetryUtility`1.<>c__DisplayClassf`1.<<ExecuteAsync>b__d>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetry>d__1b.MoveNext()
The thread 0x36e4 has exited with code 0 (0x0).
The thread 0x3adc has exited with code 0 (0x0).
The thread 0x3a68 has exited with code 0 (0x0).
The thread 0x18b0 has exited with code 0 (0x0).
...
/* The above repeats about 3 more times.*/
...
The program '[14648] TestConnection.exe' has exited with code 0 (0x0).

原来是我们的企业 Bitdefender Endpoint Security 导致了这个问题。我们最初并不怀疑它,因为它非常清楚地记录了 dotnet.exe 已列入白名单,并且只要我们 运行 应用程序就允许在适当的端口上进行通信。我们卸载了它,问题就消失了。我们正在调查它到底阻止了什么导致了这个问题,但至少我们知道有一个临时解决方案并且最初的问题不是 code-related。希望这对某人有所帮助。

与接受的答案类似,几天前我突然开始遇到这个问题(偶尔能够使用 cosmos 客户端连接到 cosmos db,但主要是出现 SSL 错误)。最初我想知道 cosmos 是否已经开始拒绝 TLS 1.0 连接,因为现在 ISP 和服务器主机 "finally" 强制执行 TLS 1.2 并不少见,结果证明有人无意中在堆栈深处的某个地方使用了 TLS 1.0/1。

但实际上 windows 10 默认使用 TLS 1.2,而 .net core 使用 O/S 默认 TLS 设置,所以它不是那个。

原来是 McAfee LiveSafe 导致了问题。

我正在使用 .net core 3.1,Microsoft.Azure.Cosmos 3.7.0-preview2。

我还没有想出如何让 "just" 我的 cosmos 连接列入白名单/无论如何,但是关闭 McAfee LiveSafe 实时扫描和防火墙使问题消失了。