Connection to CosmosDB through Mongo API fails after idle

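We have a Scala server that uses the Java MongoDB driver, wrapped by Casbah. Recently we switched its database from an actual MongoDB instance to Azure CosmosDB, using the Mongo API. This generally works fine, but occasionally a call to Cosmos fails with a MongoSocketWriteException (stack trace below).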

We create the client as follows:

import com.mongodb.casbah.Imports._

val mongoUrl = "mongodb://username:password@host.documents.azure.com:10255/?ssl=true&replicaSet=globaldb"

val client = MongoClient(MongoClientURI(mongoUrl))
val collection: MongoCollection = client("mongoDatabase")("mongoCollection")

We tried removing &replicaSet=globaldb from the connection URI, following the workaround suggested for this seemingly similar error (How to solve MongoError: pool destroyed while connecting to CosmosDB), but it did not solve the problem.

Stack trace:

com.mongodb.MongoSocketWriteException: Exception sending message
    at com.mongodb.connection.InternalStreamConnection.translateWriteException(InternalStreamConnection.java:462)
    at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:205)
    at com.mongodb.connection.UsageTrackingInternalConnection.sendMessage(UsageTrackingInternalConnection.java:95)
    at com.mongodb.connection.DefaultConnectionPool$PooledConnection.sendMessage(DefaultConnectionPool.java:424)
    at com.mongodb.connection.CommandProtocol.sendMessage(CommandProtocol.java:209)
    at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:111)
    at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
    at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
    at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112)
    at com.mongodb.operation.CountOperation.call(CountOperation.java:210)
    at com.mongodb.operation.CountOperation.call(CountOperation.java:206)
    at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
    at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:203)
    at com.mongodb.operation.CountOperation.execute(CountOperation.java:206)
    at com.mongodb.operation.CountOperation.execute(CountOperation.java:53)
    at com.mongodb.Mongo.execute(Mongo.java:772)
    at com.mongodb.Mongo.execute(Mongo.java:759)
    at com.mongodb.DBCollection.getCount(DBCollection.java:962)
    at com.mongodb.DBCursor.count(DBCursor.java:670)
    at com.mongodb.casbah.MongoCollectionBase.getCount(MongoCollection.scala:496)
    at com.mongodb.casbah.MongoCollectionBase.getCount$(MongoCollection.scala:488)
    at com.mongodb.casbah.MongoCollection.getCount(MongoCollection.scala:1106)
    at com.mongodb.casbah.MongoCollectionBase.count(MongoCollection.scala:897)
    at com.mongodb.casbah.MongoCollectionBase.count$(MongoCollection.scala:894)
    at com.mongodb.casbah.MongoCollection.count(MongoCollection.scala:1106)
    [snip]
Caused by: java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
    at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
    at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
    at com.mongodb.connection.SocketStream.write(SocketStream.java:75)
    at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:201)
    ... 38 common frames omitted

(Posting this answer because I hope the solution will be useful to others; any further insights are welcome.)

The problem went away after we added &maxIdleTimeMS=1500000 to the connection URI, which sets the maximum connection idle time to 25 minutes.

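The cause appears to be that the Azure servers drop connections that have been idle for 30 minutes, whereas the Mongo client's default behavior is to have no idle timeout at all. The server does not tell the client that it is dropping an idle connection, so the next attempt to use that connection fails with the error above. Setting the maximum connection idle time to something below 30 minutes makes our server close idle connections itself before the Azure server kills them. Some kind of keep-alive, or a check on each connection before using it, might also work.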
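To make the arithmetic concrete, here is a small Scala sketch that derives a client-side idle limit from the server's idle timeout minus a safety margin (30 minutes minus 5 minutes gives 25 minutes, i.e. 1,500,000 ms) and appends it to a connection string as maxIdleTimeMS. The object `IdleTimeoutUri`, its method names and the 5-minute margin are hypothetical choices for illustration; the 30-minute figure is the one observed in this thread, not a documented CosmosDB limit. It uses only the Scala standard library and assumes a well-formed URI like the one in the question.

```scala
import scala.concurrent.duration._

object IdleTimeoutUri {
  // Assumption: Azure silently drops TCP connections idle for ~30 minutes
  // (observed behaviour from this thread, not an official figure).
  val AssumedServerIdleTimeout: FiniteDuration = 30.minutes

  // Client-side idle limit: strictly below the server's timeout, so the
  // driver's pool closes idle connections before Azure drops them.
  def clientIdleLimit(serverIdle: FiniteDuration, margin: FiniteDuration): FiniteDuration = {
    require(margin > Duration.Zero && margin < serverIdle,
      "margin must be positive and smaller than the server idle timeout")
    serverIdle - margin
  }

  // Append maxIdleTimeMS to a mongodb:// URI, choosing '?' or '&'
  // depending on whether the URI already has a query string.
  def withMaxIdleTime(uri: String, idle: FiniteDuration): String = {
    val sep =
      if (uri.endsWith("?") || uri.endsWith("&")) ""
      else if (uri.contains("?")) "&"
      else "?"
    s"$uri${sep}maxIdleTimeMS=${idle.toMillis}"
  }

  def main(args: Array[String]): Unit = {
    val mongoUrl =
      "mongodb://username:password@host.documents.azure.com:10255/?ssl=true&replicaSet=globaldb"
    val idle = clientIdleLimit(AssumedServerIdleTimeout, 5.minutes)
    // Prints the question's URI with &maxIdleTimeMS=1500000 appended
    println(withMaxIdleTime(mongoUrl, idle))
  }
}
```

The resulting string can be passed to Casbah exactly as in the question, via `MongoClient(MongoClientURI(url))`. Computing the value from a named timeout and margin, rather than hard-coding 1500000, keeps the relationship to the server-side limit visible if that limit turns out to differ.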

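I haven't actually found any documentation of this issue, or any other reference to it as a CosmosDB problem, although it may be caused by, or related to, the 30-minute idle timeout for TCP connections on Azure's internal load balancers (see https://feedback.azure.com/forums/217313-networking/suggestions/18823588-increase-idle-timeout-on-internal-load-balancers-t).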

You can set the timeouts using:

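import com.mongodb.{MongoClientOptions, MongoClientURI}

val options = new MongoClientOptions.Builder()
  .socketKeepAlive(true)        // TCP keep-alive on driver sockets
  .heartbeatFrequency(1000)     // server monitoring every 1000 ms
  .maxConnectionIdleTime(18000) // close pooled connections idle longer than 18000 ms
val clientUri = new MongoClientURI(mongoUrl, options) // mongoUrl as in the question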

Give it a try.