MongoDB ReplicaSet issues

We are running a MongoDB ReplicaSet on Kubernetes. One of the MongoDB pods is in CrashLoopBackOff and shows OOMKilled as true. Since then, the pod has crashed 234 times.

We have one primary and two secondaries.

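Here are the latest logs. The container stays alive for a minute or so and then crashes again. I am trying to understand what the logs mean.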

What does OplogStartMissing mean?

{"log":"2022-03-08T09:24:44.127+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1646656464, 1), t: 58 }. source's GTE: { ts: Timestamp(1646656801, 1), t: 60 } hashes: (2206456552855381608/8108672600344202316)\n","stream":"stdout","time":"2022-03-08T09:24:44.12744806Z"}
{"log":"2022-03-08T09:24:44.127+0000 I REPL     [rsBackgroundSync] Rollback using the 'rollbackViaRefetch' method because UUID support is feature compatible with featureCompatibilityVersion 3.6.\n","stream":"stdout","time":"2022-03-08T09:24:44.12747365Z"}
{"log":"2022-03-08T09:24:44.127+0000 I REPL     [rsBackgroundSync] transition to ROLLBACK from SECONDARY\n","stream":"stdout","time":"2022-03-08T09:24:44.127477084Z"}
{"log":"2022-03-08T09:24:44.127+0000 I ROLLBACK [rsBackgroundSync] Starting rollback. Sync source: mongodb-2.mongodb.maglev-system.svc.cluster.local:27017\n","stream":"stdout","time":"2022-03-08T09:24:44.127480067Z"}
{"log":"2022-03-08T09:24:44.133+0000 I ROLLBACK [rsBackgroundSync] Finding the Common Point\n","stream":"stdout","time":"2022-03-08T09:24:44.133319869Z"}
{"log":"2022-03-08T09:24:44.136+0000 I ROLLBACK [rsBackgroundSync] our last optime:   Timestamp(1646656464, 1)\n","stream":"stdout","time":"2022-03-08T09:24:44.136901468Z"}
{"log":"2022-03-08T09:24:44.136+0000 I ROLLBACK [rsBackgroundSync] their last optime: Timestamp(1646731479, 1)\n","stream":"stdout","time":"2022-03-08T09:24:44.136912166Z"}
{"log":"2022-03-08T09:24:44.136+0000 I ROLLBACK [rsBackgroundSync] diff in end of log times: -75015 seconds\n","stream":"stdout","time":"2022-03-08T09:24:44.136916265Z"}
{"log":"2022-03-08T09:24:44.320+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:41476 #2 (1 connection now open)\n","stream":"stdout","time":"2022-03-08T09:24:44.320702224Z"}

In particular, the "diff in end of log times" is negative. What does the negative value mean? And what does rollbackViaRefetch mean?

OOMKilled - means the container was killed because it tried to use more memory than you allocated to it in the resources.limits section.
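To confirm which container is being OOM-killed and how often, you can inspect the pod status. Below is a minimal sketch, assuming you have saved the output of `kubectl get pod <pod-name> -o json` and loaded it into a dict. The container name `mongodb` and the sample data are hypothetical (only the restart count of 234 comes from the question); the field names follow the Kubernetes Pod status layout.

```python
def find_oom_killed(pod):
    """Return (container name, restart count) for every container
    whose last termination reason was OOMKilled."""
    hits = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container has exited before.
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((cs["name"], cs.get("restartCount", 0)))
    return hits


# Hypothetical, trimmed-down pod object; only the fields read above are included.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "mongodb",
                "restartCount": 234,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    }
}

print(find_oom_killed(sample_pod))
```

If the container shows up here, compare its actual memory usage with the value under resources.limits.memory before deciding whether to raise the limit or reduce what mongod uses.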

OplogStartMissing - most of the time this seems to happen because your oplog is too small. Try increasing its size.
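As for the negative "diff in end of log times": the numbers in your log are consistent with the diff being your member's last optime minus the sync source's last optime, so a negative value would mean the crashing member is behind its sync source. The sketch below just redoes that arithmetic with the two Timestamp values from the log; the interpretation of the sign is an assumption based on those numbers, not something quoted from the documentation.

```python
from datetime import datetime, timedelta, timezone

# Seconds components of the two Timestamp(...) values in the rollback log.
our_last_optime = 1646656464    # "our last optime" (the crashing member)
their_last_optime = 1646731479  # "their last optime" (the sync source)

# Assumption: the logged diff is ours minus theirs; this reproduces -75015.
diff_seconds = our_last_optime - their_last_optime


def as_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(diff_seconds)
print("ours:  ", as_utc(our_last_optime).isoformat())
print("theirs:", as_utc(their_last_optime).isoformat())
print("behind by:", timedelta(seconds=-diff_seconds))
```

That works out to a lag of roughly 20 hours 50 minutes. If the oplog on the sync source covers less time than that, the lagging member may not be able to catch up from the oplog alone, which is why a larger oplog is worth trying.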

RollbackViaRefetch - from the documentation:

Nodes go into rollback if after they receive the first batch of writes from their sync source, they realize that the greater than or equal to predicate did not return the last op in their oplog. When rolling back, nodes are in the ROLLBACK state and reads are prohibited. When a node goes into rollback it drops all snapshots. The rolling-back node first finds the common point between its oplog and its sync source's oplog. It then goes through all of the operations in its oplog back to the common point and figures out how to undo them.
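The "common point" step in that description can be pictured with a toy model. This is only an illustrative sketch, not MongoDB's implementation: each oplog is simplified to a list of (timestamp, term) tuples, the values are made up, and the function names are hypothetical.

```python
def find_common_point(our_oplog, source_oplog):
    """Return the newest entry that appears in both oplogs, or None.

    Each oplog is a list of (timestamp_seconds, term) tuples, oldest first.
    """
    source_entries = set(source_oplog)
    # Walk our oplog from newest to oldest until we hit a shared entry.
    for entry in reversed(our_oplog):
        if entry in source_entries:
            return entry
    return None


def entries_to_undo(our_oplog, common_point):
    """Entries written after the common point, newest first,
    i.e. the order in which they would have to be undone."""
    idx = our_oplog.index(common_point)
    return list(reversed(our_oplog[idx + 1:]))


# Made-up oplogs: the two histories diverge after (101, 58).
ours = [(100, 58), (101, 58), (102, 58), (103, 58)]
source = [(100, 58), (101, 58), (104, 59), (105, 60)]

common = find_common_point(ours, source)
print(common)                         # (101, 58)
print(entries_to_undo(ours, common))  # [(103, 58), (102, 58)]
```

In your log the member's last op is in term 58 while the sync source is already at term 60, which fits this picture of two histories that diverged after some earlier shared entry.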