Stopping a MongoDB replica set leaves the primary in RECOVERING state
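When I stop the nodes of my replica set and start them again, the primary node comes up in the "RECOVERING" state.
I created the replica set and ran it without authorization. To turn authorization on, I added users with "db.createUser(...)" and enabled authorization in the configuration file: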
security:
  authorization: "enabled"
Before stopping the replica set (and even after restarting the cluster without the security parameters), rs.status() shows:
{
"set" : "REPLICASET",
"date" : ISODate("2016-09-08T09:57:50.335Z"),
"myState" : 1,
"term" : NumberLong(7),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "192.168.1.167:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 301,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"electionTime" : Timestamp(1473328390, 1),
"electionDate" : ISODate("2016-09-08T09:53:10Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.1.168:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 295,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"lastHeartbeat" : ISODate("2016-09-08T09:57:48.679Z"),
"lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.676Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.1.167:27017",
"configVersion" : 1
},
{
"_id" : 2,
"name" : "192.168.1.169:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 295,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"lastHeartbeat" : ISODate("2016-09-08T09:57:48.680Z"),
"lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.054Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.1.168:27017",
"configVersion" : 1
}
],
"ok" : 1
}
To start using this configuration, I stopped each node as follows:
[root@n--- etc]# mongo --port 27017 --eval 'db.adminCommand("shutdown")'
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test
2016-09-02T14:26:15.784+0200 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-09-02T14:26:15.785+0200 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14
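After this shutdown, I confirmed that the process no longer existed by checking the output of ps -ax | grep mongo.
But when I start the nodes again and log in with my credentials, rs.status() now reports: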
{
"set" : "REPLICASET",
"date" : ISODate("2016-09-08T13:19:12.963Z"),
"myState" : 3,
"term" : NumberLong(7),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "192.168.1.167:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 42,
"optime" : {
"ts" : Timestamp(1473340490, 6),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T13:14:50Z"),
"infoMessage" : "could not find member to sync from",
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.1.168:27017",
"health" : 0,
"state" : 6,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-09-08T13:19:10.553Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"authenticated" : false,
"configVersion" : -1
},
{
"_id" : 2,
"name" : "192.168.1.169:27017",
"health" : 0,
"state" : 6,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-09-08T13:19:10.552Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"authenticated" : false,
"configVersion" : -1
}
],
"ok" : 1
}
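Why? Maybe shutdown is not a good way to stop mongod; but I also tested with 'kill pid', and the restart ended up in the same state.
In this state I don't know how to repair the cluster; I started over (deleted the dbpath files and reconfigured the replica set). I also tried "--repair", but it did not help.
Information about my system:
- Mongo version: 3.2
- I start the process as root; maybe it should be started as the 'mongod' user?
- This is my start command:
mongod --conf /etc/mongod.conf
- The keyFile configuration does not work; if I add "--keyFile /path/to/file", it shows:
"about to fork child process, waiting until server is ready for connections." The file has all permissions set, but the keyFile cannot be used.
Example "net.bindIp" configuration, from mongod.conf on one machine:
net:
  port: 27017
  bindIp: 127.0.0.1,192.168.1.167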
Note: This solution is Windows specific but can be ported to *nix based systems easily.
You need to carry out the steps in order. First, start your mongod instances.
start "29001" mongod --dbpath "C:\data\db\r1" --port 29001
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003
Connect to each node with mongo and create an administrator user. I prefer to create a superuser.
> use admin
> db.createUser({user: "root", pwd: "123456", roles:["root"]})
You can create other users as needed.
Create the key file. See the documentation for valid key file contents.
Note: On *nix based systems, set chmod of key file to 400
In my case, I created the key file as
echo mysecret==key > C:\data\key\key.txt
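On a *nix host, such as the CentOS-style machines in the question, key file creation might look like the sketch below. The KEYFILE path is a hypothetical placeholder, and the 756-byte size is just one choice that stays under the 1024-character limit for key file contents; treat it as an illustration, not the only valid approach.

```shell
#!/bin/sh
# Sketch for *nix hosts: generate a key file and restrict its permissions.
# KEYFILE is a hypothetical path; use the one your mongod is configured with.
KEYFILE="${KEYFILE:-$(mktemp -d)/mongo-keyfile}"

umask 077   # never create the file group- or world-readable

# 756 random bytes become 1008 base64 characters, under the 1024 limit.
if command -v openssl >/dev/null 2>&1; then
  openssl rand -base64 756 > "$KEYFILE"
else
  head -c 756 /dev/urandom | base64 > "$KEYFILE"
fi

chmod 400 "$KEYFILE"   # read-only for the owner, nothing for anyone else
ls -l "$KEYFILE"
```

Every member of the replica set needs a copy of the same file, owned by the user that runs mongod.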
Now restart the MongoDB servers with the --keyFile and --replSet flags enabled.
start "29001" mongod --dbpath "C:\data\db\r1" --port 29001 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 --replSet "rs1" --keyFile C:\data\key\key.txt
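If you start mongod from a configuration file, as the question does, the same two settings can live there instead of on the command line. The sketch below writes an example file to a temporary path (CONF is a hypothetical variable); the port, bindIp, set name, and /etc/keyfile values are taken from the question, and the layout assumes the YAML config format used by MongoDB 3.2.

```shell
#!/bin/sh
# Sketch: write a mongod.conf that carries the replica set name and key file.
# CONF is a hypothetical location; a real deployment would use /etc/mongod.conf.
CONF="${CONF:-$(mktemp)}"

cat > "$CONF" <<'EOF'
net:
  port: 27017
  bindIp: 127.0.0.1,192.168.1.167
replication:
  replSetName: REPLICASET
security:
  keyFile: /etc/keyfile
EOF

echo "wrote $CONF"
# To use it: mongod --config "$CONF"
```

Setting security.keyFile also turns on client authorization, so a separate authorization: "enabled" line is not strictly needed alongside it.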
Once all mongod instances are up and running, connect to any one of them with authentication.
mongo --port 29001 -u "root" -p "123456" --authenticationDatabase "admin"
Initiate the replica set:
> use admin
> rs.initiate()
rs1:PRIMARY> rs.add("localhost:29002")
{ "ok" : 1 }
rs1:PRIMARY> rs.add("localhost:29003")
{ "ok" : 1 }
Note: You may need to replace localhost with the machine name or IP address.
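Nodes should be shut down one at a time so that the remaining members can elect a primary. A restarted node goes through the RECOVERING state while it syncs from another member. Shutting the nodes down one by one this way means you do not need to re-add them.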
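A one-at-a-time shutdown could be planned along these lines. This is a dry-run sketch that only prints commands: the host list comes from the rs.status() output in the question, the root user is the one created earlier in this answer, and the order (secondaries first, primary last after a step-down) is an assumption about a sensible sequence rather than something the thread prescribes.

```shell
#!/bin/sh
# Dry-run sketch: prints the shutdown commands instead of running them.
SECONDARIES="192.168.1.168:27017 192.168.1.169:27017"
PRIMARY="192.168.1.167:27017"

shutdown_cmd() {
  # $1 = host:port. -p without a value makes the mongo shell prompt for it.
  printf 'mongo --host %s -u root -p --authenticationDatabase admin --eval %s\n' \
    "$1" "'db.getSiblingDB(\"admin\").shutdownServer()'"
}

PLAN=$(
  for node in $SECONDARIES; do
    shutdown_cmd "$node"
    echo "# restart $node and wait until rs.status() shows it as SECONDARY"
  done
  echo "# run rs.stepDown() on $PRIMARY so another member takes over, then:"
  shutdown_cmd "$PRIMARY"
)

printf '%s\n' "$PLAN"
```

Running the printed commands by hand, one node at a time, keeps a majority of members up throughout.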
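In the end I solved the problem. With authorization enabled, a key file is mandatory for the members of a replica set to communicate with each other, and when I specified the key file it returned an error, as shown in mongod.log: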
I ACCESS [main] permissions on /etc/keyfile are too open
The key file must have permissions 400. Thanks @Saleem
When people said "You can add keyfile", I took it to be an optional parameter, but it is mandatory.
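A quick pre-flight check can catch the "too open" error before mongod is started. The sketch below assumes GNU stat (the -c option) and uses a hypothetical helper named check_keyfile; as far as I know, mongod rejects a key file that has any group or other permission bits set, which is what the pattern tests for.

```shell
#!/bin/sh
# Sketch: report whether a key file's mode would be accepted.
check_keyfile() {
  mode=$(stat -c %a "$1") || return 2
  case "$mode" in
    [0-7]00) echo "ok: $1 has mode $mode" ;;
    *)       echo "too open: $1 has mode $mode (try: chmod 400 $1)"; return 1 ;;
  esac
}

# Demo on a throwaway file rather than a real key file.
demo=$(mktemp)
chmod 644 "$demo"; check_keyfile "$demo" || true
chmod 400 "$demo"; check_keyfile "$demo"
```

Pointing check_keyfile at /etc/keyfile on each member would have shown the problem described in the log line above.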