MongoDB fails to start on OpenShift v3.11 because it cannot read WiredTiger.wt, even though the file can be read from a terminal

I have a MongoDB StatefulSet running on OpenShift v3.11. The PersistentVolume uses NFSv4.

In our environment, I set the directory on the NFS server to be owned by nfsnobody:nfsnobody. SELinux is also set to Permissive. All directories and files inside it are also given chmod ug+rwx,o-rwx.

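This is so that at runtime, when the Pod accesses the shared path as a user in group root (gid=0), the Pod is able to read and write to the shared path, because NFS by default squashes the root user and group to nfsnobody.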

$> ls -halZ /srv/share/openshift/mongo/
drwxrwx---. nfsnobody nfsnobody unconfined_u:object_r:default_t:s0 data

This setup had been working for months, but then it started to fail.

Now, when I deploy the Pod, it fails to start with the following error:

2021-01-26T16:12:48.163+0000 W STORAGE  [initandlisten] Detected unclean shutdown - /var/lib/mongodb/data/mongod.lock is not empty.
2021-01-26T16:12:48.163+0000 I STORAGE  [initandlisten] Detected data files in /var/lib/mongodb/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2021-01-26T16:12:48.163+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2021-01-26T16:12:48.164+0000 I STORAGE  [initandlisten] wiredtiger_open config:create,cache_size=31220M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2021-01-26T16:12:48.688+0000 E STORAGE  [initandlisten] WiredTiger error (1) [1611677568:688148][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted Raw: [1611677568:688148][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted
2021-01-26T16:12:48.708+0000 E STORAGE  [initandlisten] WiredTiger error (1) [1611677568:708810][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted Raw: [1611677568:708810][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted
2021-01-26T16:12:48.728+0000 E STORAGE  [initandlisten] WiredTiger error (1) [1611677568:728860][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted Raw: [1611677568:728860][457:0x7f9b59cc1ca8], file:WiredTiger.wt, connection: __posix_open_file, 715: /var/lib/mongodb/data/WiredTiger.wt: handle-open: open: Operation not permitted
2021-01-26T16:12:48.744+0000 W STORAGE  [initandlisten] Failed to start up WiredTiger under any compatibility version.
2021-01-26T16:12:48.744+0000 F STORAGE  [initandlisten] Reason: 1: Operation not permitted
2021-01-26T16:12:48.744+0000 F -        [initandlisten] Fatal Assertion 28595 at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 638
2021-01-26T16:12:48.744+0000 F -        [initandlisten]

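At first glance, it looks like "maybe the mongod process does not have permission to read the file". However, when I open a terminal on the Pod in debug mode, I have full access to the path /var/lib/mongodb/data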

$> id
uid=1000230000 gid=0(root) groups=0(root),1000230000
$> cd /var/lib/mongodb/data

/var/lib/mongodb/data$> echo "This is a test" >new_file
/var/lib/mongodb/data$> rm new_file
/var/lib/mongodb/data$> cat WiredTiger.wt | wc -l
23
/var/lib/mongodb/data$> mongod --dbpath $(pwd)
....failed...

The commands above show that I can read /var/lib/mongodb/data/WiredTiger.wt to count its lines, but the mongod process cannot.

Only when I do this

# 1000230000 is the random UID and GID granted by OpenShift for the Pod.
$> chown -R 1000230000:nfsnobody /srv/share/openshift/mongo/

...is the Pod able to read the files.

Is there anything else I should check to resolve this issue?

Update:

After reading the MongoDB source code at tag r4.0.5, I now understand why the error occurs.

Thanks to @Alex Blex for the hint about the source code!

Summary

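When mongod opens WiredTiger.wt (or any other data file), it tries not to update the file's last access time (st_atime in the inode). The reason for doing this is to increase performance. Under the hood, it uses the system call open() with the flag O_NOATIME.

According to the open() man page: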

This flag can be employed only if one of the following conditions is true:

  • The effective UID of the process matches the owner UID of the file.

  • The calling process has the CAP_FOWNER capability in its user namespace and the owner UID of the file has a mapping in the namespace.

Otherwise, the call fails with the error

EPERM  The O_NOATIME flag was specified, but the effective user
       ID of the caller did not match the owner of the file and
       the caller was not privileged.

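In my case, the files are owned by nfsnobody, not by the current UID, hence the error. A plain read such as cat does not pass O_NOATIME, which is why it succeeds from the terminal. This also explains why the problem goes away only after chown $UID:nfsnobody.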
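To make the difference between cat and mongod concrete, here is a minimal Python sketch (not MongoDB or WiredTiger code) that opens a file the way a plain reader would and the way a caller requesting O_NOATIME would. The function name open_with_noatime_fallback is hypothetical, and the fallback to a plain open is only for illustration; the log above suggests mongod does not fall back. os.O_NOATIME is Linux-only, so the sketch treats it as optional.

```python
import errno
import os


def open_with_noatime_fallback(path):
    """Open path read-only, preferring O_NOATIME.

    Returns (fd, used_noatime). If the kernel rejects O_NOATIME with
    EPERM (caller is not the file owner and lacks CAP_FOWNER), retry
    without the flag, which is what a plain `cat` effectively does.
    """
    noatime = getattr(os, "O_NOATIME", 0)  # 0 on platforms without the flag
    try:
        return os.open(path, os.O_RDONLY | noatime), bool(noatime)
    except PermissionError as exc:
        # EACCES means ordinary permission bits deny the read; only
        # EPERM points at the O_NOATIME ownership rule.
        if exc.errno != errno.EPERM or not noatime:
            raise
        return os.open(path, os.O_RDONLY), False
```

Run inside the Pod against /var/lib/mongodb/data/WiredTiger.wt, a second element of False would indicate that the file is readable but that O_NOATIME is being refused, which matches the behaviour described above.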

More details

The error comes from posix/os_fs.c when it tries to open a file. At line 693, the flag O_NOATIME is set if __posix_open_file is called with WT_FS_OPEN_FILE_TYPE_DATA.
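Since the flag is applied to data files, any file under the dbpath that is not owned by the effective UID of mongod is a candidate for the same EPERM. The following is a rough diagnostic sketch, assuming it is run inside the Pod as the same user as mongod; the helper name files_not_owned_by_caller is hypothetical. It does not account for CAP_FOWNER, which would also satisfy the man page's second condition.

```python
import os


def files_not_owned_by_caller(dbpath):
    """Yield files under dbpath whose owner UID differs from our effective UID."""
    euid = os.geteuid()
    for root, _dirs, files in os.walk(dbpath):
        for name in files:
            path = os.path.join(root, name)
            # lstat so that symlinks are reported by their own owner
            if os.lstat(path).st_uid != euid:
                yield path
```

For example, list(files_not_owned_by_caller("/var/lib/mongodb/data")) would be expected to include WiredTiger.wt while the share is owned by nfsnobody, and to come back empty after the chown shown earlier.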