DRBD & Corosync - My DRBD works and reports UpToDate, but it is not

Question

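I have a two-node high-availability cluster with a DRBD resource, a virtual IP, and the MariaDB files shared on the DRBD partition.

Everything seems to work, but DRBD is not syncing the latest files I created, even though the DRBD status reports the disk as UpToDate.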

sudo drbdadm status 
iba role:Primary
  disk:UpToDate

pcs shows no errors either:

sudo pcs status 
Cluster name: cluster_iba
Cluster Summary:
  * Stack: corosync
  * Current DC: iba2-ip192 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Feb 22 18:16:20 2022
  * Last change:  Mon Feb 21 16:19:38 2022 by root via cibadmin on iba1-ip192
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ iba1-ip192 iba2-ip192 ]

Full List of Resources:
  * virtual_ip  (ocf::heartbeat:IPaddr2):    Started iba2-ip192
  * Clone Set: DrbdData-clone [DrbdData] (promotable):
    * Masters: [ iba2-ip192 ]
    * Slaves: [ iba1-ip192 ]
  * DrbdFS  (ocf::heartbeat:Filesystem):     Started iba2-ip192
  * WebServer   (ocf::heartbeat:apache):     Started iba2-ip192
  * Maria   (ocf::heartbeat:mysql):  Started iba2-ip192

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

All constraints:

sudo pcs constraint list --full
Location Constraints:
Ordering Constraints:
  promote DrbdData-clone then start DrbdFS (kind:Mandatory) (id:order-DrbdData-clone-DrbdFS-mandatory)
  start DrbdFS then start virtual_ip (kind:Mandatory) (id:order-DrbdFS-virtual_ip-mandatory)
  start virtual_ip then start WebServer (kind:Mandatory) (id:order-virtual_ip-WebServer-mandatory)
  start DrbdFS then start Maria (kind:Mandatory) (id:order-DrbdFS-Maria-mandatory)
Colocation Constraints:
  DrbdFS with DrbdData-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DrbdFS-DrbdData-clone-INFINITY)
  virtual_ip with DrbdFS (score:INFINITY) (id:colocation-virtual_ip-DrbdFS-INFINITY)
  WebServer with virtual_ip (score:INFINITY) (id:colocation-WebServer-virtual_ip-INFINITY)
  Maria with DrbdFS (score:INFINITY) (id:colocation-Maria-DrbdFS-INFINITY)
Ticket Constraints:

Files in /mnt/datosDRBD on node iba2-ip192 (while it is the primary):

/mnt/datosDRBD$ ls -l
total 80
-rw-r--r-- 1 root  root   5801 feb 21 12:16 drbd_cfg
-rw-r--r-- 1 root  root  10494 feb 21 12:18 fs_cfg
drwx------ 2 root  root  16384 feb 21 10:12 lost+found
drwxr-xr-x 4 mysql mysql  4096 feb 22 18:00 mariaDB
-rw-r--r-- 1 root  root  17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root  root      5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root  root  13578 feb 21 12:21 WebServer_cfg

And the files in /mnt/datosDRBD on node iba1-ip192 (while it is the primary):

ls -l
total 92
-rw-r--r-- 1 root     root      5801 feb 21 12:16 drbd_cfg
drwxrwxrwx 5 www-data www-data  4096 feb 22 13:41 FilesSGITV
-rw-r--r-- 1 root     root     10494 feb 21 12:18 fs_cfg
drwx------ 2 root     root     16384 feb 21 10:12 lost+found
drwxr-xr-x 7 mysql    mysql     4096 feb 22 17:55 mariaDB
-rw-r--r-- 1 root     root     17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root     root         5 feb 22 17:58 testMParicio2.txt
-rw-r--r-- 1 www-data www-data     9 feb 22 17:58 testMParicio3.txt
-rw-r--r-- 1 root     root         5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root     root     13578 feb 21 12:21 WebServer_cfg

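All the new files (testMParicio2.txt, testMParicio3.txt, and the FilesSGITV folder) are missing on iba2-ip192.

I don't know what to do. I'm lost.

Any help is appreciated, thanks.

(Edit)

My DRBD configuration, identical on both nodes: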

cat /etc/drbd.conf 
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

And my *.res configuration, also identical on both nodes:

resource iba {
        device /dev/drbd0;
        disk /dev/md3;
        meta-disk internal;
        on iba1 {
                address 10.0.0.248:7789;
        }
        on iba2 {
                address 10.0.0.249:7789;
        }
}

drbdadm uses iba1 and iba2, with IPs 10.0.0.248 and 10.0.0.249.

Corosync uses iba1-ip192 and iba2-ip192, with IPs 192.168.1.248 and 192.168.1.249.

cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 iba1
10.0.0.248  iba1
10.0.0.249  iba2
192.168.1.248 iba1-ip192
192.168.1.249 iba2-ip192

cat /etc/drbd.d/global_common.conf


global {
    usage-count yes;
    
    udev-always-use-vnr; # treat implicit the same as explicit volumes

}

common {
    handlers {
    }

    startup {
    }

    options {
    }

    disk {
    }

    net {
        protocol C;
    }
}

(Edit 2)

I found a problem in /proc/drbd.

On the primary node:

cat /proc/drbd 
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C 
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:2284 dr:11625 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:42364728

On the secondary node:

cat /proc/drbd 
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C 
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:36538580

The secondary node no longer remembered the SSH key; I fixed that with:

ssh-keygen  -R 10.0.0.248
ssh-copy-id iba@iba1

But DRBD is still in the StandAlone state.
I don't know how to continue.

Answer 1

I found a split-brain that did not show up in the pcs status.

sudo journalctl | grep Split-Brain
feb 21 13:00:10 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
feb 21 13:21:40 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
feb 21 13:27:54 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
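
Since neither pcs status nor the disk state (UpToDate) revealed the problem, the connection state (the cs: field in /proc/drbd) is the field worth checking. Below is a minimal sketch, assuming the DRBD 8.4 /proc/drbd format shown in the question. The function name drbd_unconnected is hypothetical, not part of DRBD. Note that it also lists devices that are merely resyncing (SyncSource, SyncTarget), not only StandAlone ones.

```shell
#!/bin/sh
# Print "<minor> <connection-state>" for every DRBD device whose
# connection state is anything other than Connected.
# Input: /proc/drbd-style text (DRBD 8.4 format) on stdin.
# drbd_unconnected is a hypothetical helper name, not a DRBD command.
drbd_unconnected() {
    awk '
        # Device lines look like: " 0: cs:StandAlone ro:Primary/Unknown ..."
        $1 ~ /^[0-9]+:$/ && $2 ~ /^cs:/ {
            minor = $1; sub(/:$/, "", minor)
            state = $2; sub(/^cs:/, "", state)
            if (state != "Connected") print minor, state
        }
    '
}

# Usage on a live node (assumes /proc/drbd exists, as it does on DRBD 8.4):
#   drbd_unconnected < /proc/drbd
```

Fed the /proc/drbd output from the question, this should report device 0 as StandAlone on both nodes, which is the symptom that pcs status did not surface.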

I stopped the cluster, set --force on the primary, and then ran the following.

On the split-brain victim (assuming the DRBD resource is iba):

drbdadm disconnect iba
drbdadm secondary iba
drbdadm connect --discard-my-data iba

On the split-brain survivor:

drbdadm primary iba
drbdadm connect iba
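
To confirm that the reconnect actually worked, one option is to poll /proc/drbd until the device leaves StandAlone. This is only a sketch under the same assumption (DRBD 8.4 /proc/drbd format). The function drbd_wait_reconnect, its arguments, and its defaults are hypothetical and not part of DRBD. It treats Connected, SyncSource, and SyncTarget as success and keeps polling on any other state.

```shell
#!/bin/sh
# Poll a /proc/drbd-style file until the first device reports a healthy
# connection state, or give up after a number of one-second tries.
# drbd_wait_reconnect is a hypothetical helper, not a DRBD command.
#   $1: status file to read (default: /proc/drbd)
#   $2: maximum number of tries (default: 30)
drbd_wait_reconnect() {
    status_file=${1:-/proc/drbd}
    tries=${2:-30}
    cs=
    while [ "$tries" -gt 0 ]; do
        # Extract the cs: value of the first device line, e.g. "Connected".
        cs=$(sed -n 's/^ *[0-9][0-9]*: cs:\([A-Za-z]*\).*/\1/p' "$status_file" | head -n 1)
        case $cs in
            Connected|SyncSource|SyncTarget)
                echo "$cs"
                return 0
                ;;
        esac
        tries=$((tries - 1))
        sleep 1
    done
    echo "still ${cs:-unknown}" >&2
    return 1
}

# Usage on either node after running the recovery commands:
#   drbd_wait_reconnect /proc/drbd 60
```

Once both nodes report Connected and the resync has finished, the files written on the survivor should be visible after a failover.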
