Aerospike: One of three nodes went down abruptly and writes are not happening

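We are running a 3-node cluster on AWS, version 4.2.0.4 CE, with in-memory data. We recently noticed that writes were not happening and found that one node was down. Ideally, writes should have continued on the remaining nodes. Once we started the node that was down, writes resumed. We are accessing the Aerospike cluster from outside AWS.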

The INFO log line below was printed continuously on two of the nodes.

INFO (hb): (hb.c:4319) found redundant connections to same node, fds 101 31 - choosing at random

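On the other node, no logs were printed, and the asadm statistics showed no reads or writes taking place. We also observed that records are not evenly distributed across the nodes.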

The network stanza of the configuration file is identical on all 3 servers and is shown below, followed by the namespace stanza.

network {
    service {
            address any
            port 3000
    }

    heartbeat {

            mode mesh
            port 3002 # Heartbeat port for this node.

            # List one or more other nodes, one ip-address & port per line:
            mesh-seed-address-port 13.xxx.xxx.xxx 3002
            mesh-seed-address-port 13.xxx.xxx.xxx 3002
            mesh-seed-address-port 13.xxx.xxx.xxx 3002

            interval 150
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }
}
namespace smpa {
    replication-factor 2
    memory-size 12G
    storage-engine memory
    single-bin true
    high-water-memory-pct 80
    stop-writes-pct 90
}

$ asadm -e "show stat like stop_writes"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                              :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
cluster_clock_skew_stop_writes_sec:   0                               0                               0                               

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                  :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
clock_skew_stop_writes:   false                           false                           false                           
stop_writes           :   false                           false                           false                           

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                  :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
clock_skew_stop_writes:   false                           false                           false                           
stop_writes           :   false                           false                           false   

$ asadm -e "show stat like x_partitions"

Seed:        [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:30:01 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                           :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
migrate_rx_partitions_active   :   0                               0                               0                               
migrate_rx_partitions_initial  :   0                               2749                            0                               
migrate_rx_partitions_remaining:   0                               0                               0                               
migrate_tx_partitions_active   :   0                               0                               0                               
migrate_tx_partitions_imbalance:   0                               0                               0                               
migrate_tx_partitions_initial  :   1396                            0                               1353                            
migrate_tx_partitions_remaining:   0                               0                               0

$ asadm -e "show pmap"

Seed:        [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Partition Map Analysis (2019-01-24 12:33:39 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     Cluster   Namespace                            Node      Primary    Secondary         Dead   Unavailable   
         Key           .                               .   Partitions   Partitions   Partitions    Partitions   
BEF4A1479187   smpa        node6.domain.com:3000         1382         1367            0             0   
BEF4A1479187   smpa        node7.domain.com:3000         1358         1342            0             0   
BEF4A1479187   smpa        node5.domain.com:3000         1356         1387            0             0   
BEF4A1479187   test        node6.domain.com:3000         1382            0            0             0   
BEF4A1479187   test        node7.domain.com:3000         1358            0            0             0   
BEF4A1479187   test        node5.domain.com:3000         1356            0            0             0   
Number of rows: 6

$ asadm -e "show stat like objects"

Seed:        [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-24 12:34:09 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
objects                    :   6478039                         6485049                         9265180                         
sindex_gc_objects_validated:   0                               0                               0                               

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:34:09 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                 :   node5.domain.com:3000   node6.domain.com:3000   node7.domain.com:3000   
evicted_objects      :   0                               0                               0                               
expired_objects      :   0                               0                               0                               
master_objects       :   2944752                         3456686                         4712696                         
non_expirable_objects:   2943325                         3455765                         4711880                         
non_replica_objects  :   0                               0                               0                               
objects              :   6478039                         6485049                         9265180                         
prole_objects        :   3533287                         3028363                         4552484                         

$ asadm -e "info"

Seed:        [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                    Node               Node                    Ip       Build   Cluster   Migrations        Cluster     Cluster         Principal   Client     Uptime   
                                                       .                 Id                     .           .      Size            .            Key   Integrity                 .    Conns          .   
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   BB9BE0093E32B0A    xx.xxx.xxx.xxx:3000   C-4.2.0.4         3      0.000     3ADA511969DD   True        BB9EAC87115AD0A       59   01:09:24   
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   *BB9EAC87115AD0A   xx.xxx.xxx.xxx:3000   C-4.2.0.4         3      0.000     3ADA511969DD   True        BB9EAC87115AD0A       59   01:05:17   
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   BB9D4175485B10A    xx.xxx.xxx.xxx:3000   C-4.2.0.4         3      0.000     3ADA511969DD   True        BB9EAC87115AD0A       59   01:14:17   
Number of rows: 3

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                                                       Node     Total   Expirations,Evictions     Stop       Disk    Disk     HWM   Avail%        Mem     Mem    HWM      Stop   
        .                                                          .   Records                       .   Writes       Used   Used%   Disk%        .       Used   Used%   Mem%   Writes%   
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.716 M   (0.000,  0.000)         false         N/E   N/E     50      N/E      2.774 GB   24      80     90        
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.648 M   (0.000,  0.000)         false         N/E   N/E     50      N/E      2.706 GB   23      80     90        
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.709 M   (0.000,  0.000)         false         N/E   N/E     50      N/E      2.767 GB   24      80     90        
smpa                                                                   8.074 M   (0.000,  0.000)                  0.000 B                             8.247 GB                            
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                                                       Node     Total     Repl                       Objects                   Tombstones             Pending   Rack   
        .                                                          .   Records   Factor    (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)            Migrates     ID   
        .                                                          .         .        .                             .                            .             (tx,rx)      .   
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.716 M   2        (1.375 M, 1.341 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)     0      
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.648 M   2        (1.311 M, 1.337 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)     0      
smpa        ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000   2.709 M   2        (1.351 M, 1.359 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)     0      
smpa                                                                   8.074 M            (4.037 M, 4.037 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)            

$ asadm -e "show stat like objects"

Seed:        [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190122 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE   :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects:   672400                                                     662491                                                     671131                                                     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190121 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE   :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects:   376064                                                     347232                                                     374700                                                     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190124 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE   :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects:   629323                                                     617983                                                     628214                                                     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190123 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE   :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects:   739556                                                     726447                                                     736871                                                     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190125 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE   :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects:   313800                                                     308814                                                     313320                                                     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
objects                    :   2731143                                                    2662967                                                    2724236                                                    
sindex_gc_objects_validated:   0                                                          0                                                          0                                                          

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                 :   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000   
evicted_objects      :   0                                                          0                                                          0                                                          
expired_objects      :   0                                                          0                                                          0                                                          
master_objects       :   1382413                                                    1318579                                                    1358181                                                    
non_expirable_objects:   1382525                                                    1318691                                                    1358445                                                    
non_replica_objects  :   0                                                          0                                                          0                                                          
objects              :   2731143                                                    2662967                                                    2724236                                                    
prole_objects        :   1348730                                                    1344388                                                    1366055                                                    

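Check whether the other two nodes are publishing a private IP address that the client cannot reach, while only one node (the one that went down) is publishing a reachable IP address. (See the network stanza, service sub-context.)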

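The problem was that I had supplied NATed IPs for heartbeat communication. Ideally, "mesh-seed-address-port" should be given the private IPs, and if your clients are outside the network, "access-address" should be set to the NATed IP. Please read through the thread linked in this post if needed.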
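For illustration, here is a rough sketch of what the corrected network stanza might look like on one node. It is based on the configuration posted above, not on a verified production file. The 10.xxx.xxx.xxx addresses are placeholders I am assuming for the nodes' private IPs, and the 13.xxx.xxx.xxx value for "access-address" stands for that node's own NATed IP, so it would differ on each server.

```
network {
    service {
            address any
            port 3000
            # Assumption: this node's NATed IP, reachable by clients outside AWS.
            access-address 13.xxx.xxx.xxx
    }

    heartbeat {
            mode mesh
            port 3002 # Heartbeat port for this node.

            # Assumption: private IPs of the cluster nodes (placeholders),
            # used instead of the NATed IPs for node-to-node heartbeats.
            mesh-seed-address-port 10.xxx.xxx.xxx 3002
            mesh-seed-address-port 10.xxx.xxx.xxx 3002
            mesh-seed-address-port 10.xxx.xxx.xxx 3002

            interval 150
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }
}
```

With a layout like this, heartbeat traffic stays on the private network, while clients outside AWS are told to connect to the address given in "access-address".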

There is clear documentation on how to configure this on AWS EC2 instances: https://discuss.aerospike.com/t/aws-ec2-ip-addressing-for-aerospike/2424

Many thanks to kporter, pgupta, and ashish-shinde for their valuable help.