Redis哈希函数和数据分区

Redis hash function and data partition

众所周知,Redis 使用 CRC16 算法将键映射到哈希槽。假设 crc 使用某种 "distribution" 来为节点分配密钥是否安全?如果是,什么样的分布?

另外,通过对每个键的哈希函数,我们是否可以确保我们在节点上均匀地加载键的数量?假设客户端在 3 节点集群中随机进行 3000 次插入。之后key会均匀分布在节点中(M1≈1000,M2≈1000,M3≈1000)?

为了测试这些,我在 python 中创建了一个函数:

list1= []
list2= []
list3= []

def RedisClusterCRC16(keysslot):

    XMODEMCRC16Lookup = [
        0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50a5, 0x60c6, 0x70e7,
        0x8108, 0x9129, 0xa14a, 0xb16b, 0xc18c, 0xd1ad, 0xe1ce, 0xf1ef,
        0x1231, 0x0210, 0x3273, 0x2252, 0x52b5, 0x4294, 0x72f7, 0x62d6,
        0x9339, 0x8318, 0xb37b, 0xa35a, 0xd3bd, 0xc39c, 0xf3ff, 0xe3de,
        0x2462, 0x3443, 0x0420, 0x1401, 0x64e6, 0x74c7, 0x44a4, 0x5485,
        0xa56a, 0xb54b, 0x8528, 0x9509, 0xe5ee, 0xf5cf, 0xc5ac, 0xd58d,
        0x3653, 0x2672, 0x1611, 0x0630, 0x76d7, 0x66f6, 0x5695, 0x46b4,
        0xb75b, 0xa77a, 0x9719, 0x8738, 0xf7df, 0xe7fe, 0xd79d, 0xc7bc,
        0x48c4, 0x58e5, 0x6886, 0x78a7, 0x0840, 0x1861, 0x2802, 0x3823,
        0xc9cc, 0xd9ed, 0xe98e, 0xf9af, 0x8948, 0x9969, 0xa90a, 0xb92b,
        0x5af5, 0x4ad4, 0x7ab7, 0x6a96, 0x1a71, 0x0a50, 0x3a33, 0x2a12,
        0xdbfd, 0xcbdc, 0xfbbf, 0xeb9e, 0x9b79, 0x8b58, 0xbb3b, 0xab1a,
        0x6ca6, 0x7c87, 0x4ce4, 0x5cc5, 0x2c22, 0x3c03, 0x0c60, 0x1c41,
        0xedae, 0xfd8f, 0xcdec, 0xddcd, 0xad2a, 0xbd0b, 0x8d68, 0x9d49,
        0x7e97, 0x6eb6, 0x5ed5, 0x4ef4, 0x3e13, 0x2e32, 0x1e51, 0x0e70,
        0xff9f, 0xefbe, 0xdfdd, 0xcffc, 0xbf1b, 0xaf3a, 0x9f59, 0x8f78,
        0x9188, 0x81a9, 0xb1ca, 0xa1eb, 0xd10c, 0xc12d, 0xf14e, 0xe16f,
        0x1080, 0x00a1, 0x30c2, 0x20e3, 0x5004, 0x4025, 0x7046, 0x6067,
        0x83b9, 0x9398, 0xa3fb, 0xb3da, 0xc33d, 0xd31c, 0xe37f, 0xf35e,
        0x02b1, 0x1290, 0x22f3, 0x32d2, 0x4235, 0x5214, 0x6277, 0x7256,
        0xb5ea, 0xa5cb, 0x95a8, 0x8589, 0xf56e, 0xe54f, 0xd52c, 0xc50d,
        0x34e2, 0x24c3, 0x14a0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
        0xa7db, 0xb7fa, 0x8799, 0x97b8, 0xe75f, 0xf77e, 0xc71d, 0xd73c,
        0x26d3, 0x36f2, 0x0691, 0x16b0, 0x6657, 0x7676, 0x4615, 0x5634,
        0xd94c, 0xc96d, 0xf90e, 0xe92f, 0x99c8, 0x89e9, 0xb98a, 0xa9ab,
        0x5844, 0x4865, 0x7806, 0x6827, 0x18c0, 0x08e1, 0x3882, 0x28a3,
        0xcb7d, 0xdb5c, 0xeb3f, 0xfb1e, 0x8bf9, 0x9bd8, 0xabbb, 0xbb9a,
        0x4a75, 0x5a54, 0x6a37, 0x7a16, 0x0af1, 0x1ad0, 0x2ab3, 0x3a92,
        0xfd2e, 0xed0f, 0xdd6c, 0xcd4d, 0xbdaa, 0xad8b, 0x9de8, 0x8dc9,
        0x7c26, 0x6c07, 0x5c64, 0x4c45, 0x3ca2, 0x2c83, 0x1ce0, 0x0cc1,
        0xef1f, 0xff3e, 0xcf5d, 0xdf7c, 0xaf9b, 0xbfba, 0x8fd9, 0x9ff8,
         0x6e17, 0x7e36, 0x4e55, 0x5e74, 0x2e93, 0x3eb2, 0x0ed1, 0x1ef0
    ]

    crc = 0
    for byte in keysslot.encode( "utf-8" ):
        crc = ((crc << 8) & 0xff00) ^ XMODEMCRC16Lookup[((crc >> 8) &  0xff) ^ ord( byte )]





    metr1=0
    metr2=0
    metr3=0

    if ((crc & 0xffff)% 16384) <= 5460:
        metr1 = metr1+1
        list1.append(metr1)
    elif  (((crc & 0xffff)% 16384) > 5460) and (((crc & 0xffff)% 16384) <= 10922):
        metr2 = metr2+1
        list2.append(metr2)
    else:
        metr3 = metr3+1
        list3.append(metr3)





for i in range(2000000):
    RedisClusterCRC16(str(i))


print "M1 holds: ", sum(list1)
print "M2 holds: ", sum(list2)
print "M3 holds: ", sum(list3)

输入 2000000 结果为:

M1 holds:  666625
M2 holds:  666744
M3 holds:  666631

我观察到每个节点(本例中为伪节点)上的时隙分布几乎相等。

经过搜索我找到了证据:

哈希函数将键映射到小整数(桶)。一个理想的哈希函数以random-like的方式将键映射到整数,这样即使输入数据中存在规律性,桶值也会均匀分布。

这个过程可以分为两步:

  • 将键映射到一个整数。
  • 将整数映射到桶。

另外,这里:https://en.wikipedia.org/wiki/Hash_function,解释了 Uniformity 并说 "This method (crc) may produce a sufficiently uniform distribution of hash values, as long as the hash range size n is small compared to the range of the checksum or fingerprint function"。

考虑到这些以及上面的执行代码,很明显 crc 可以产生均匀分布的哈希值。

As it is known, Redis uses the CRC16 algorithm to map keys to hash slots. Is it safe to assume that crc uses some kind of "distribution" in order to assign keys to nodes?

不!它不分发密钥。

根据您使用的客户端库类型,键的数据类型可能不同(例如字符串、字节数组等),但都应该有通用的实现来支持 Redis 集群,那就是任何给定的密钥都经过 CRC16 算法生成哈希,然后取 16384 的 mod 来确定密钥槽。

HASH_SLOT = CRC16(key) mod 16384

它仅用于确定密钥槽,可能没有任何逻辑来根据可用节点数分配密钥。

参考 - https://redis.io/topics/cluster-spec