mysql partition by varchar - random behavior?

Question

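I'm trying to get familiar with MySQL Cluster Community Server (version: 5.6.27-ndb-7.4.8-cluster-gpl), but my very first test has me puzzled. I searched the documentation and forums but found nothing relevant.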

I have a very simple table on a cluster with 4 data nodes/partitions:

CREATE TABLE customer (
  id int(10) NOT NULL,
  surname varchar(35) NOT NULL,
  gender varchar(6) NOT NULL,
  PRIMARY KEY (id, surname, gender)
) ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1
PARTITION BY KEY (gender);

So I chose to partition by gender (it takes the values Male/Female). I insert 1000 rows and then check how they are distributed:

SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_name = 'customer';

Result:

partition_name, table_rows

'p0', '0'
'p1', '1000'
'p2', '0'
'p3', '0'

So all rows go into a single partition.

However, if I define gender as nvarchar(6) or varchar(40), the rows are spread across two partitions, as I would expect:

partition_name, table_rows

'p0', '493'
'p1', '0'
'p2', '507'
'p3', '0'

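If I increase gender to varchar(60), all records go into a single partition again. If I increase it to varchar(100), the records are evenly distributed across two partitions.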

Is there any logic behind this, or am I doing something wrong?

Answer 1

The documentation says:

Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. This internal hashing function is based on the same algorithm as PASSWORD().

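MySQL runs your two values (Male and Female) through a hash function that is arbitrary and, unless you are its implementer, unpredictable. In some cases the hash function maps both values to the same partition; in others it maps them to different partitions. So sometimes all your rows end up in one partition, and sometimes they end up split across two.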
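To make the collision effect concrete, here is a minimal sketch of two values being hashed into four buckets. It is an illustration only: CRC32() is a stand-in and is not the hash the server or NDB uses internally, and padding the value to the declared column length is purely an assumption made here to show how a different declared length could change the value that gets hashed.

```sql
-- Illustration only: CRC32() is a stand-in hash, NOT the function NDB uses,
-- and RPAD() to the declared length is an assumption made for this sketch.
SELECT v AS gender,
       CRC32(RPAD(v, 6, ' '))  MOD 4 AS bucket_if_varchar_6,
       CRC32(RPAD(v, 40, ' ')) MOD 4 AS bucket_if_varchar_40
FROM (SELECT 'Male' AS v UNION ALL SELECT 'Female') AS vals;
```

With only two distinct values and four buckets, a uniform hash would put both values in the same bucket about one time in four, so seeing one or two populated partitions is expected either way.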

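A column like the one you chose, with only a handful of distinct values, is not (says Obvious Man) a good choice for hash or key partitioning. Range partitioning may be more appropriate.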
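As a hedged alternative, the same table could be key-partitioned on a column with many distinct values, such as id, which is already part of the primary key. The table name customer_by_id below is hypothetical. One caveat worth checking before trying the range suggestion: the MySQL partitioning documentation lists KEY (including LINEAR KEY) as the only partitioning type supported for the NDB storage engine, so range partitioning may not be available on this cluster.

```sql
-- Sketch: same columns as the question's table, but partitioned on id,
-- which has one distinct value per row. customer_by_id is a made-up name.
CREATE TABLE customer_by_id (
  id INT(10) NOT NULL,
  surname VARCHAR(35) NOT NULL,
  gender VARCHAR(6) NOT NULL,
  PRIMARY KEY (id, surname, gender)
) ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1
PARTITION BY KEY (id);

-- After loading rows, inspect the distribution the same way as before.
SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_name = 'customer_by_id';
```

With 1000 distinct id values, the hash has many inputs to spread over the four partitions, so the distribution should no longer hinge on where two particular values happen to land.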
