C# Hashtable.cs 不全是素数
C# Hashtable.cs have not all Prime numbers
为什么 Hashtable class 不全是素数?
所以基本素数有那些数字
https://en.wikipedia.org/wiki/Prime_number
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199
但是 C# 库中 Hashtable.cs 中的代码有这个
// Table of prime numbers to use as hash table sizes.
// A typical resize algorithm would pick the smallest prime number in this array
// that is larger than twice the previous capacity.
// Suppose our Hashtable currently has capacity x and enough elements are added
// such that a resize needs to occur. Resizing first computes 2x then finds the
// first prime in the table greater than 2x, i.e. if primes are ordered
// p_1, p_2, ..., p_i, ..., it finds p_n such that p_n-1 < 2x < p_n.
// Doubling is important for preserving the asymptotic complexity of the
// hashtable operations such as add. Having a prime guarantees that double
// hashing does not lead to infinite loops. IE, your hash function will be
// h1(key) + i*h2(key), 0 <= i < size. h2 and the size must be relatively prime.
public static readonly int[] primes = {
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};
不太明白为什么会这样?
编辑 1:
我的意思是,我不是在谈论为什么它这么小。为什么在这个数组中没有5,或者13。例如:如果我们7*2 = 14,为什么最近的一个是11,而不是13?
如果您考虑所有小于或等于 7199369 的素数,您将得到一个包含超过 44.000 个条目的数组。在这么大的数组中找到 p_n 需要多少时间? (甚至使用二进制搜索)。你要使用多少内存?是否值得优化哈希表的内存使用?我想写这篇文章的人认为 72 种尺寸是一个足够好的权衡。
为了回答明确的问题,他们选择哈希表容量的增长因子 (20%) 并搜索,给定数组中的素数,另一个大约大 20% 的素数。
这些质数用于确定散列的大小table。例如,我们可以将其设为 197 个桶或槽。但是区分 197 和 199 的大小是没有意义的,因为 hash tables 至少比它们的项目数长大约 30%,以避免发生太多键冲突。而且,正如评论所说,“调整大小首先计算 2x,然后在 table 中找到第一个大于 2x 的素数”。因此,更细粒度的步骤没有任何优势。
因此,他们选择了更经济的方式。对于更大的素数,此列表中的两个连续素数相差约 20%。对于更小的素数,甚至还有更大的进步。但这对内存使用的影响很小。
{ 3, 7}, delta % = 133.33
{ 7, 11}, delta % = 57.14
{ 11, 17}, delta % = 54.55
{ 17, 23}, delta % = 35.29
{ 23, 29}, delta % = 26.09
{ 29, 37}, delta % = 27.59
{ 37, 47}, delta % = 27.03
{ 47, 59}, delta % = 25.53
{ 59, 71}, delta % = 20.34
{ 71, 89}, delta % = 25.35
{ 89, 107}, delta % = 20.22
{ 107, 131}, delta % = 22.43
{ 131, 163}, delta % = 24.43
{ 163, 197}, delta % = 20.86
{ 197, 239}, delta % = 21.32
{ 239, 293}, delta % = 22.59
{ 293, 353}, delta % = 20.48
{ 353, 431}, delta % = 22.10
{ 431, 521}, delta % = 20.88
{ 521, 631}, delta % = 21.11
{ 631, 761}, delta % = 20.60
{ 761, 919}, delta % = 20.76
{ 919, 1103}, delta % = 20.02
{ 1103, 1327}, delta % = 20.31
{ 1327, 1597}, delta % = 20.35
{ 1597, 1931}, delta % = 20.91
{ 1931, 2333}, delta % = 20.82
{ 2333, 2801}, delta % = 20.06
{ 2801, 3371}, delta % = 20.35
{ 3371, 4049}, delta % = 20.11
{ 4049, 4861}, delta % = 20.05
{ 4861, 5839}, delta % = 20.12
{ 5839, 7013}, delta % = 20.11
{ 7013, 8419}, delta % = 20.05
{ 8419, 10103}, delta % = 20.00
{ 10103, 12143}, delta % = 20.19
{ 12143, 14591}, delta % = 20.16
{ 14591, 17519}, delta % = 20.07
{ 17519, 21023}, delta % = 20.00
{ 21023, 25229}, delta % = 20.01
{ 25229, 30293}, delta % = 20.07
{ 30293, 36353}, delta % = 20.00
{ 36353, 43627}, delta % = 20.01
{ 43627, 52361}, delta % = 20.02
{ 52361, 62851}, delta % = 20.03
{ 62851, 75431}, delta % = 20.02
{ 75431, 90523}, delta % = 20.01
{ 90523, 108631}, delta % = 20.00
{ 108631, 130363}, delta % = 20.01
{ 130363, 156437}, delta % = 20.00
{ 156437, 187751}, delta % = 20.02
{ 187751, 225307}, delta % = 20.00
{ 225307, 270371}, delta % = 20.00
{ 270371, 324449}, delta % = 20.00
{ 324449, 389357}, delta % = 20.01
{ 389357, 467237}, delta % = 20.00
{ 467237, 560689}, delta % = 20.00
{ 560689, 672827}, delta % = 20.00
{ 672827, 807403}, delta % = 20.00
{ 807403, 968897}, delta % = 20.00
{ 968897, 1162687}, delta % = 20.00
{1162687, 1395263}, delta % = 20.00
{1395263, 1674319}, delta % = 20.00
{1674319, 2009191}, delta % = 20.00
{2009191, 2411033}, delta % = 20.00
{2411033, 2893249}, delta % = 20.00
{2893249, 3471899}, delta % = 20.00
{3471899, 4166287}, delta % = 20.00
{4166287, 4999559}, delta % = 20.00
{4999559, 5999471}, delta % = 20.00
{5999471, 7199369}, delta % = 20.00
为什么 Hashtable class 不全是素数? 所以基本素数有那些数字 https://en.wikipedia.org/wiki/Prime_number
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199
但是 C# 库中 Hashtable.cs 中的代码有这个
// Table of prime numbers to use as hash table sizes.
// A typical resize algorithm would pick the smallest prime number in this array
// that is larger than twice the previous capacity.
// Suppose our Hashtable currently has capacity x and enough elements are added
// such that a resize needs to occur. Resizing first computes 2x then finds the
// first prime in the table greater than 2x, i.e. if primes are ordered
// p_1, p_2, ..., p_i, ..., it finds p_n such that p_n-1 < 2x < p_n.
// Doubling is important for preserving the asymptotic complexity of the
// hashtable operations such as add. Having a prime guarantees that double
// hashing does not lead to infinite loops. IE, your hash function will be
// h1(key) + i*h2(key), 0 <= i < size. h2 and the size must be relatively prime.
public static readonly int[] primes = {
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};
不太明白为什么会这样?
编辑 1: 我的意思是,我不是在谈论为什么它这么小。为什么在这个数组中没有5,或者13。例如:如果我们7*2 = 14,为什么最近的一个是11,而不是13?
如果您考虑所有小于或等于 7199369 的素数,您将得到一个包含超过 44.000 个条目的数组。在这么大的数组中找到 p_n 需要多少时间? (甚至使用二进制搜索)。你要使用多少内存?是否值得优化哈希表的内存使用?我想写这篇文章的人认为 72 种尺寸是一个足够好的权衡。 为了回答明确的问题,他们选择哈希表容量的增长因子 (20%) 并搜索,给定数组中的素数,另一个大约大 20% 的素数。
这些质数用于确定散列的大小table。例如,我们可以将其设为 197 个桶或槽。但是区分 197 和 199 的大小是没有意义的,因为 hash tables 至少比它们的项目数长大约 30%,以避免发生太多键冲突。而且,正如评论所说,“调整大小首先计算 2x,然后在 table 中找到第一个大于 2x 的素数”。因此,更细粒度的步骤没有任何优势。
因此,他们选择了更经济的方式。对于更大的素数,此列表中的两个连续素数相差约 20%。对于更小的素数,甚至还有更大的进步。但这对内存使用的影响很小。
{ 3, 7}, delta % = 133.33
{ 7, 11}, delta % = 57.14
{ 11, 17}, delta % = 54.55
{ 17, 23}, delta % = 35.29
{ 23, 29}, delta % = 26.09
{ 29, 37}, delta % = 27.59
{ 37, 47}, delta % = 27.03
{ 47, 59}, delta % = 25.53
{ 59, 71}, delta % = 20.34
{ 71, 89}, delta % = 25.35
{ 89, 107}, delta % = 20.22
{ 107, 131}, delta % = 22.43
{ 131, 163}, delta % = 24.43
{ 163, 197}, delta % = 20.86
{ 197, 239}, delta % = 21.32
{ 239, 293}, delta % = 22.59
{ 293, 353}, delta % = 20.48
{ 353, 431}, delta % = 22.10
{ 431, 521}, delta % = 20.88
{ 521, 631}, delta % = 21.11
{ 631, 761}, delta % = 20.60
{ 761, 919}, delta % = 20.76
{ 919, 1103}, delta % = 20.02
{ 1103, 1327}, delta % = 20.31
{ 1327, 1597}, delta % = 20.35
{ 1597, 1931}, delta % = 20.91
{ 1931, 2333}, delta % = 20.82
{ 2333, 2801}, delta % = 20.06
{ 2801, 3371}, delta % = 20.35
{ 3371, 4049}, delta % = 20.11
{ 4049, 4861}, delta % = 20.05
{ 4861, 5839}, delta % = 20.12
{ 5839, 7013}, delta % = 20.11
{ 7013, 8419}, delta % = 20.05
{ 8419, 10103}, delta % = 20.00
{ 10103, 12143}, delta % = 20.19
{ 12143, 14591}, delta % = 20.16
{ 14591, 17519}, delta % = 20.07
{ 17519, 21023}, delta % = 20.00
{ 21023, 25229}, delta % = 20.01
{ 25229, 30293}, delta % = 20.07
{ 30293, 36353}, delta % = 20.00
{ 36353, 43627}, delta % = 20.01
{ 43627, 52361}, delta % = 20.02
{ 52361, 62851}, delta % = 20.03
{ 62851, 75431}, delta % = 20.02
{ 75431, 90523}, delta % = 20.01
{ 90523, 108631}, delta % = 20.00
{ 108631, 130363}, delta % = 20.01
{ 130363, 156437}, delta % = 20.00
{ 156437, 187751}, delta % = 20.02
{ 187751, 225307}, delta % = 20.00
{ 225307, 270371}, delta % = 20.00
{ 270371, 324449}, delta % = 20.00
{ 324449, 389357}, delta % = 20.01
{ 389357, 467237}, delta % = 20.00
{ 467237, 560689}, delta % = 20.00
{ 560689, 672827}, delta % = 20.00
{ 672827, 807403}, delta % = 20.00
{ 807403, 968897}, delta % = 20.00
{ 968897, 1162687}, delta % = 20.00
{1162687, 1395263}, delta % = 20.00
{1395263, 1674319}, delta % = 20.00
{1674319, 2009191}, delta % = 20.00
{2009191, 2411033}, delta % = 20.00
{2411033, 2893249}, delta % = 20.00
{2893249, 3471899}, delta % = 20.00
{3471899, 4166287}, delta % = 20.00
{4166287, 4999559}, delta % = 20.00
{4999559, 5999471}, delta % = 20.00
{5999471, 7199369}, delta % = 20.00