sparse_hash_map is very slow for specific data

tl;dr: Why are key lookups in sparse_hash_map 50x slower for specific data?
I am benchmarking the speed of key lookups in sparse_hash_map from Google's sparsehash library, using a very simple Cython wrapper I wrote. The hash table holds uint32_t keys and uint16_t values. For random keys, values, and queries I get over 1M lookups/sec, but for the specific data I need, performance barely exceeds 20k lookups/sec.

The wrapper is here. The table which runs slowly is here. The benchmark code is:
benchmark.pyx:

    from sparsehash cimport SparseHashMap
    from libc.stdint cimport uint32_t
    from libcpp.vector cimport vector

    import time
    import numpy as np

    def fill_randomly(m, size):
        keys = np.random.random_integers(0, 0xFFFFFFFF, size)
        # 0 is a special domain-specific value
        values = np.random.random_integers(1, 0xFFFF, size)
        for j in range(size):
            m[keys[j]] = values[j]

    def benchmark_get():
        cdef int dummy
        cdef uint32_t i, j, table_key
        cdef SparseHashMap m
        cdef vector[uint32_t] q_keys

        cdef int NUM_QUERIES = 1000000
        cdef uint32_t MAX_REQUEST = 7448 * 2**19 - 1  # this is domain-specific

        time_start = time.time()

        ### OPTION 1 ###
        m = SparseHashMap('17.shash')

        ### OPTION 2 ###
        # m = SparseHashMap(16130443)
        # fill_randomly(m, 16130443)

        q_keys = np.random.random_integers(0, MAX_REQUEST, NUM_QUERIES)

        print("Initialization: %.3f" % (time.time() - time_start))

        dummy = 0
        time_start = time.time()
        for i in range(NUM_QUERIES):
            table_key = q_keys[i]
            dummy += m.get(table_key)
            dummy %= 0xFFFFFF  # to prevent overflow error
        time_elapsed = time.time() - time_start

        if dummy == 42:
            # So that the unused variable is not optimized away
            print("Wow, lucky!")

        print("Table size: %d" % len(m))
        print("Total time: %.3f" % time_elapsed)
        print("Seconds per query: %.8f" % (time_elapsed / NUM_QUERIES))
        print("Queries per second: %.1f" % (NUM_QUERIES / time_elapsed))

    def main():
        benchmark_get()
benchmark.pyxbld (so that pyximport compiles in C++ mode):
    def make_ext(modname, pyxfilename):
        from distutils.extension import Extension
        return Extension(
            name=modname,
            sources=[pyxfilename],
            language='c++'
        )
run.py:
    import pyximport
    pyximport.install()

    import benchmark
    benchmark.main()
The results for 17.shash are:
    Initialization: 2.612
    Table size: 16130443
    Total time: 48.568
    Seconds per query: 0.00004857
    Queries per second: 20589.8
For random data:
    Initialization: 25.853
    Table size: 16100260
    Total time: 0.891
    Seconds per query: 0.00000089
    Queries per second: 1122356.3
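For the same one million queries that is 48.568 s versus 0.891 s, i.e. a roughly 55x slowdown per lookup, which is where the 50x figure in the tl;dr comes from.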
The distribution of keys in 17.shash looks like this (plt.hist(np.fromiter(m.keys(), dtype=np.uint32, count=len(m)), bins=50)):

[histogram omitted]
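As a quick way to quantify collisions, something like this can be run on the loaded map (a sketch: it assumes identity hashing and a power-of-two bucket count, which matches sparsehash's doubling growth; num_buckets below is a guess, not a value read from the table):

    import numpy as np

    keys = np.fromiter(m.keys(), dtype=np.uint32, count=len(m))
    num_buckets = 1 << 25                 # assumed: a power of two large enough for ~16M entries
    buckets = keys & (num_buckets - 1)    # bucket index under identity hash + power-of-two masking
    print("distinct buckets used: %d for %d keys"
          % (len(np.unique(buckets)), len(keys)))

If the number of distinct buckets comes out far below the number of keys, long probe sequences, rather than the table itself, dominate the lookup time.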
Judging from the documentation for sparsehash and gcc, trivial hashing seems to be used here (that is, x hashes to x).
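If the hash really is the identity, any structure in the keys carries straight through to the bucket index. A minimal sketch of the failure mode (the key pattern is hypothetical, chosen only to make the effect obvious):

    import numpy as np

    # hypothetical keys that differ only in their high bits
    keys = (np.arange(4096, dtype=np.uint64) << 20).astype(np.uint32)
    num_buckets = 1 << 20                  # hypothetical power-of-two table size
    buckets = keys & (num_buckets - 1)     # bucket index under identity hashing
    print(len(np.unique(buckets)))         # prints 1: all 4096 keys share one bucket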
Is there anything obvious, other than hash collisions, that could cause this behavior? From what I have found, integrating a custom hash function into the Cython wrapper (i.e. overloading std::hash&lt;uint32_t&gt;) is non-trivial.
I found a working solution, but it is not perfect.
sparsehash_wrapper.cpp:
#include "sparsehash/sparse_hash_map"
#include "stdint.h"
// syntax borrowed from
//
struct UInt32Hasher {
size_t operator()(const uint32_t& x) const {
return (x ^ (x << 17) ^ (x >> 13) + 3238229671);
}
};
template<class Key, class T>
class sparse_hash_map : public google::sparse_hash_map<Key, T, UInt32Hasher> {};
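The shifts fold the high bits of x into the low bits (and vice versa), so keys that differ only in their high bits no longer collapse into the same bucket once the table masks the hash with its power-of-two bucket count.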
This is a custom hash function that I could integrate into the existing wrapper with minimal code changes: I simply replaced sparsehash/sparse_hash_map with the path to sparsehash_wrapper.cpp in the Cython .pxd file. The only problem so far is that pyximport does not find sparsehash_wrapper.cpp unless I specify the full absolute path in the .pxd.
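For reference, the extern block in the .pxd ends up looking roughly like this (a sketch: the path is a placeholder, and the declared methods are only those the benchmark actually calls):

    # sparsehash.pxd (sketch)
    cdef extern from "/full/absolute/path/to/sparsehash_wrapper.cpp":
        cdef cppclass sparse_hash_map[K, T]:
            sparse_hash_map() except +
            T& operator[](K) except +
            size_t size()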
The problem was indeed collision-related: after recreating a hash map with the same contents as 17.shash from scratch (i.e., creating an empty map and inserting every (key, value) pair from 17.shash into it), performance went up to 1M+ lookups/sec.
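The recreation step, in terms of the wrapper used above (a sketch; it relies only on the keys(), get(), item assignment, and len() operations the wrapper already exposes, and leaves saving the rebuilt table aside):

    # rebuild the contents of 17.shash into a freshly constructed map
    old = SparseHashMap('17.shash')
    new = SparseHashMap(len(old))
    for k in old.keys():
        new[k] = old.get(k)   # re-inserting every (key, value) pair restored 1M+ lookups/sec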