如何将 32 位整数集中的随机整数存储在紧凑的数据结构中以快速检查成员资格?
How to store random integers from the set of 32-bit integers in a compact data structure to rapidly check membership?
我正在考虑如何 organize/allocate 记忆,这让我想到了这个问题,我已经提炼出它的本质,所以它可能看起来很突然,但是记忆问题太复杂了令人困惑和分心,所以我不在这里问。我试着早些时候问过,但没有得到回应。就这样吧。
假设您有一个系统,您希望尽快检查其中是否存在某个整数,并且您也希望能够尽快添加和删除整数。也就是说,它本质上是一个 key-store
,甚至不是键值存储。它只存储整数键,我猜理论上该值在概念上是一个 true
布尔值,但我认为它不一定必须存在。
一种解决方案是使用稀疏数组。例如,对于 3、5 和 9,此数组在 O(1) 时间内 return 为真。
[-, -, 3, -, 5, -, -, -, 9, -, -, ...]
您只需指定 integer - 1
并返回它是否存在于数组中的那个位置。但这样做的问题是双重的。首先,所有这些空值会占用大量内存。其次,对于我的项目,我有一个额外的约束,即 数组的长度只能为 1、2、4、8、16 和 32 个项目。所以我坚持使用短数组。这意味着我最有可能留下某种树木。
所以这些是限制条件:
- 32 位正整数。
- 紧凑的数据结构(最小浪费space)。
- 快速查找时间。
- 快deletion/insertion次。
- 数据结构中的数组大小只能为 1、2、4、8、16 或 32 项。
我认为适合这些约束的一种方法是一种树,例如 B+ 树。但我对这些还不是很熟悉,还在玩。我认为它就像一个只有键的 B+ 树(没有值,因此没有叶子)。但不完全确定这会是什么样子(在 JavaScript 中,如果我必须选择一种语言)。
我想到的另一种方法是二叉树。但这每个整数最多需要 32 个步骤来添加和检查成员资格,这似乎我们可以以某种方式进行优化。
在某些数据结构中存储顺序 32 位整数以快速检查成员资格、进行插入和删除的主要方法是什么?也许这可以从 IP 查找表或类似的东西中学习。
我将其用于自定义编程语言,尽管看到 JavaScript 中的实现会有所帮助。如果它只是您提供的数据结构的名称,请解释它是如何工作的或在 JavaScript 中实现它,而不是求助于可能已经解决问题的 JavaScript 内置函数。
这是我建议的方法。
每个节点都应该是一个包含 32 个条目的数组。如果第一个条目是 null
或数组,则它是整个搜索的 32 路拆分 space。否则它是这个块中条目的降序列表。用-1填写非条目。
这意味着在不超过 5 次查找中,我们会找到具有或不具有我们的值的块。然后我们进行线性扫描。对排序列表进行二分查找,天真地看起来会更有效率,但实际上它涉及一系列难以预测的分支,而这些分支是 CPU 讨厌的。在低级语言中(我希望你是这样的),避免 pipeline stall 并进行线性搜索会更快。
这是 JavaScript 中的一个实现。
class IntegerSet {
constructor() {
this.make_node = function () {
let node = new Array();
for (let i = 0; i < 32; i++) {
node[i] = -1;
}
return node;
};
this.root = this.make_node();
this.lookup = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
return true;
}
// On average this lets us break out 1/4 of the way through.
else if (node[i] < target) {
return false;
}
}
return false;
};
this.insert = function (target, node) {
node = node || this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
return false;
}
else if (node[i] < target) {
// copy elements along, in order.
let value = node[i];
for (let j = i; j < 31; j++) {
node[j] = target;
target = value;
value = node[j+1];
if (target == -1) {
break;
}
}
node[31] = target; // just in case the block is full
if (-1 == target) {
return true;
}
else {
break; // we inserted and filled the node.
}
}
}
// If we get here, we need to split the node.
let node_copy = this.make_node();
for (let i = 0; i < 32; i++) {
node_copy[i] = node[i];
node[i] = this.make_node();
}
size /= 32;
let idx = Math.floor(target / size);
node[idx][0] = target % size;
for (let i = 0; i < 32; i++) {
let value = node_copy[i];
idx = Math.floor(value / size);
this.insert(value % size, node[idx]);
}
};
this.remove = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
for (let j = i; j < 31; j++) {
node[j] = node[j+1];
if (node[j] == -1) {
break;
}
}
node[31] = -1;
return true;
}
}
return false;
};
}
}
let foo = new IntegerSet();
for (let i = 0; i < 300; i++) {
foo.insert(Math.floor(Math.random() * 2**32));
}
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));
console.log(foo.insert(5));
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));
下面是这个数据结构的特点。
一次读取需要不超过 5 次查找才能到达一个块,然后使用最多 64 个 if 语句进行扫描(预测结果总是错误的)。如果插入很多随机元素,叶子平均半满,平均我们只做1/4的扫描。
每个内部节点有 32 个子节点这一事实意味着内部节点只会给结构增加百分之几的开销。
如果先插入再删除,性能仍然不错,但我们永远不会回收内存。
下面是一个类似于前面的实现,但使用不同大小的数组,并允许您回收 space。它旨在使 space 的 60-70% 成为实际数据。但这显然伴随着对 insert/remove.
的处罚
class IntegerSet {
constructor() {
this.make_node = function (size) {
let node = new Array();
for (let i = 0; i < size; i++) {
node[i] = -1;
}
node.child_size = 0;
return node;
};
this.root = this.make_node(2);
this.lookup = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
return target == node[i];
};
this.insert = function (target, node, parent_node, size) {
node = node || this.root;
if (size == null) {
size = 2**32;
}
let idx = null;
let update_parent = false;
while (Array.isArray(node[0])) {
size /= 32;
parent_node = node;
update_parent = true;
idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
if (target == node[i]) {
return true; // already had it.
}
else {
// copy along, in order.
let value = node[i];
while (-1 < target) {
value = node[i]
node[i] = target;
i++;
target = value;
}
if (update_parent) {
parent_node.child_size++;
}
if (i == node.length) {
let new_node = null;
if (node.length < 32) {
// Double node size.
new_node = this.make_node(node.length * 2);
for (let j = 0; j < i; j++) {
new_node[j] = node[j];
}
}
else {
// Need to split.
new_node = this.make_node(node.length);
for (let j = 0; j < node.length; j++) {
new_node[j] = this.make_node(2);
}
new_node.child_size = node.length;
size /= 32;
for (let j = 0; j < 32; j++) {
target = node[j];
let new_idx = Math.floor(target / size);
target = target % size;
this.insert(target, new_node[new_idx], parent_node, size);
}
}
if (parent_node != null) {
parent_node[idx] = new_node;
}
else {
this.root = new_node;
}
}
}
};
this.remove = function (target) {
let node = this.root;
let grandparent_node = null;
let parent_node = null;
let size = 2**32;
let idx = null;
let parent_idx = null;
while (Array.isArray(node[0])) {
size /= 32;
parent_idx = idx;
idx = Math.floor(target / size);
grandparent_node = parent_node;
parent_node = node;
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
if (node[i] == target) {
// now delete along.
while (-1 < node[i]) {
node[i] = node[i+1];
i++;
}
if (parent_node != null) {
parent_node.child_size--;
// Reconsolidate if we save about 1/4 of memory.
if (parent_node.child_size < 24) {
let new_node = this.make_node(32);
let new_id = 0;
for (let i = 31; -1 < i; i--) {
let old_node = parent_node[i];
let j = 0;
while (-1 < old_node[j]) {
new_node[new_id] = size * i + old_node[j];
new_id++;
j++;
}
}
if (grandparent_node == null) {
this.root = new_node;
}
else {
grandparent_node[parent_idx] = new_node;
grandparent_node.child_size += parent_node.child_size - 33;
}
return true;
}
}
// Reconsolidate if we save about 2/3 of memory.
if (i < node.length / 3 && 2 < node.length) {
let new_node = this.make_node(node.length/2);
for (let j = 0; j < i; j++) {
new_node[j] = node[j];
}
if (parent_node != null) {
parent_node[idx] = new_node;
}
else {
this.root = new_node;
}
}
return true;
}
else {
return false;
}
};
}
}
let foo = new IntegerSet();
let keep = new Array;
for (let i = 0; i < 5; i++) {
keep.push(Math.floor(Math.random() * 2**32));
}
let remove = new Array;
for (let i = 0; i < 300; i++) {
remove.push(Math.floor(Math.random() * 2**32));
}
for (let i = 0; i < keep.length; i++) {
foo.insert(keep[i]);
}
for (let i = 0; i < remove.length; i++) {
foo.insert(remove[i]);
}
for (let i = 0; i < remove.length; i++) {
foo.remove(remove[i]);
}
console.log(foo.root);
for (let i = 0; i < keep.length; i++) {
console.log(keep[i]);
console.log(foo.lookup(keep[i]));
console.log(foo.remove(keep[i]));
console.log(foo.lookup(keep[i]));
console.log(foo.remove(keep[i]));
}
您写道:
One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. [...] I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).
确实,key-only B(+)-tree 是一个可能的解决方案。它解决了您从稀疏数组解决方案中得到的浪费 space 的问题。当数据类型恰好是字符串或浮点数而不是数字时,它也是一个很好的解决方案,这对于稀疏数组解决方案来说会是一个问题。
下面我 post 的代码类似于 我之前 post 编辑过你的另一个问题,但它不是 search 树.
想法是使用 B+ 树,考虑到数组大小为 2 的幂的额外要求。实际数据键位于树的底层。内部节点重复其子树包含的最小键。这可用于决定从根部向下遍历树时选择哪个 child,以搜索应该找到或应该插入密钥的 bottom-level 位置。
通过在节点满时将 children/keys 分配给邻居来保持树紧凑,仅在邻居中没有空间时才拆分节点。删除键时,受影响的节点将尽可能与邻居合并。每个节点(根节点除外)保证至少使用最大(32)节点容量的一半——用于存储 children 或密钥(在底层)。
密钥位于从根开始搜索的位置。使用二进制搜索选择 child,以便 child 的键是小于或等于所定位键的最大键。如果在底层找不到密钥,我们至少有 location 它应该在的地方,可以插入那里。
实施
class Node {
constructor(capacity) {
// Mimic fixed-size array (avoid accidentally growing it)
this.children = Object.seal(Array(capacity).fill(null));
this.childCount = 0; // Number of used slots in children array
this.key = null; // Property for supporting a search
// Maintain back-link to parent.
this.parent = null;
// Per level in the tree, maintain a doubly linked list
this.prev = this.next = null;
}
setCapacity(capacity) {
if (capacity < 1) return;
// Here we make a new array, and copy the data into it
let children = Object.seal(Array(capacity).fill(null));
for (let i = 0; i < this.childCount; i++) children[i] = this.children[i];
this.children = children;
}
isLeaf() {
return !(this.children[0] instanceof Node);
}
index() {
return this.parent.children.indexOf(this);
}
updateKey() {
this.key = this.isLeaf() ? this.children[0] : this.children[0].key;
for (let node = this.parent; node; node = node.parent) {
node.key = node.children[0].key;
}
}
wipe(start, end) {
this.children.copyWithin(start, end, this.childCount);
for (let i = this.childCount - end + start; i < this.childCount; i++) {
this.children[i] = null;
}
this.childCount -= end - start;
// Reduce allocated size if possible
if (this.childCount * 2 <= this.children.length) this.setCapacity(this.children.length / 2);
// Update key if first item changed
if (start === 0) this.updateKey();
}
moveFrom(neighbor, target, start, count=1) {
// Note: `start` can have two meanings:
// if neighbor is null, it is the key/Node to move to the target
// if neighbor is a Node, it is the index from where keys(s)/Node(s) have to be moved to the target
// Make room in target node
if (this.childCount + count > this.children.length) this.setCapacity(this.children.length * 2);
this.children.copyWithin(target + count, target, Math.max(target + count, this.childCount));
this.childCount += count;
if (neighbor !== null) {
// Copy the children
for (let i = 0; i < count; i++) {
this.children[target + i] = neighbor.children[start + i];
}
// Remove the original references
neighbor.wipe(start, start + count);
} else {
this.children[target] = start; // start is key to insert
}
// Set parent link(s)
if (!this.isLeaf()) {
for (let i = 0; i < count; i++) {
this.children[target + i].parent = this;
}
}
// Update key if first item changed
if (target === 0) this.updateKey();
}
moveToNext(count) {
this.next.moveFrom(this, 0, this.childCount - count, count);
}
moveFromNext(count) {
this.moveFrom(this.next, this.childCount, 0, count);
}
basicRemove(index) {
if (!this.isLeaf()) {
// Take node out of the level's linked list
let prev = this.children[index].prev;
let next = this.children[index].next;
if (prev) prev.next = next;
if (next) next.prev = prev;
}
this.wipe(index, index + 1);
}
basicInsert(index, value) { // value can be a key or a Node
this.moveFrom(null, index, value);
if (value instanceof Node) {
// Insert node in the level's linked list
if (index > 0) {
value.prev = this.children[index-1];
value.next = value.prev.next;
} else if (this.childCount > 1) {
value.next = this.children[1];
value.prev = value.next.prev;
}
if (value.prev) value.prev.next = value;
if (value.next) value.next.prev = value;
}
}
pairWithSmallest() {
return this.prev && (!this.next || this.next.childCount > this.prev.childCount)
? [this.prev, this] : [this, this.next];
}
toString() {
return "[" + this.children.map(v => v??"-").join() + "]";
}
}
class Tree {
constructor(nodeCapacity=32) {
this.nodeCapacity = nodeCapacity;
this.root = new Node(1);
this.first = this.root; // Head of doubly linked list at bottom level
}
locate(key) {
let node = this.root;
while (!node.isLeaf()) {
// Binary search among keys
let low = 0;
let high = node.childCount;
while (low < high) {
let index = (low + high) >> 1;
if (key >= node.children[index].key) {
low = index + 1;
} else {
high = index;
}
}
if (low) low--;
node = node.children[low];
}
// Binary search in leaf
let low = 0;
let high = node.childCount;
while (low < high) {
let index = (low + high) >> 1;
if (key > node.children[index]) {
low = index + 1;
} else {
high = index;
}
}
return [node, low];
}
has(key) {
let [node, index] = this.locate(key);
return index < node.childCount && node.children[index] === key;
}
remove(key) {
let [node, index] = this.locate(key);
if (index >= node.childCount || node.children[index] !== key) return; // not found
while (true) {
node.basicRemove(index);
// Exit when node's fill ratio is fine
if (!node.parent || node.childCount * 2 > this.nodeCapacity) return;
// Node has potentially too few children, we should either merge or redistribute
let [left, right] = node.pairWithSmallest();
if (!left || !right) { // A node with no siblings? Must become the root!
this.root = node;
node.parent = null;
return;
}
let sumCount = left.childCount + right.childCount;
let childCount = sumCount >> 1;
// Check whether to merge or to redistribute
if (sumCount > this.nodeCapacity) { // redistribute
// Move some data from the bigger to the smaller node
let shift = childCount - node.childCount;
if (!shift) { // Boundary case: when a redistribution would bring no improvement
console.assert(node.childCount * 2 === this.nodeCapacity && sumCount === this.nodeCapacity + 1);
return;
}
if (node === left) { // move some children from right to left
left.moveFromNext(shift);
} else { // move some children from left to right
left.moveToNext(shift);
}
return;
}
// Merge:
// Move all data from the right to the left
left.moveFromNext(right.childCount);
// Prepare to delete right node
node = right.parent;
index = right.index();
}
}
insert(value) { // value is a key
let [node, index] = this.locate(value);
if (index < node.childCount && node[index] === value) return; // already present
while (node.childCount === this.nodeCapacity) { // No room here
if (index === 0 && node.prev && node.prev.childCount < this.nodeCapacity) {
return node.prev.basicInsert(node.prev.childCount, value);
}
// Check whether we can redistribute (to avoid a split)
if (node !== this.root) {
let [left, right] = node.pairWithSmallest();
let joinedIndex = left === node ? index : left.childCount + index;
let sumCount = left.childCount + right.childCount + 1;
if (sumCount <= 2 * this.nodeCapacity) { // redistribute
let childCount = sumCount >> 1;
if (node === right) { // redistribute to the left
let insertInLeft = joinedIndex < childCount;
left.moveFromNext(childCount - left.childCount - +insertInLeft);
} else { // redistribute to the right
let insertInRight = index >= sumCount - childCount;
left.moveToNext(childCount - right.childCount - +insertInRight);
}
if (joinedIndex > left.childCount ||
joinedIndex === left.childCount && left.childCount > right.childCount) {
right.basicInsert(joinedIndex - left.childCount, value);
} else {
left.basicInsert(joinedIndex, value);
}
return;
}
}
// Cannot redistribute: split node
let childCount = node.childCount >> 1;
// Create a new node that will later become the right sibling of this node
let sibling = new Node(childCount);
// Move half of node node's data to it
sibling.moveFrom(node, 0, childCount, childCount);
// Insert the value in either the current node or the new one
if (index > node.childCount) {
sibling.basicInsert(index - node.childCount, value);
} else {
node.basicInsert(index, value);
}
// Is this the root?
if (!node.parent) {
// ...then first create a parent, which is the new root
this.root = new Node(2);
this.root.basicInsert(0, node);
}
// Prepare for inserting the sibling node into the tree
index = node.index() + 1;
node = node.parent;
value = sibling; // value is now a Node
}
node.basicInsert(index, value);
}
/* Below this point: these methods are optional */
* [Symbol.iterator]() { // Make tree iterable
let i = 0;
for (let node = this.first; node; node = node.next) {
for (let i = 0; i < node.childCount; i++) yield node.children[i];
}
}
print() {
console.log(this.root && this.root.toString());
}
verify() {
// Raise an error when the tree violates one of the required properties
if (!this.root) return; // An empty tree is fine.
if (this.root.parent) throw "root should not have a parent";
// Perform a breadth first traversal
let q = [this.root];
while (q.length) {
if (q[0].isLeaf() && this.first !== q[0]) throw "this.first is not pointing to first leaf";
let level = [];
let last = null;
for (let parent of q) {
if (!(parent instanceof Node)) throw "parent is not instance of Node";
if (parent.children.length > this.nodeCapacity) throw "node's children array is too large";
if (parent.childCount > 0 && parent.childCount * 2 <= parent.children.length) throw "node's fill ratio is too low";
for (let i = parent.childCount; i < parent.children.length; i++) {
if (parent.children[i] !== null) throw "child beyond childCount should be null but is not";
}
if (parent.isLeaf()) {
if (parent.children[0] !== parent.key) throw "key does not match with first child value";
for (let value of parent.children.slice(0, parent.childCount)) {
if (value === null) throw "leaf has a null as value";
if (value instanceof Node) throw "leaf has a Node as value";
}
} else {
if (parent.children[0].key !== parent.key) throw "key does not match with first child's key";
for (let node of parent.children.slice(0, parent.childCount)) {
if (node === null) throw "internal node has a null as value";
if (!(node instanceof Node)) throw "internal node has a non-Node as value";
if (node.parent !== parent) throw "wrong parent";
if (node.prev !== last) throw "prev link incorrect";
if (last && last.next !== node) throw "next link incorrect";
if (last && last.children.length + node.children.length <= this.nodeCapacity) {
throw "two consecutive siblings have a total number of children that is too small";
}
if (node.childCount * 2 < this.nodeCapacity) {
throw "internal node is too small: " + node;
}
level.push(node);
last = node;
}
}
}
if (last && last.next) throw "last node in level has a next reference";
q = level;
}
}
test(count=100, option=3) {
// option:
// 0 = always insert & delete at left side (offset 0)
// 1 = always insert & delete at right side
// 2 = always insert & delete at middle
// 3 = insert & delete at random offsets
// Create array to perform the same operations on it as on the tree
let max = count*2;
let arr = [];
// Perform a series of insertions
for (let i = 0; i < count; i++) {
// Choose random value
let key = Math.floor(Math.random() * max);
// Perform same insertion in array and tree
arr.push(key);
this.insert(key);
// Verify tree consistency and properties
this.verify();
// Verify the order of keys in the array is the same as in the tree
if (arr.sort((a,b) => a-b)+"" !== [...this]+"") throw i + ": tree not same as array";
}
// Perform a series of has-calls
for (let i = 0; i < count; i++) {
// Choose random update index
let key = Math.floor(Math.random() * max);
// Perform same insertion in array and tree
let has = arr.includes(key);
if (has !== this.has(key)) {
throw "has() returns inconsistent result";
}
if (!has) {
this.remove(key); // should not alter the tree
// Verify the order of keys in the array is the same as in the tree
if (arr+"" !== [...this]+"") throw i + ": tree not same as array";
}
}
// Perform a series of deletions
for (let i = arr.length - 1; i >= 0; i--) {
// Choose random deletion key
let key = arr[Math.floor(Math.random() * i)];
// Perform same deletion in array and tree
arr.splice(arr.indexOf(key), 1);
this.remove(key);
// Verify tree consistency and properties
this.verify();
// Verify the order of keys in the array is the same as in the tree
if (arr+"" !== [...this]+"") throw "tree not same as array";
}
}
}
// Perform 1000 insertions, 1000 updates, and 1000 deletions on a tree with node capacity of 8
new Tree(8).test(1000);
console.log("all tests completed");
支持的数据类型
键的数据类型可以是比较运算符适用的任何类型。所以对于字符串或数字它会起作用。在 JavaScript 中,它也适用于已实现 valueOf
或 toString
方法的 objects。例如,原生 Date
objects 已经是这种情况。 JavaScript 将首先尝试 valueOf
方法,如果没有,则尝试 toString
方法,该方法也在 Object 原型上定义。
注意事项
我在每个节点中存储了一个 key
,因为这是一个 in-memory 解决方案。从历史上看,B(+) 树用于磁盘存储,在这种情况下保持低块读取数很重要。标准的 B(+)-tree 实现将键存储在更高一级的 parent 中作为数组。这样搜索就不必加载每个 child 记录,同时搜索要选择的正确记录。
此外,它不需要存储第一个 child 的密钥,因为我们实际上只需要 分隔 两个连续 child 的密钥]仁.
我正在考虑如何 organize/allocate 记忆,这让我想到了这个问题,我已经提炼出它的本质,所以它可能看起来很突然,但是记忆问题太复杂了令人困惑和分心,所以我不在这里问。我试着早些时候问过,但没有得到回应。就这样吧。
假设您有一个系统,您希望尽快检查其中是否存在某个整数,并且您也希望能够尽快添加和删除整数。也就是说,它本质上是一个 key-store
,甚至不是键值存储。它只存储整数键,我猜理论上该值在概念上是一个 true
布尔值,但我认为它不一定必须存在。
一种解决方案是使用稀疏数组。例如,对于 3、5 和 9,此数组在 O(1) 时间内 return 为真。
[-, -, 3, -, 5, -, -, -, 9, -, -, ...]
您只需指定 integer - 1
并返回它是否存在于数组中的那个位置。但这样做的问题是双重的。首先,所有这些空值会占用大量内存。其次,对于我的项目,我有一个额外的约束,即 数组的长度只能为 1、2、4、8、16 和 32 个项目。所以我坚持使用短数组。这意味着我最有可能留下某种树木。
所以这些是限制条件:
- 32 位正整数。
- 紧凑的数据结构(最小浪费space)。
- 快速查找时间。
- 快deletion/insertion次。
- 数据结构中的数组大小只能为 1、2、4、8、16 或 32 项。
我认为适合这些约束的一种方法是一种树,例如 B+ 树。但我对这些还不是很熟悉,还在玩。我认为它就像一个只有键的 B+ 树(没有值,因此没有叶子)。但不完全确定这会是什么样子(在 JavaScript 中,如果我必须选择一种语言)。
我想到的另一种方法是二叉树。但这每个整数最多需要 32 个步骤来添加和检查成员资格,这似乎我们可以以某种方式进行优化。
在某些数据结构中存储顺序 32 位整数以快速检查成员资格、进行插入和删除的主要方法是什么?也许这可以从 IP 查找表或类似的东西中学习。
我将其用于自定义编程语言,尽管看到 JavaScript 中的实现会有所帮助。如果它只是您提供的数据结构的名称,请解释它是如何工作的或在 JavaScript 中实现它,而不是求助于可能已经解决问题的 JavaScript 内置函数。
这是我建议的方法。
每个节点都应该是一个包含 32 个条目的数组。如果第一个条目是 null
或数组,则它是整个搜索的 32 路拆分 space。否则它是这个块中条目的降序列表。用-1填写非条目。
这意味着在不超过 5 次查找中,我们会找到具有或不具有我们的值的块。然后我们进行线性扫描。对排序列表进行二分查找,天真地看起来会更有效率,但实际上它涉及一系列难以预测的分支,而这些分支是 CPU 讨厌的。在低级语言中(我希望你是这样的),避免 pipeline stall 并进行线性搜索会更快。
这是 JavaScript 中的一个实现。
class IntegerSet {
constructor() {
this.make_node = function () {
let node = new Array();
for (let i = 0; i < 32; i++) {
node[i] = -1;
}
return node;
};
this.root = this.make_node();
this.lookup = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
return true;
}
// On average this lets us break out 1/4 of the way through.
else if (node[i] < target) {
return false;
}
}
return false;
};
this.insert = function (target, node) {
node = node || this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
return false;
}
else if (node[i] < target) {
// copy elements along, in order.
let value = node[i];
for (let j = i; j < 31; j++) {
node[j] = target;
target = value;
value = node[j+1];
if (target == -1) {
break;
}
}
node[31] = target; // just in case the block is full
if (-1 == target) {
return true;
}
else {
break; // we inserted and filled the node.
}
}
}
// If we get here, we need to split the node.
let node_copy = this.make_node();
for (let i = 0; i < 32; i++) {
node_copy[i] = node[i];
node[i] = this.make_node();
}
size /= 32;
let idx = Math.floor(target / size);
node[idx][0] = target % size;
for (let i = 0; i < 32; i++) {
let value = node_copy[i];
idx = Math.floor(value / size);
this.insert(value % size, node[idx]);
}
};
this.remove = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
for (let i = 0; i < 32; i++) {
if (target == node[i]) {
for (let j = i; j < 31; j++) {
node[j] = node[j+1];
if (node[j] == -1) {
break;
}
}
node[31] = -1;
return true;
}
}
return false;
};
}
}
let foo = new IntegerSet();
for (let i = 0; i < 300; i++) {
foo.insert(Math.floor(Math.random() * 2**32));
}
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));
console.log(foo.insert(5));
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));
下面是这个数据结构的特点。
一次读取需要不超过 5 次查找才能到达一个块,然后使用最多 64 个 if 语句进行扫描(预测结果总是错误的)。如果插入很多随机元素,叶子平均半满,平均我们只做1/4的扫描。
每个内部节点有 32 个子节点这一事实意味着内部节点只会给结构增加百分之几的开销。
如果先插入再删除,性能仍然不错,但我们永远不会回收内存。
下面是一个类似于前面的实现,但使用不同大小的数组,并允许您回收 space。它旨在使 space 的 60-70% 成为实际数据。但这显然伴随着对 insert/remove.
的处罚class IntegerSet {
constructor() {
this.make_node = function (size) {
let node = new Array();
for (let i = 0; i < size; i++) {
node[i] = -1;
}
node.child_size = 0;
return node;
};
this.root = this.make_node(2);
this.lookup = function (target) {
let node = this.root;
let size = 2**32;
while (Array.isArray(node[0])) {
size /= 32;
let idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
return target == node[i];
};
this.insert = function (target, node, parent_node, size) {
node = node || this.root;
if (size == null) {
size = 2**32;
}
let idx = null;
let update_parent = false;
while (Array.isArray(node[0])) {
size /= 32;
parent_node = node;
update_parent = true;
idx = Math.floor(target / size);
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
if (target == node[i]) {
return true; // already had it.
}
else {
// copy along, in order.
let value = node[i];
while (-1 < target) {
value = node[i]
node[i] = target;
i++;
target = value;
}
if (update_parent) {
parent_node.child_size++;
}
if (i == node.length) {
let new_node = null;
if (node.length < 32) {
// Double node size.
new_node = this.make_node(node.length * 2);
for (let j = 0; j < i; j++) {
new_node[j] = node[j];
}
}
else {
// Need to split.
new_node = this.make_node(node.length);
for (let j = 0; j < node.length; j++) {
new_node[j] = this.make_node(2);
}
new_node.child_size = node.length;
size /= 32;
for (let j = 0; j < 32; j++) {
target = node[j];
let new_idx = Math.floor(target / size);
target = target % size;
this.insert(target, new_node[new_idx], parent_node, size);
}
}
if (parent_node != null) {
parent_node[idx] = new_node;
}
else {
this.root = new_node;
}
}
}
};
this.remove = function (target) {
let node = this.root;
let grandparent_node = null;
let parent_node = null;
let size = 2**32;
let idx = null;
let parent_idx = null;
while (Array.isArray(node[0])) {
size /= 32;
parent_idx = idx;
idx = Math.floor(target / size);
grandparent_node = parent_node;
parent_node = node;
node = node[idx];
target = target % size;
}
let i = 0;
while (target < node[i]) {
i++;
}
if (node[i] == target) {
// now delete along.
while (-1 < node[i]) {
node[i] = node[i+1];
i++;
}
if (parent_node != null) {
parent_node.child_size--;
// Reconsolidate if we save about 1/4 of memory.
if (parent_node.child_size < 24) {
let new_node = this.make_node(32);
let new_id = 0;
for (let i = 31; -1 < i; i--) {
let old_node = parent_node[i];
let j = 0;
while (-1 < old_node[j]) {
new_node[new_id] = size * i + old_node[j];
new_id++;
j++;
}
}
if (grandparent_node == null) {
this.root = new_node;
}
else {
grandparent_node[parent_idx] = new_node;
grandparent_node.child_size += parent_node.child_size - 33;
}
return true;
}
}
// Reconsolidate if we save about 2/3 of memory.
if (i < node.length / 3 && 2 < node.length) {
let new_node = this.make_node(node.length/2);
for (let j = 0; j < i; j++) {
new_node[j] = node[j];
}
if (parent_node != null) {
parent_node[idx] = new_node;
}
else {
this.root = new_node;
}
}
return true;
}
else {
return false;
}
};
}
}
let foo = new IntegerSet();
let keep = new Array;
for (let i = 0; i < 5; i++) {
keep.push(Math.floor(Math.random() * 2**32));
}
let remove = new Array;
for (let i = 0; i < 300; i++) {
remove.push(Math.floor(Math.random() * 2**32));
}
for (let i = 0; i < keep.length; i++) {
foo.insert(keep[i]);
}
for (let i = 0; i < remove.length; i++) {
foo.insert(remove[i]);
}
for (let i = 0; i < remove.length; i++) {
foo.remove(remove[i]);
}
console.log(foo.root);
for (let i = 0; i < keep.length; i++) {
console.log(keep[i]);
console.log(foo.lookup(keep[i]));
console.log(foo.remove(keep[i]));
console.log(foo.lookup(keep[i]));
console.log(foo.remove(keep[i]));
}
您写道:
One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. [...] I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).
确实,key-only B(+)-tree 是一个可能的解决方案。它解决了您从稀疏数组解决方案中得到的浪费 space 的问题。当数据类型恰好是字符串或浮点数而不是数字时,它也是一个很好的解决方案,这对于稀疏数组解决方案来说会是一个问题。
下面我 post 的代码类似于
想法是使用 B+ 树,考虑到数组大小为 2 的幂的额外要求。实际数据键位于树的底层。内部节点重复其子树包含的最小键。这可用于决定从根部向下遍历树时选择哪个 child,以搜索应该找到或应该插入密钥的 bottom-level 位置。
通过在节点满时将 children/keys 分配给邻居来保持树紧凑,仅在邻居中没有空间时才拆分节点。删除键时,受影响的节点将尽可能与邻居合并。每个节点(根节点除外)保证至少使用最大(32)节点容量的一半——用于存储 children 或密钥(在底层)。
密钥位于从根开始搜索的位置。使用二进制搜索选择 child,以便 child 的键是小于或等于所定位键的最大键。如果在底层找不到密钥,我们至少有 location 它应该在的地方,可以插入那里。
实施
class Node {
constructor(capacity) {
// Mimic fixed-size array (avoid accidentally growing it)
this.children = Object.seal(Array(capacity).fill(null));
this.childCount = 0; // Number of used slots in children array
this.key = null; // Property for supporting a search
// Maintain back-link to parent.
this.parent = null;
// Per level in the tree, maintain a doubly linked list
this.prev = this.next = null;
}
setCapacity(capacity) {
if (capacity < 1) return;
// Here we make a new array, and copy the data into it
let children = Object.seal(Array(capacity).fill(null));
for (let i = 0; i < this.childCount; i++) children[i] = this.children[i];
this.children = children;
}
isLeaf() {
return !(this.children[0] instanceof Node);
}
index() {
return this.parent.children.indexOf(this);
}
updateKey() {
this.key = this.isLeaf() ? this.children[0] : this.children[0].key;
for (let node = this.parent; node; node = node.parent) {
node.key = node.children[0].key;
}
}
wipe(start, end) {
this.children.copyWithin(start, end, this.childCount);
for (let i = this.childCount - end + start; i < this.childCount; i++) {
this.children[i] = null;
}
this.childCount -= end - start;
// Reduce allocated size if possible
if (this.childCount * 2 <= this.children.length) this.setCapacity(this.children.length / 2);
// Update key if first item changed
if (start === 0) this.updateKey();
}
moveFrom(neighbor, target, start, count=1) {
// Note: `start` can have two meanings:
// if neighbor is null, it is the key/Node to move to the target
// if neighbor is a Node, it is the index from where keys(s)/Node(s) have to be moved to the target
// Make room in target node
if (this.childCount + count > this.children.length) this.setCapacity(this.children.length * 2);
this.children.copyWithin(target + count, target, Math.max(target + count, this.childCount));
this.childCount += count;
if (neighbor !== null) {
// Copy the children
for (let i = 0; i < count; i++) {
this.children[target + i] = neighbor.children[start + i];
}
// Remove the original references
neighbor.wipe(start, start + count);
} else {
this.children[target] = start; // start is key to insert
}
// Set parent link(s)
if (!this.isLeaf()) {
for (let i = 0; i < count; i++) {
this.children[target + i].parent = this;
}
}
// Update key if first item changed
if (target === 0) this.updateKey();
}
moveToNext(count) {
this.next.moveFrom(this, 0, this.childCount - count, count);
}
moveFromNext(count) {
this.moveFrom(this.next, this.childCount, 0, count);
}
basicRemove(index) {
if (!this.isLeaf()) {
// Take node out of the level's linked list
let prev = this.children[index].prev;
let next = this.children[index].next;
if (prev) prev.next = next;
if (next) next.prev = prev;
}
this.wipe(index, index + 1);
}
basicInsert(index, value) { // value can be a key or a Node
this.moveFrom(null, index, value);
if (value instanceof Node) {
// Insert node in the level's linked list
if (index > 0) {
value.prev = this.children[index-1];
value.next = value.prev.next;
} else if (this.childCount > 1) {
value.next = this.children[1];
value.prev = value.next.prev;
}
if (value.prev) value.prev.next = value;
if (value.next) value.next.prev = value;
}
}
pairWithSmallest() {
return this.prev && (!this.next || this.next.childCount > this.prev.childCount)
? [this.prev, this] : [this, this.next];
}
toString() {
return "[" + this.children.map(v => v??"-").join() + "]";
}
}
class Tree {
constructor(nodeCapacity=32) {
this.nodeCapacity = nodeCapacity;
this.root = new Node(1);
this.first = this.root; // Head of doubly linked list at bottom level
}
locate(key) {
let node = this.root;
while (!node.isLeaf()) {
// Binary search among keys
let low = 0;
let high = node.childCount;
while (low < high) {
let index = (low + high) >> 1;
if (key >= node.children[index].key) {
low = index + 1;
} else {
high = index;
}
}
if (low) low--;
node = node.children[low];
}
// Binary search in leaf
let low = 0;
let high = node.childCount;
while (low < high) {
let index = (low + high) >> 1;
if (key > node.children[index]) {
low = index + 1;
} else {
high = index;
}
}
return [node, low];
}
has(key) {
let [node, index] = this.locate(key);
return index < node.childCount && node.children[index] === key;
}
remove(key) {
let [node, index] = this.locate(key);
if (index >= node.childCount || node.children[index] !== key) return; // not found
while (true) {
node.basicRemove(index);
// Exit when node's fill ratio is fine
if (!node.parent || node.childCount * 2 > this.nodeCapacity) return;
// Node has potentially too few children, we should either merge or redistribute
let [left, right] = node.pairWithSmallest();
if (!left || !right) { // A node with no siblings? Must become the root!
this.root = node;
node.parent = null;
return;
}
let sumCount = left.childCount + right.childCount;
let childCount = sumCount >> 1;
// Check whether to merge or to redistribute
if (sumCount > this.nodeCapacity) { // redistribute
// Move some data from the bigger to the smaller node
let shift = childCount - node.childCount;
if (!shift) { // Boundary case: when a redistribution would bring no improvement
console.assert(node.childCount * 2 === this.nodeCapacity && sumCount === this.nodeCapacity + 1);
return;
}
if (node === left) { // move some children from right to left
left.moveFromNext(shift);
} else { // move some children from left to right
left.moveToNext(shift);
}
return;
}
// Merge:
// Move all data from the right to the left
left.moveFromNext(right.childCount);
// Prepare to delete right node
node = right.parent;
index = right.index();
}
}
insert(value) { // value is a key
let [node, index] = this.locate(value);
if (index < node.childCount && node[index] === value) return; // already present
while (node.childCount === this.nodeCapacity) { // No room here
if (index === 0 && node.prev && node.prev.childCount < this.nodeCapacity) {
return node.prev.basicInsert(node.prev.childCount, value);
}
// Check whether we can redistribute (to avoid a split)
if (node !== this.root) {
let [left, right] = node.pairWithSmallest();
let joinedIndex = left === node ? index : left.childCount + index;
let sumCount = left.childCount + right.childCount + 1;
if (sumCount <= 2 * this.nodeCapacity) { // redistribute
let childCount = sumCount >> 1;
if (node === right) { // redistribute to the left
let insertInLeft = joinedIndex < childCount;
left.moveFromNext(childCount - left.childCount - +insertInLeft);
} else { // redistribute to the right
let insertInRight = index >= sumCount - childCount;
left.moveToNext(childCount - right.childCount - +insertInRight);
}
if (joinedIndex > left.childCount ||
joinedIndex === left.childCount && left.childCount > right.childCount) {
right.basicInsert(joinedIndex - left.childCount, value);
} else {
left.basicInsert(joinedIndex, value);
}
return;
}
}
// Cannot redistribute: split node
let childCount = node.childCount >> 1;
// Create a new node that will later become the right sibling of this node
let sibling = new Node(childCount);
// Move half of node node's data to it
sibling.moveFrom(node, 0, childCount, childCount);
// Insert the value in either the current node or the new one
if (index > node.childCount) {
sibling.basicInsert(index - node.childCount, value);
} else {
node.basicInsert(index, value);
}
// Is this the root?
if (!node.parent) {
// ...then first create a parent, which is the new root
this.root = new Node(2);
this.root.basicInsert(0, node);
}
// Prepare for inserting the sibling node into the tree
index = node.index() + 1;
node = node.parent;
value = sibling; // value is now a Node
}
node.basicInsert(index, value);
}
/* Below this point: these methods are optional */
* [Symbol.iterator]() { // Make tree iterable
let i = 0;
for (let node = this.first; node; node = node.next) {
for (let i = 0; i < node.childCount; i++) yield node.children[i];
}
}
print() {
console.log(this.root && this.root.toString());
}
verify() {
// Raise an error when the tree violates one of the required properties
if (!this.root) return; // An empty tree is fine.
if (this.root.parent) throw "root should not have a parent";
// Perform a breadth first traversal
let q = [this.root];
while (q.length) {
if (q[0].isLeaf() && this.first !== q[0]) throw "this.first is not pointing to first leaf";
let level = [];
let last = null;
for (let parent of q) {
if (!(parent instanceof Node)) throw "parent is not instance of Node";
if (parent.children.length > this.nodeCapacity) throw "node's children array is too large";
if (parent.childCount > 0 && parent.childCount * 2 <= parent.children.length) throw "node's fill ratio is too low";
for (let i = parent.childCount; i < parent.children.length; i++) {
if (parent.children[i] !== null) throw "child beyond childCount should be null but is not";
}
if (parent.isLeaf()) {
if (parent.children[0] !== parent.key) throw "key does not match with first child value";
for (let value of parent.children.slice(0, parent.childCount)) {
if (value === null) throw "leaf has a null as value";
if (value instanceof Node) throw "leaf has a Node as value";
}
} else {
if (parent.children[0].key !== parent.key) throw "key does not match with first child's key";
for (let node of parent.children.slice(0, parent.childCount)) {
if (node === null) throw "internal node has a null as value";
if (!(node instanceof Node)) throw "internal node has a non-Node as value";
if (node.parent !== parent) throw "wrong parent";
if (node.prev !== last) throw "prev link incorrect";
if (last && last.next !== node) throw "next link incorrect";
if (last && last.children.length + node.children.length <= this.nodeCapacity) {
throw "two consecutive siblings have a total number of children that is too small";
}
if (node.childCount * 2 < this.nodeCapacity) {
throw "internal node is too small: " + node;
}
level.push(node);
last = node;
}
}
}
if (last && last.next) throw "last node in level has a next reference";
q = level;
}
}
test(count=100, option=3) {
// option:
// 0 = always insert & delete at left side (offset 0)
// 1 = always insert & delete at right side
// 2 = always insert & delete at middle
// 3 = insert & delete at random offsets
// Create array to perform the same operations on it as on the tree
let max = count*2;
let arr = [];
// Perform a series of insertions
for (let i = 0; i < count; i++) {
// Choose random value
let key = Math.floor(Math.random() * max);
// Perform same insertion in array and tree
arr.push(key);
this.insert(key);
// Verify tree consistency and properties
this.verify();
// Verify the order of keys in the array is the same as in the tree
if (arr.sort((a,b) => a-b)+"" !== [...this]+"") throw i + ": tree not same as array";
}
// Perform a series of has-calls
for (let i = 0; i < count; i++) {
// Choose random update index
let key = Math.floor(Math.random() * max);
// Perform same insertion in array and tree
let has = arr.includes(key);
if (has !== this.has(key)) {
throw "has() returns inconsistent result";
}
if (!has) {
this.remove(key); // should not alter the tree
// Verify the order of keys in the array is the same as in the tree
if (arr+"" !== [...this]+"") throw i + ": tree not same as array";
}
}
// Perform a series of deletions
for (let i = arr.length - 1; i >= 0; i--) {
// Choose random deletion key
let key = arr[Math.floor(Math.random() * i)];
// Perform same deletion in array and tree
arr.splice(arr.indexOf(key), 1);
this.remove(key);
// Verify tree consistency and properties
this.verify();
// Verify the order of keys in the array is the same as in the tree
if (arr+"" !== [...this]+"") throw "tree not same as array";
}
}
}
// Perform 1000 insertions, 1000 updates, and 1000 deletions on a tree with node capacity of 8
new Tree(8).test(1000);
console.log("all tests completed");
支持的数据类型
键的数据类型可以是比较运算符适用的任何类型。所以对于字符串或数字它会起作用。在 JavaScript 中,它也适用于已实现 valueOf
或 toString
方法的 objects。例如,原生 Date
objects 已经是这种情况。 JavaScript 将首先尝试 valueOf
方法,如果没有,则尝试 toString
方法,该方法也在 Object 原型上定义。
注意事项
我在每个节点中存储了一个 key
,因为这是一个 in-memory 解决方案。从历史上看,B(+) 树用于磁盘存储,在这种情况下保持低块读取数很重要。标准的 B(+)-tree 实现将键存储在更高一级的 parent 中作为数组。这样搜索就不必加载每个 child 记录,同时搜索要选择的正确记录。
此外,它不需要存储第一个 child 的密钥,因为我们实际上只需要 分隔 两个连续 child 的密钥]仁.