如何在 JavaScript 中实现 "index join" 数据结构？

Question

我正在学习 join algorithms 有关关系数据查询处理的知识。简单的情况是嵌套循环连接：

function nestedJoin(R, S, compare) {
  const out = []
  
  for (const r of R) {
    for (const s of S) {
      if (compare(r, s)) {
        out.push([ r, s ])
      }
    }
  }

  return out
}

其中 compare 将比较连接属性。

我想知道的情况是索引连接。从作弊 sheet 复制到 JS 之类的，我们有：

function indexJoin(R, S) {
  const out = []
  
  for (const r of R) {
    const X = findInIndex(S.C, r.c)
    for (const s of X) {
      out.push([ r, s ])
    }
  }

  return out
}

但是那是什么 findInIndex(S.C, r.c)？传递给它的是什么 (S.C)？它是如何工作的？

join indices paper 是这样说的：

With no indices, two basic algorithms based on sorting [4] and hashing [5, lo] avoid the prohibitive cost of the nested loop method.

A join index is a binary relation. It only contains pairs of surrogates which makes ( it small. However, for generality, we assume that it does not always fit in RAM. Therefore, a join index must be clustered. Since we may need fast access to JI tuples via either r values or s values depending on whether there are selects on relations R or S, a JI should be clustered on (r, s). A simple and uniform solution is to maintain two copies of the JI, one clustered on r and the other clustered on s. Each copy is implemented by a W-tree an efficient variation of the versatile B-tree [l, 71.

那么，如果它是一个 B+树，那么在每个 B+树中将使用什么键和值，您将如何使用键和值（您以什么顺序插入键和获取值）？另外，如果“连接索引”在 JavaScript 中，就不能像这样实现吗？

const joinIndex = {}

function join(r, s) {
  const rest = joinIndex[r.c] = joinIndex[r.c] ?? {}
  rest[s.c] = true
}

function findInIndex(leftKey) {
  return Object.keys(joinIndex[leftKey])
}

请说明如何使用我的方法或 B+ 树方法实现连接算法。如果是B+tree方法，你不需要实现B+tree，只需要解释一下你将如何插入2个B+tree的东西in/out来更清楚地解释算法。

Answer 1

首先，论文中所说的join index，最好可以想象成table实现了两个之间的many-to-many关系table秒。此连接索引 table 中的一条记录由两个外键组成：一个引用 R table 中的主键，另一个引用 S table.[=16= 中的主键]

我没有得到作弊 sheet 中使用的 S.C 符号。但很明显，您需要以某种方式指定要使用的 which 连接索引，更具体地说，您要在其上使用哪个 B+Tree（集群）（以防其中两个已定义），最后，您要在其中找到哪个值（r.c，r 的键）。

B+树的作用是提供一个有序的散列table，即你可以高效地搜索一个键，并且可以很容易地从那个点按顺序走到后面的条目。在 join index 的这种特殊用途中，这允许您有效地找到给定 r[=35 的所有对 (r₁, s) =]1。通过从 B+ 树的根向下钻取到具有 r₁ 的第一个叶子，可以找到第一个。然后向前穿过B+树的底层会找到所有其他带有r₁的元组，直到遇到一个不再有这个r_1的元组值。

请注意，您仍然需要原始 table 的索引，以便找到给定键的完整记录。实际上，这也可以用 B+Tree 来完成，但在 JavaScript 中，一个简单的字典（普通对象）就足够了。

所以在 JavaScript 语法中我们可以想象这样的事情：

// Arguments:
//  - joinIndexTree: a B+Tree having (rKey, sKey) tuples, keyed and ordered by rKey.
//  - rKey: the key to search all matches for
function findInIndex(joinIndexTree, rKey) {
  let result = []; // This will collect all the sKey for
                   // which thee is a (rKey, sKey)
  // Find left-most location in B+Tree where rKey should occur (if present)
  let btreeCursor = joinIndexTree.find(rKey);
  if (btreeCursor.EOF()) return result; // At the far-right end of the B+Tree
  let tuple = btreeCursor.get();  // Read the tuple found at this location
  while (tuple[0] == rKey) { // First member of tuple matches rKey
     result.push(tuple[1]); // Collect the corresponding s-value
     btreeCursor.next(); // Walk to the next tuple
     if (btreeCursor.EOF()) break; // At the end of the B+Tree
     tuple = btreeCursor.get();  // Read the tuple found at this location
  }
  return result;
}

主程序是：

const joinIndexTree = ;// B+Tree with (rKey, sKey) pairs, ordered by rKey
const sIndex = Object.fromEntries(S.map(s => [s.id, s])); // dictionary

function indexJoin(joinIndexTree, R, S, sIndex) {
  const out = []
  
  for (const r of R) {
    const sids = findInIndex(joinIndexTree, r.id)
    for (const s_id of sids) {
      const s = sIndex[s_id]; // Look up by primary key index
      out.push([ r, s ])
    }
  }

  return out
}

当你只需要对table（查询）进行read-only操作时，你可以创建一个数组字典，而不是B+树，你可以在其中查找joinIndex[r.id] 并获得 s.id 值的数组。这当然很容易设置和使用，但是当 table 不是 read-only.

时保持更新很痛苦

作为 B+Tree 的替代，你也可以使用其他平衡搜索树，例如 AVL 和 red-black 树，但根据我的经验，B+Tree 具有更好的性能。

如何在 JavaScript 中实现 "index join" 数据结构？

How to implement an "index join" data structure in JavaScript?

javascript

database

indexing

b-tree

data-structures