如何在不到 O(N*M) 的时间内加入两个列表？

Question

假设我们有两个 table（如 SQL table），其中一个的主键是另一个的外键。我应该编写一个简单的算法来模拟这两个 table 的连接。我想遍历第一个 table 中主键列中的每个元素，在第二个循环中检查外键是否匹配，然后将其存储在外部数组或列表中。但是，这需要 O(N*M)，我需要找到更好的东西。教科书中有暗示它涉及散列，但是，我不确定这里如何实现散列或如何使它变得更好？

正在编辑以添加示例：

Table Employees
|ID (Primary Key)| Name | Department|
|:---------------|:----:|----------:|
|       1        | Jack |    IT     |   
|       2        | Amy  |  Finance  |

Table Transactions
|Sold Product| Sold By(Foreign Key)| Sold To|
|:-----------|:-------------------:|-------:|
|    TV      |          1          |  Mary  |
|   Radio    |          1          |  Bob   |
|   Mobile   |          2          |  Lisa  |

我想做的是编写一个算法，使用哈希将这两个 table 连接起来，而不是任何复杂的东西，只是一个关于如何做到这一点的简单想法。

Answer 1

将child table的主键和外键读取到一个映射中，其中键是外键，值是主键。请记住，如果这是一对多关系，一个外键可以映射到多个主键。

现在遍历母亲的主键 table 并为每个主键检查它是否存在于映射中。如果是这样，您添加一个与数组有关系的行的主键元组（或者您想要保存它）。

时间复杂度为O(n + m)。遍历每个 table 的行一次。由于映射中的查找是常量，我们不需要添加它。

Space 复杂度为 O(m)，其中 m 是 child table 中的行数。与原始解决方案相比，这是您使用的一些额外 space 来提高时间复杂度。

Answer 2

使用主要 table 元素填充 HashMap，使用它们的键作为映射键，使用整个对象作为值。这只需要在主 table 上传递 1 次，并且添加到 HashMap 的每个操作都在恒定时间 O(1) 内发生。这类似于创建数据库索引。

迭代子项 table，通过从地图中获取父项，在恒定时间 O(1) 内“加入”到父行。

每个“table”的总运行时间为 1 次，因此 O(n + m)。

代码可能类似于：

class Parent {
    int id;
    // other fields, getters etc
}

class Child {
    int parentId;
    // other fields, getters etc
}

void join(List<Parent> parents, List<Child> children) {
    Map<Integer, Parent> parentMap = parents.stream()
      .collect(toMap(Parent::getKey, p -> p)); // FYI toMap collects to a HashMap
    for (Child child : children) {
        Parent parent = parentMap.get(child.getParentId());
        // not sure what “join” means, but you now have child and its parent
    }
}

Answer 3

试试这个。

record Employees(int id, String name, String department) {}
record Transactions(String soldProduct, int soldBy, String soldTo) {}
record Joined(String soldProduct, int soldBy, String soldTo, String name, String department) {}

public static void main(String[] args) {
    List<Employees> employees = List.of(
        new Employees(1, "Jack", "IT"),
        new Employees(2, "Amy", "Finance"));
    List<Transactions> transactions = List.of(
        new Transactions("TV", 1, "Mary"),
        new Transactions("Radio", 1, "Bob"),
        new Transactions("Mobile", 2, "Lisa"));

    Map<Integer, Employees> mapEmployee = employees.stream()
        .collect(Collectors.toMap(Employees::id, Function.identity()));
    List<Joined> joined = transactions.stream()
        .map(t -> {
            Employees e = mapEmployee.get(t.soldBy());
            return new Joined(t.soldProduct(), t.soldBy(), t.soldTo(), e.name(), e.department());
        })
        .toList();

    for (Joined e : joined)
        System.out.println(e);
}

输出：

Joined[soldProduct=TV, soldBy=1, soldTo=Mary, name=Jack, department=IT]
Joined[soldProduct=Radio, soldBy=1, soldTo=Bob, name=Jack, department=IT]
Joined[soldProduct=Mobile, soldBy=2, soldTo=Lisa, name=Amy, department=Finance]

如何在不到 O(N*M) 的时间内加入两个列表？

How can I join two lists in less than O(N*M)?

java

algorithm

hash