Java TreeSet 奇怪的行为

Question

我正在尝试使用平衡 BST 而不是优先队列来实现 Prim 的最小生成树算法。我的实现在 Java 中。由于 Java 已经有 TreeSet 形式的红黑树库实现，我想使用它来代替我自己的自定义实现。

Prim 使用最小优先级队列的典型实现需要 O(ElogE)，我在这个实现背后的意图是将时间复杂度降低到 O(ElogV)。我相信这也可以使用索引优先级队列 (IPQ) 来完成，但我选择了 BST 版本，因为有可用的自平衡 BST 库实现（在 Java 和 C++ 中）。

对于我测试过此实现的示例，它似乎工作正常，并且产生了正确的结果。但是当我进行更深入的检查以确保 TC 实际上是 O(ElogV) 时，我发现 Java TreeSet 出于某种原因对我来说表现得很奇怪。

这是我的实现：

    package graph;

    import java.util.Comparator;
    import java.util.TreeSet;

    /**
     * Implementation of Prim's algorithm (eager version) to
     * find Minimum Spanning Tree using a self-balancing BST
     * Time Complexity: O(ElogV)
     * 
     * This implementation uses a self-balancing BST (specifically Red-Black Tree)
     * 
     * We can do eager Prim's implementation using an Indexed Priority Queue (IPQ) as well
     * 
     * Comparison: IPQ vs BST
     * To get next best edge in IPQ, we pop the min element from root, and 
     * then heapify the tree, which overall takes O(lgn). To get next best edge in 
     * BST, it would take O(lgn) as well, and then we’ll have to delete that entry 
     * which would take another O(lgn), but overall it is still O(lgn)
     * 
     * Insertion into both BST and IPQ takes O(lgn) anyway
     * 
     * Update in IPQ takes O(lgn). Update in BST as well can be done in 
     * O(lgn) [search the element in O(lgn) then delete that entry in another 
     * O(lgn) and then insert new entry with updated edge weight (and source node) 
     * in yet another O(lgn). Intotal, it takes 3*logn but overall still O(lgn)]
     *
     */
    public class PrimsMstUsingSelfBalancingBST extends Graph {

        private int n, m, edgeCount;
        private boolean[] visited;
        private Edge[] mst;
        private double cost;
        private TreeSet<Edge> sortedSet;

        public PrimsMstUsingSelfBalancingBST(int numberOfVertices) {
            super(numberOfVertices);
            n = numberOfVertices;
        }

        public Double mst(int s) {
            m = n - 1; // number of expected edges in mst
            edgeCount = 0;
            mst = new Edge[m];
            visited = new boolean[n];
            sortedSet = new TreeSet<>(getComparator());

            relaxEdgesAtNode(s);

            while (!sortedSet.isEmpty() && edgeCount != m) {
                System.out.println(sortedSet);
                // pollFirst() retrieves and removes smallest element from TreeSet
                Edge edge = sortedSet.pollFirst();
                int nodeIndex = edge.to;

                // skip edges pointing to already visited nodes
                if (visited[nodeIndex]) continue;

                mst[edgeCount] = edge;
                edgeCount++;
                cost += edge.wt;

                relaxEdgesAtNode(nodeIndex);
            }

            return (edgeCount == m) ? cost : null;
        }

        private void relaxEdgesAtNode(int index) {
            visited[index] = true;

            for (Edge edge : adjList.get(index))  {
                int to = edge.to;

                if (visited[to]) continue;

                if (!sortedSet.contains(edge)) {
                    sortedSet.add(edge);
                }
                else {
                    Edge existingEdge = search(edge);
                    if (existingEdge.wt > edge.wt) {
                        sortedSet.remove(existingEdge);
                        sortedSet.add(edge);
                    }
                }
            }
        }

        private Comparator<Edge> getComparator() {
            return new Comparator<Edge>() {
                @Override
                public int compare(Edge e1, Edge e2) {
                    // Java TreeSet is implemented in a way that it uses 
                    // Comparable/Comparator logics for all comparisons.

                    // i.e., it will use this comparator to do comparison 
                    // in contains() method.

                    // It will use this same comparator to do comparison 
                    // during remove() method.

                    // It will also use this same comparator, to perform 
                    // sorting during add() method.

                    // While looking up an edge from contains() or remove(), 
                    // we need to perform check based on destinationNodeIndex,

                    // But to keep elements in sorted order during add() operation
                    // we need to compare elements based on their edge weights

                    // For correct behavior of contains() and remove()
                    if (e1.to == e2.to) return 0;

                    // For correct sorting behavior
                    if (e1.wt > e2.wt) return 1;
                    else if (e1.wt < e2.wt) return -1;

                    // Return -1 or 1 to make sure that different edges with equal 
                    // weights are not ignored by TreeSet.add()
                    // this check can be included in either 'if' or 'else' part 
                    // above. Keeping this separate for readability.
                    return -1;
                }
            };
        }

        // O(log n) search in TreeSet
        private Edge search(Edge e) {
            Edge ceil  = sortedSet.ceiling(e); // smallest element >= e
            Edge floor = sortedSet.floor(e);   // largest element <= e

            return ceil.equals(floor) ? ceil : null; 
        }

        public static void main(String[] args) {
            example1();
        }

        private static void example1() {
            int n = 8;
            PrimsMstUsingSelfBalancingBST graph = 
                    new PrimsMstUsingSelfBalancingBST(n);

            graph.addEdge(0, 1, true, 10);
            graph.addEdge(0, 2, true, 1);
            graph.addEdge(0, 3, true, 4);
            graph.addEdge(2, 1, true, 3);
            graph.addEdge(2, 5, true, 8);
            graph.addEdge(2, 3, true, 2);
            graph.addEdge(3, 5, true, 2);
            graph.addEdge(3, 6, true, 7);
            graph.addEdge(5, 4, true, 1);
            graph.addEdge(5, 7, true, 9);
            graph.addEdge(5, 6, true, 6);
            graph.addEdge(4, 1, true, 0);
            graph.addEdge(4, 7, true, 8);
            graph.addEdge(6, 7, true, 12);

            int s = 0;
            Double cost = graph.mst(s);
            if (cost != null) {
                System.out.println(cost); // 20.0
                for (Edge e : graph.mst)
                    System.out.println(String.format("Used edge (%d, %d) with cost: %.2f", e.from, e.to, e.wt));
                /*
                 * Used edge (0, 2) with cost: 1.00
                 * Used edge (2, 3) with cost: 2.00
                 * Used edge (3, 5) with cost: 2.00
                 * Used edge (5, 4) with cost: 1.00
                 * Used edge (4, 1) with cost: 0.00
                 * Used edge (5, 6) with cost: 6.00
                 * Used edge (4, 7) with cost: 8.00
                 */
            }
            else {
                System.out.println("MST not found!");
            }
        }
    }

下面是我正在测试的无向加权图（代码中也使用了相同的示例）

我面临的问题是 TreeSet 似乎在其上添加重复的条目作为 contains() 方法有时 returns false 即使对应的条目具有相同的键（此中边的目标节点）案例）已经存在。

下面是上述程序的输出：

[{from=0, to=2, weight=1.00}, {from=0, to=3, weight=4.00}, {from=0, to=1, weight=10.00}]
[{from=2, to=3, weight=2.00}, {from=2, to=1, weight=3.00}, {from=2, to=5, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=3, to=5, weight=2.00}, {from=2, to=1, weight=3.00}, {from=3, to=6, weight=7.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=4, weight=1.00}, {from=2, to=1, weight=3.00}, {from=5, to=6, weight=6.00}, {from=5, to=7, weight=9.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=1, weight=0.00}, {from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
20.0
Used edge (0, 2) with cost: 1.00
Used edge (2, 3) with cost: 2.00
Used edge (3, 5) with cost: 2.00
Used edge (5, 4) with cost: 1.00
Used edge (4, 1) with cost: 0.00
Used edge (5, 6) with cost: 6.00
Used edge (4, 7) with cost: 8.00

可以清楚地看到，即使目标节点 1 已经有一个条目 {from=0, to=1, weight=10.00}) [输出的第 1 行]，它也会为其添加另一个条目作为 {from=2, to=1, weight=3.00} [输出的第 2 行] 而不是更新现有条目。

当我通过在我的自定义比较器中添加一个断点来调试它时，我注意到从未为现有条目调用比较器，因此没有发生与现有条目的比较。例如在这种情况下，在处理边缘 {from=2, to=1, weight=3.00} Comparator.compare() 时，会调用条目 {from=2, to=3, weight=2.00} 和 {from= 2, to=5, weight=8.00} 但没有为条目 {from=0, to=1, weight=10.00} 调用，因此它得出结论没有目标节点 1 的条目，因此它添加一个新条目，因此我得到了目标节点 1 [输出的第 2 行]

的两个条目

我怀疑这与 Java 集合框架中对象的不变性和并发修改限制有关。但是我无法理解问题的根本原因。

感谢任何帮助。

Answer 1

您Comparator违反了合同，例如

The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)

这是compare方法，没有全部注释：

public int compare(Edge e1, Edge e2) {
    if (e1.to == e2.to) return 0;

    if (e1.wt > e2.wt) return 1;
    else if (e1.wt < e2.wt) return -1;

    return -1;
}

例如你有两条权重为 1 的边：

a = {from=0, to=2, weight=1.00}
b = {from=5, to=4, weight=1.00}

由于它们具有不同的 to 值，但具有相同的 wt 值，因此方法 returns -1 与参数顺序无关，即 compare(a, b) = -1和 compare(b, a) = -1.

这违反了上面列出的规则，将导致不可预知的行为。

Java TreeSet 奇怪的行为

Java TreeSet Weird Behavior

java

minimum-spanning-tree

treeset

prims-algorithm