Java TreeSet 奇怪的行为
Java TreeSet Weird Behavior
我正在尝试使用平衡 BST 而不是优先队列来实现 Prim 的最小生成树算法。我的实现在 Java 中。由于 Java 已经有 TreeSet 形式的红黑树库实现,我想使用它来代替我自己的自定义实现。
Prim 使用最小优先级队列的典型实现需要 O(ElogE),我在这个实现背后的意图是将时间复杂度降低到 O(ElogV)。我相信这也可以使用索引优先级队列 (IPQ) 来完成,但我选择了 BST 版本,因为有可用的自平衡 BST 库实现(在 Java 和 C++ 中)。
对于我测试过此实现的示例,它似乎工作正常,并且产生了正确的结果。但是当我进行更深入的检查以确保 TC 实际上是 O(ElogV) 时,我发现 Java TreeSet 出于某种原因对我来说表现得很奇怪。
这是我的实现:
package graph;
import java.util.Comparator;
import java.util.TreeSet;
/**
* Implementation of Prim's algorithm (eager version) to
* find Minimum Spanning Tree using a self-balancing BST
* Time Complexity: O(ElogV)
*
* This implementation uses a self-balancing BST (specifically Red-Black Tree)
*
* We can do eager Prim's implementation using an Indexed Priority Queue (IPQ) as well
*
* Comparison: IPQ vs BST
* To get next best edge in IPQ, we pop the min element from root, and
* then heapify the tree, which overall takes O(lgn). To get next best edge in
* BST, it would take O(lgn) as well, and then we’ll have to delete that entry
* which would take another O(lgn), but overall it is still O(lgn)
*
* Insertion into both BST and IPQ takes O(lgn) anyway
*
* Update in IPQ takes O(lgn). Update in BST as well can be done in
* O(lgn) [search the element in O(lgn) then delete that entry in another
* O(lgn) and then insert new entry with updated edge weight (and source node)
* in yet another O(lgn). Intotal, it takes 3*logn but overall still O(lgn)]
*
*/
public class PrimsMstUsingSelfBalancingBST extends Graph {
private int n, m, edgeCount;
private boolean[] visited;
private Edge[] mst;
private double cost;
private TreeSet<Edge> sortedSet;
public PrimsMstUsingSelfBalancingBST(int numberOfVertices) {
super(numberOfVertices);
n = numberOfVertices;
}
public Double mst(int s) {
m = n - 1; // number of expected edges in mst
edgeCount = 0;
mst = new Edge[m];
visited = new boolean[n];
sortedSet = new TreeSet<>(getComparator());
relaxEdgesAtNode(s);
while (!sortedSet.isEmpty() && edgeCount != m) {
System.out.println(sortedSet);
// pollFirst() retrieves and removes smallest element from TreeSet
Edge edge = sortedSet.pollFirst();
int nodeIndex = edge.to;
// skip edges pointing to already visited nodes
if (visited[nodeIndex]) continue;
mst[edgeCount] = edge;
edgeCount++;
cost += edge.wt;
relaxEdgesAtNode(nodeIndex);
}
return (edgeCount == m) ? cost : null;
}
private void relaxEdgesAtNode(int index) {
visited[index] = true;
for (Edge edge : adjList.get(index)) {
int to = edge.to;
if (visited[to]) continue;
if (!sortedSet.contains(edge)) {
sortedSet.add(edge);
}
else {
Edge existingEdge = search(edge);
if (existingEdge.wt > edge.wt) {
sortedSet.remove(existingEdge);
sortedSet.add(edge);
}
}
}
}
private Comparator<Edge> getComparator() {
return new Comparator<Edge>() {
@Override
public int compare(Edge e1, Edge e2) {
// Java TreeSet is implemented in a way that it uses
// Comparable/Comparator logics for all comparisons.
// i.e., it will use this comparator to do comparison
// in contains() method.
// It will use this same comparator to do comparison
// during remove() method.
// It will also use this same comparator, to perform
// sorting during add() method.
// While looking up an edge from contains() or remove(),
// we need to perform check based on destinationNodeIndex,
// But to keep elements in sorted order during add() operation
// we need to compare elements based on their edge weights
// For correct behavior of contains() and remove()
if (e1.to == e2.to) return 0;
// For correct sorting behavior
if (e1.wt > e2.wt) return 1;
else if (e1.wt < e2.wt) return -1;
// Return -1 or 1 to make sure that different edges with equal
// weights are not ignored by TreeSet.add()
// this check can be included in either 'if' or 'else' part
// above. Keeping this separate for readability.
return -1;
}
};
}
// O(log n) search in TreeSet
private Edge search(Edge e) {
Edge ceil = sortedSet.ceiling(e); // smallest element >= e
Edge floor = sortedSet.floor(e); // largest element <= e
return ceil.equals(floor) ? ceil : null;
}
public static void main(String[] args) {
example1();
}
private static void example1() {
int n = 8;
PrimsMstUsingSelfBalancingBST graph =
new PrimsMstUsingSelfBalancingBST(n);
graph.addEdge(0, 1, true, 10);
graph.addEdge(0, 2, true, 1);
graph.addEdge(0, 3, true, 4);
graph.addEdge(2, 1, true, 3);
graph.addEdge(2, 5, true, 8);
graph.addEdge(2, 3, true, 2);
graph.addEdge(3, 5, true, 2);
graph.addEdge(3, 6, true, 7);
graph.addEdge(5, 4, true, 1);
graph.addEdge(5, 7, true, 9);
graph.addEdge(5, 6, true, 6);
graph.addEdge(4, 1, true, 0);
graph.addEdge(4, 7, true, 8);
graph.addEdge(6, 7, true, 12);
int s = 0;
Double cost = graph.mst(s);
if (cost != null) {
System.out.println(cost); // 20.0
for (Edge e : graph.mst)
System.out.println(String.format("Used edge (%d, %d) with cost: %.2f", e.from, e.to, e.wt));
/*
* Used edge (0, 2) with cost: 1.00
* Used edge (2, 3) with cost: 2.00
* Used edge (3, 5) with cost: 2.00
* Used edge (5, 4) with cost: 1.00
* Used edge (4, 1) with cost: 0.00
* Used edge (5, 6) with cost: 6.00
* Used edge (4, 7) with cost: 8.00
*/
}
else {
System.out.println("MST not found!");
}
}
}
下面是我正在测试的无向加权图(代码中也使用了相同的示例)
我面临的问题是 TreeSet 似乎在其上添加重复的条目作为 contains() 方法有时 returns false 即使对应的条目具有相同的键(此中边的目标节点)案例)已经存在。
下面是上述程序的输出:
[{from=0, to=2, weight=1.00}, {from=0, to=3, weight=4.00}, {from=0, to=1, weight=10.00}]
[{from=2, to=3, weight=2.00}, {from=2, to=1, weight=3.00}, {from=2, to=5, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=3, to=5, weight=2.00}, {from=2, to=1, weight=3.00}, {from=3, to=6, weight=7.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=4, weight=1.00}, {from=2, to=1, weight=3.00}, {from=5, to=6, weight=6.00}, {from=5, to=7, weight=9.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=1, weight=0.00}, {from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
20.0
Used edge (0, 2) with cost: 1.00
Used edge (2, 3) with cost: 2.00
Used edge (3, 5) with cost: 2.00
Used edge (5, 4) with cost: 1.00
Used edge (4, 1) with cost: 0.00
Used edge (5, 6) with cost: 6.00
Used edge (4, 7) with cost: 8.00
可以清楚地看到,即使目标节点 1 已经有一个条目 {from=0, to=1, weight=10.00}) [输出的第 1 行],它也会为其添加另一个条目作为 {from=2, to=1, weight=3.00} [输出的第 2 行] 而不是更新现有条目。
当我通过在我的自定义比较器中添加一个断点来调试它时,我注意到从未为现有条目调用比较器,因此没有发生与现有条目的比较。例如在这种情况下,在处理边缘 {from=2, to=1, weight=3.00} Comparator.compare() 时,会调用条目 {from=2, to=3, weight=2.00} 和 {from= 2, to=5, weight=8.00} 但没有为条目 {from=0, to=1, weight=10.00} 调用,因此它得出结论没有目标节点 1 的条目,因此它添加一个新条目,因此我得到了目标节点 1 [输出的第 2 行]
的两个条目
我怀疑这与 Java 集合框架中对象的不变性和并发修改限制有关。但是我无法理解问题的根本原因。
感谢任何帮助。
您Comparator
违反了合同,例如
The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x))
for all x
and y
. (This implies that compare(x, y)
must throw an exception if and only if compare(y, x)
throws an exception.)
这是compare
方法,没有全部注释:
public int compare(Edge e1, Edge e2) {
if (e1.to == e2.to) return 0;
if (e1.wt > e2.wt) return 1;
else if (e1.wt < e2.wt) return -1;
return -1;
}
例如你有两条权重为 1 的边:
a = {from=0, to=2, weight=1.00}
b = {from=5, to=4, weight=1.00}
由于它们具有不同的 to
值,但具有相同的 wt
值,因此方法 returns -1
与参数顺序无关,即 compare(a, b) = -1
和 compare(b, a) = -1
.
这违反了上面列出的规则,将导致不可预知的行为。
我正在尝试使用平衡 BST 而不是优先队列来实现 Prim 的最小生成树算法。我的实现在 Java 中。由于 Java 已经有 TreeSet 形式的红黑树库实现,我想使用它来代替我自己的自定义实现。
Prim 使用最小优先级队列的典型实现需要 O(ElogE),我在这个实现背后的意图是将时间复杂度降低到 O(ElogV)。我相信这也可以使用索引优先级队列 (IPQ) 来完成,但我选择了 BST 版本,因为有可用的自平衡 BST 库实现(在 Java 和 C++ 中)。
对于我测试过此实现的示例,它似乎工作正常,并且产生了正确的结果。但是当我进行更深入的检查以确保 TC 实际上是 O(ElogV) 时,我发现 Java TreeSet 出于某种原因对我来说表现得很奇怪。
这是我的实现:
package graph;
import java.util.Comparator;
import java.util.TreeSet;
/**
* Implementation of Prim's algorithm (eager version) to
* find Minimum Spanning Tree using a self-balancing BST
* Time Complexity: O(ElogV)
*
* This implementation uses a self-balancing BST (specifically Red-Black Tree)
*
* We can do eager Prim's implementation using an Indexed Priority Queue (IPQ) as well
*
* Comparison: IPQ vs BST
* To get next best edge in IPQ, we pop the min element from root, and
* then heapify the tree, which overall takes O(lgn). To get next best edge in
* BST, it would take O(lgn) as well, and then we’ll have to delete that entry
* which would take another O(lgn), but overall it is still O(lgn)
*
* Insertion into both BST and IPQ takes O(lgn) anyway
*
* Update in IPQ takes O(lgn). Update in BST as well can be done in
* O(lgn) [search the element in O(lgn) then delete that entry in another
* O(lgn) and then insert new entry with updated edge weight (and source node)
* in yet another O(lgn). Intotal, it takes 3*logn but overall still O(lgn)]
*
*/
public class PrimsMstUsingSelfBalancingBST extends Graph {
private int n, m, edgeCount;
private boolean[] visited;
private Edge[] mst;
private double cost;
private TreeSet<Edge> sortedSet;
public PrimsMstUsingSelfBalancingBST(int numberOfVertices) {
super(numberOfVertices);
n = numberOfVertices;
}
public Double mst(int s) {
m = n - 1; // number of expected edges in mst
edgeCount = 0;
mst = new Edge[m];
visited = new boolean[n];
sortedSet = new TreeSet<>(getComparator());
relaxEdgesAtNode(s);
while (!sortedSet.isEmpty() && edgeCount != m) {
System.out.println(sortedSet);
// pollFirst() retrieves and removes smallest element from TreeSet
Edge edge = sortedSet.pollFirst();
int nodeIndex = edge.to;
// skip edges pointing to already visited nodes
if (visited[nodeIndex]) continue;
mst[edgeCount] = edge;
edgeCount++;
cost += edge.wt;
relaxEdgesAtNode(nodeIndex);
}
return (edgeCount == m) ? cost : null;
}
private void relaxEdgesAtNode(int index) {
visited[index] = true;
for (Edge edge : adjList.get(index)) {
int to = edge.to;
if (visited[to]) continue;
if (!sortedSet.contains(edge)) {
sortedSet.add(edge);
}
else {
Edge existingEdge = search(edge);
if (existingEdge.wt > edge.wt) {
sortedSet.remove(existingEdge);
sortedSet.add(edge);
}
}
}
}
private Comparator<Edge> getComparator() {
return new Comparator<Edge>() {
@Override
public int compare(Edge e1, Edge e2) {
// Java TreeSet is implemented in a way that it uses
// Comparable/Comparator logics for all comparisons.
// i.e., it will use this comparator to do comparison
// in contains() method.
// It will use this same comparator to do comparison
// during remove() method.
// It will also use this same comparator, to perform
// sorting during add() method.
// While looking up an edge from contains() or remove(),
// we need to perform check based on destinationNodeIndex,
// But to keep elements in sorted order during add() operation
// we need to compare elements based on their edge weights
// For correct behavior of contains() and remove()
if (e1.to == e2.to) return 0;
// For correct sorting behavior
if (e1.wt > e2.wt) return 1;
else if (e1.wt < e2.wt) return -1;
// Return -1 or 1 to make sure that different edges with equal
// weights are not ignored by TreeSet.add()
// this check can be included in either 'if' or 'else' part
// above. Keeping this separate for readability.
return -1;
}
};
}
// O(log n) search in TreeSet
private Edge search(Edge e) {
Edge ceil = sortedSet.ceiling(e); // smallest element >= e
Edge floor = sortedSet.floor(e); // largest element <= e
return ceil.equals(floor) ? ceil : null;
}
public static void main(String[] args) {
example1();
}
private static void example1() {
int n = 8;
PrimsMstUsingSelfBalancingBST graph =
new PrimsMstUsingSelfBalancingBST(n);
graph.addEdge(0, 1, true, 10);
graph.addEdge(0, 2, true, 1);
graph.addEdge(0, 3, true, 4);
graph.addEdge(2, 1, true, 3);
graph.addEdge(2, 5, true, 8);
graph.addEdge(2, 3, true, 2);
graph.addEdge(3, 5, true, 2);
graph.addEdge(3, 6, true, 7);
graph.addEdge(5, 4, true, 1);
graph.addEdge(5, 7, true, 9);
graph.addEdge(5, 6, true, 6);
graph.addEdge(4, 1, true, 0);
graph.addEdge(4, 7, true, 8);
graph.addEdge(6, 7, true, 12);
int s = 0;
Double cost = graph.mst(s);
if (cost != null) {
System.out.println(cost); // 20.0
for (Edge e : graph.mst)
System.out.println(String.format("Used edge (%d, %d) with cost: %.2f", e.from, e.to, e.wt));
/*
* Used edge (0, 2) with cost: 1.00
* Used edge (2, 3) with cost: 2.00
* Used edge (3, 5) with cost: 2.00
* Used edge (5, 4) with cost: 1.00
* Used edge (4, 1) with cost: 0.00
* Used edge (5, 6) with cost: 6.00
* Used edge (4, 7) with cost: 8.00
*/
}
else {
System.out.println("MST not found!");
}
}
}
下面是我正在测试的无向加权图(代码中也使用了相同的示例)
我面临的问题是 TreeSet 似乎在其上添加重复的条目作为 contains() 方法有时 returns false 即使对应的条目具有相同的键(此中边的目标节点)案例)已经存在。
下面是上述程序的输出:
[{from=0, to=2, weight=1.00}, {from=0, to=3, weight=4.00}, {from=0, to=1, weight=10.00}]
[{from=2, to=3, weight=2.00}, {from=2, to=1, weight=3.00}, {from=2, to=5, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=3, to=5, weight=2.00}, {from=2, to=1, weight=3.00}, {from=3, to=6, weight=7.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=4, weight=1.00}, {from=2, to=1, weight=3.00}, {from=5, to=6, weight=6.00}, {from=5, to=7, weight=9.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=1, weight=0.00}, {from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=5, to=6, weight=6.00}, {from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
[{from=4, to=7, weight=8.00}, {from=0, to=1, weight=10.00}]
20.0
Used edge (0, 2) with cost: 1.00
Used edge (2, 3) with cost: 2.00
Used edge (3, 5) with cost: 2.00
Used edge (5, 4) with cost: 1.00
Used edge (4, 1) with cost: 0.00
Used edge (5, 6) with cost: 6.00
Used edge (4, 7) with cost: 8.00
可以清楚地看到,即使目标节点 1 已经有一个条目 {from=0, to=1, weight=10.00}) [输出的第 1 行],它也会为其添加另一个条目作为 {from=2, to=1, weight=3.00} [输出的第 2 行] 而不是更新现有条目。
当我通过在我的自定义比较器中添加一个断点来调试它时,我注意到从未为现有条目调用比较器,因此没有发生与现有条目的比较。例如在这种情况下,在处理边缘 {from=2, to=1, weight=3.00} Comparator.compare() 时,会调用条目 {from=2, to=3, weight=2.00} 和 {from= 2, to=5, weight=8.00} 但没有为条目 {from=0, to=1, weight=10.00} 调用,因此它得出结论没有目标节点 1 的条目,因此它添加一个新条目,因此我得到了目标节点 1 [输出的第 2 行]
的两个条目我怀疑这与 Java 集合框架中对象的不变性和并发修改限制有关。但是我无法理解问题的根本原因。
感谢任何帮助。
您Comparator
违反了合同,例如
The implementor must ensure that
sgn(compare(x, y)) == -sgn(compare(y, x))
for allx
andy
. (This implies thatcompare(x, y)
must throw an exception if and only ifcompare(y, x)
throws an exception.)
这是compare
方法,没有全部注释:
public int compare(Edge e1, Edge e2) {
if (e1.to == e2.to) return 0;
if (e1.wt > e2.wt) return 1;
else if (e1.wt < e2.wt) return -1;
return -1;
}
例如你有两条权重为 1 的边:
a = {from=0, to=2, weight=1.00}
b = {from=5, to=4, weight=1.00}
由于它们具有不同的 to
值,但具有相同的 wt
值,因此方法 returns -1
与参数顺序无关,即 compare(a, b) = -1
和 compare(b, a) = -1
.
这违反了上面列出的规则,将导致不可预知的行为。