Java deadlock during a synchronized block on a local resource?
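I've found a problem where multiple threads deadlock on the same line of code.
I can't reproduce it locally or in any test, but thread dumps from production show it very clearly.
I don't understand why the threads would be blocked on the synchronized line below, since there is no other synchronization on that object anywhere in the call stack or in any other thread. Does anyone know what is going on, or how I could even reproduce this? (I'm currently trying 15 threads calling trim() in a loop while processing 2000 tasks through my queue, but I can't reproduce it.)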
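The reproduction attempt described above can be made more aggressive by releasing all worker threads at the same instant, so that their calls overlap as much as possible. A minimal sketch of such a harness, assuming nothing about the real `CustomQueue`: `StressHarness` and `stress(...)` are hypothetical names, and the `Runnable` would be replaced by calls such as `trim()` on one shared queue instance. It sticks to Java 7 APIs to match the JDK in the question. No harness can guarantee that a race shows up on a given machine; a `false` result (workers still running after the timeout) is the moment to take a thread dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StressHarness {

    /**
     * Runs task `iterations` times on each of `threads` threads. All workers wait on a
     * shared start gate so they begin together. Returns false if they do not all finish
     * within the timeout (for example because one of them is stuck in a loop).
     */
    public static boolean stress(int threads, final int iterations, final Runnable task,
                                 long timeout, TimeUnit unit) throws InterruptedException {
        final CountDownLatch startGate = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        startGate.await();          // park until every worker is ready
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int j = 0; j < iterations; j++) {
                        task.run();
                    }
                }
            });
        }
        startGate.countDown();                      // release all workers at once
        pool.shutdown();
        boolean finished = pool.awaitTermination(timeout, unit);
        if (!finished) {
            // Interrupts the workers; a busy loop that ignores interrupts keeps running,
            // and since pool threads are non-daemon the JVM will then not exit by itself.
            pool.shutdownNow();
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger();
        boolean ok = stress(15, 2000, new Runnable() {
            @Override
            public void run() {
                counter.incrementAndGet();
            }
        }, 10, TimeUnit.SECONDS);
        System.out.println(ok + " " + counter.get()); // prints "true 30000"
    }
}
```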
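In the thread dump below, I believe the multiple threads shown with a 'locked' status may be a manifestation of a Java bug, http://bugs.java.com/view_bug.do?bug_id=8047816, where JStack reports threads in the wrong state.
(I'm using JDK version 1.7.0_51.)
Cheers!
Here is a view of the threads from the thread dump: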
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Here is the extracted Java code, showing where the problem occurs:
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class Deadlock {
    final Deque<Object> delegate = new ArrayDeque<>();
    final long maxSize = Long.MAX_VALUE;
    private final AtomicLong totalExec = new AtomicLong();
    // Plain HashMaps, shared by all worker threads without any lock
    private final Map<Object, AtomicLong> totals = new HashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new HashMap<>();

    public void trim() {
        //Possible optimization is evicting in chunks, segmenting by arrival time
        while (this.totalExec.longValue() > this.maxSize) {
            final Object t = this.delegate.peek();
            final Deque<Long> execTime = this.execTimes.get(t);
            final Long exec = execTime.peek();
            if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
                //If Job Started Inside of Window, remove and re-loop
                remove();
            }
            else {
                //Otherwise exit the loop
                break;
            }
        }
    }

    public Object remove() {
        Object removed;
        synchronized (this.delegate) { //4 Threads deadlocking on this line !
            removed = this.delegate.pollFirst();
        }
        // itemRemoved runs after the lock on delegate has been released
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    public void itemRemoved(final Object t) {
        //Decrement Total & Queue
        final AtomicLong catTotal = this.totals.get(t);
        if (catTotal != null) {
            if (!this.execTimes.get(t).isEmpty()) {
                final Long exec = this.execTimes.get(t).pollFirst();
                if (exec != null) {
                    catTotal.addAndGet(-exec);
                    this.totalExec.addAndGet(-exec);
                }
            }
        }
    }
}
From the HashMap Javadoc:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
(emphasis theirs)
You are reading from and writing to these Maps without any synchronization.
I see no reason to assume that your code is thread-safe.
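For a per-key counter map like `totals`, the usual Java 7 way to make concurrent access safe is `ConcurrentHashMap` plus `putIfAbsent`, so two threads racing to create the same counter end up sharing a single instance. A minimal sketch; the class `CategoryTotals` is hypothetical, and on Java 8+ `computeIfAbsent` performs the lookup-or-create in one call. Note that this only makes each individual map operation safe: an update that touches `totals`, `execTimes` and `totalExec` together is still not atomic without a common lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class CategoryTotals {
    private final ConcurrentMap<Object, AtomicLong> totals = new ConcurrentHashMap<>();

    /** Adds delta to the running total for key, creating the counter atomically if absent. */
    public long add(Object key, long delta) {
        AtomicLong counter = totals.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing counter if another thread inserted one first
            counter = totals.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.addAndGet(delta);
    }

    /** Current total for key, or 0 if nothing was recorded. */
    public long get(Object key) {
        AtomicLong counter = totals.get(key);
        return counter == null ? 0L : counter.get();
    }
}
```

With a plain `HashMap` in place of the `ConcurrentHashMap`, concurrent calls to `add` can lose counters or updates; with this version, eight threads each adding 1 a thousand times always leave a total of 8000.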
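I suspect you have an infinite loop in trim caused by this lack of thread safety. Each pass through that loop briefly enters the synchronized block in remove(), so a thread dump taken at any moment would likely catch several threads at that line.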
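Entering a contended synchronized block is relatively slow, so a thread dump is quite likely to always show at least a few threads waiting to acquire the lock.
Your first thread is holding the lock while waiting for pollFirst: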
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
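The other threads are waiting to acquire the lock.
You would need to provide the entire thread dump to determine which thread holds the lock on <0x00002aae6465a650> (the ArrayDeque), which is what is preventing your pollFirst call from returning.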
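Whether threads are deadlocked or simply queued on a busy monitor can also be checked from inside the JVM. The sketch below, using the standard `java.lang.management.ThreadMXBean` API, recreates the situation in the dump: one thread holds the monitor of an `ArrayDeque` while the others sit BLOCKED waiting for monitor entry. `ThreadInfo.getLockOwnerName()` names the holder, and `findDeadlockedThreads()` returns `null` because there is no cycle. The class, method and thread names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class MonitorOwner {

    /** What the JVM reports while the contention is live. */
    static final class Snapshot {
        final List<ThreadInfo> waiters = new ArrayList<>();
        long[] deadlocked; // null means the JVM found no deadlock cycle
    }

    static Snapshot contend(int nWaiters) throws InterruptedException {
        final Object lock = new ArrayDeque<Object>();   // stands in for the queue's delegate
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        // One thread takes the monitor and keeps it until told to let go
        Thread owner = new Thread(new Runnable() {
            @Override
            public void run() {
                synchronized (lock) {
                    held.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "owner");
        owner.start();
        held.await();

        // The others pile up "waiting for monitor entry" on the same object
        List<Thread> waiters = new ArrayList<>();
        for (int i = 0; i < nWaiters; i++) {
            Thread w = new Thread(new Runnable() {
                @Override
                public void run() {
                    synchronized (lock) {
                        // empty: only entering the monitor matters here
                    }
                }
            }, "waiter-" + i);
            w.start();
            waiters.add(w);
        }

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Snapshot s = new Snapshot();
        for (Thread w : waiters) {
            ThreadInfo info = mx.getThreadInfo(w.getId());
            // Poll until this waiter is BLOCKED on the monitor that "owner" holds
            while (info.getThreadState() != Thread.State.BLOCKED
                    || info.getLockOwnerId() != owner.getId()) {
                Thread.sleep(1);
                info = mx.getThreadInfo(w.getId());
            }
            s.waiters.add(info);
        }
        s.deadlocked = mx.findDeadlockedThreads();

        release.countDown();    // owner exits; each waiter then gets the lock in turn
        owner.join();
        for (Thread w : waiters) {
            w.join();
        }
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        Snapshot s = contend(3);
        for (ThreadInfo info : s.waiters) {
            System.out.println(info.getThreadName() + " " + info.getThreadState()
                    + " on " + info.getLockName() + ", owned by " + info.getLockOwnerName());
        }
        System.out.println(s.deadlocked == null ? "no deadlock" : "deadlock");
    }
}
```

All waiters finish as soon as the owner releases the monitor, which is the practical difference between contention and a deadlock.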
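To get a deadlock you need at least two threads, each holding a lock on one object while trying to lock another, and the code you posted doesn't appear to do that. The bug you pointed to may apply, but as I read it, it is a cosmetic issue: the threads aren't 'locked', they are waiting to acquire the lock on the object in question (the ArrayDeque). If there were a deadlock, you should see a deadlock message in the thread dump (jstack reports "Found one Java-level deadlock"), naming the threads that are blocking each other.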
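For contrast, a minimal sketch of an actual deadlock of the kind described above: two threads take the same two monitors in opposite order. `ThreadMXBean.findDeadlockedThreads()` reports both threads, and a jstack dump of such a process ends with a "Found one Java-level deadlock" section, which is the thing to look for in the full production dump. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class TwoLockDeadlock {

    /** Starts two daemon threads that acquire a and b in opposite order. */
    static void startDeadlock(Object a, Object b) {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("t1", a, b, bothHoldFirstLock);
        startDaemon("t2", b, a, bothHoldFirstLock);
    }

    private static void startDaemon(String name, final Object first, final Object second,
                                    final CountDownLatch latch) {
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                synchronized (first) {
                    latch.countDown();
                    try {
                        latch.await();      // wait until the other thread holds its first lock
                    } catch (InterruptedException e) {
                        return;
                    }
                    synchronized (second) { // never acquired: the other thread holds it
                        System.out.println("unreachable");
                    }
                }
            }
        }, name);
        t.setDaemon(true);  // lets the JVM exit even though these threads never finish
        t.start();
    }

    /** Polls the JVM's deadlock detector until it reports a cycle or the timeout expires. */
    static long[] awaitDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(10);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlock(new Object(), new Object());
        long[] ids = awaitDeadlock(5000);
        System.out.println(ids == null ? "no deadlock" : ids.length + " deadlocked threads"); // prints "2 deadlocked threads"
    }
}
```

The posted `remove()` only ever holds one monitor at a time, so it cannot form this kind of cycle on its own.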
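I'm not convinced the thread dump says there is a deadlock. It just tells you how many threads were waiting on the monitor at the moment the dump was taken. Since only one thread can own a monitor at any given moment, that is not surprising.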
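What behavior are you seeing in the application that makes you think you've hit a deadlock? A lot is missing from your code, in particular where the objects in the delegate Deque come from. My guess is that you are not deadlocked at all, but have some other problem that looks like a deadlock.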
Thanks for the replies here; it's clear the problem is the non-thread-safe use of multiple Collections.
To fix it, I've made the trim method synchronized, replaced HashMap with ConcurrentHashMap, and replaced ArrayDeque with LinkedBlockingDeque (concurrent Collections FTW!).
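A minimal sketch of that fix, assuming a single lock (the queue instance itself) guards every compound operation, so `trim`, `remove` and the producer side can never interleave. The class name `TrimmedQueue`, its constructor and the `add` method are hypothetical, since the question does not show how items are enqueued; the eviction rule is the one from the posted `trim`, with a guard for an empty queue. With every mutating method synchronized, the concurrent collections mainly make unsynchronized reads such as `size()` safe; the lock is what keeps the two maps and the counter consistent with each other.

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

public class TrimmedQueue {
    private final Deque<Object> delegate = new LinkedBlockingDeque<>();
    private final long maxSize;
    private final AtomicLong totalExec = new AtomicLong();
    private final ConcurrentMap<Object, AtomicLong> totals = new ConcurrentHashMap<>();
    private final ConcurrentMap<Object, Deque<Long>> execTimes = new ConcurrentHashMap<>();

    public TrimmedQueue(long maxSize) {
        this.maxSize = maxSize;
    }

    /** Hypothetical producer side: enqueue item t with one execution time. */
    public synchronized void add(Object t, long exec) {
        Deque<Long> times = execTimes.get(t);
        if (times == null) {
            times = new LinkedBlockingDeque<>();
            execTimes.put(t, times);
        }
        times.addLast(exec);
        AtomicLong catTotal = totals.get(t);
        if (catTotal == null) {
            catTotal = new AtomicLong();
            totals.put(t, catTotal);
        }
        catTotal.addAndGet(exec);
        totalExec.addAndGet(exec);
        delegate.addLast(t);
    }

    /** Same eviction rule as the original, but run entirely under the queue's lock. */
    public synchronized void trim() {
        while (totalExec.get() > maxSize) {
            Object t = delegate.peek();
            if (t == null) {
                break;              // queue drained: nothing left to evict
            }
            Deque<Long> times = execTimes.get(t);
            Long exec = (times == null) ? null : times.peek();
            if (exec != null && totalExec.get() - exec > maxSize) {
                remove();           // reentrant: this thread already holds the monitor
            } else {
                break;
            }
        }
    }

    public synchronized Object remove() {
        Object removed = delegate.pollFirst();
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    // Only called with the lock held, so other synchronized methods never see
    // the maps and the counter out of step
    private void itemRemoved(Object t) {
        AtomicLong catTotal = totals.get(t);
        Deque<Long> times = execTimes.get(t);
        if (catTotal != null && times != null) {
            Long exec = times.pollFirst();
            if (exec != null) {
                catTotal.addAndGet(-exec);
                totalExec.addAndGet(-exec);
            }
        }
    }

    public long totalExec() {
        return totalExec.get();
    }

    public int size() {
        return delegate.size();
    }
}
```

For example, with `maxSize` 10 and four items of 5 each (total 20), `trim()` evicts the first item (20 - 5 = 15 is still above 10) and then stops, because evicting the next one would bring the total to exactly 10.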
A further planned improvement is to replace the two separate maps with a single map holding a custom object, so that the operations in itemRemoved stay atomic.
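The planned single-map design could look roughly like this: each category gets one object that owns both its running total and its queue of execution times, and that object's own monitor makes "poll the oldest time and subtract it" a single atomic step. A minimal sketch; `CategoryStats`, `record`, `removeOldest` and `forKey` are hypothetical names, and it sticks to Java 7 APIs (`putIfAbsent` rather than Java 8's `computeIfAbsent`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-category record replacing the separate totals and execTimes maps. */
public class CategoryStats {
    private long total;
    private final Deque<Long> execTimes = new ArrayDeque<>();

    public synchronized void record(long exec) {
        execTimes.addLast(exec);
        total += exec;
    }

    /** Removes the oldest execution time and subtracts it from the total in one step.
     *  Returns the removed value, or 0 if there was nothing to remove. */
    public synchronized long removeOldest() {
        Long exec = execTimes.pollFirst();
        if (exec == null) {
            return 0L;
        }
        total -= exec;
        return exec;
    }

    public synchronized long total() {
        return total;
    }

    public synchronized int count() {
        return execTimes.size();
    }

    /** Lookup-or-create that is safe when several threads ask for the same key at once. */
    static CategoryStats forKey(ConcurrentMap<Object, CategoryStats> map, Object key) {
        CategoryStats stats = map.get(key);
        if (stats == null) {
            CategoryStats fresh = new CategoryStats();
            stats = map.putIfAbsent(key, fresh);
            if (stats == null) {
                stats = fresh;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        ConcurrentMap<Object, CategoryStats> stats = new ConcurrentHashMap<>();
        forKey(stats, "a").record(3);
        forKey(stats, "a").record(4);
        long removed = forKey(stats, "a").removeOldest();
        System.out.println(removed + " " + forKey(stats, "a").total()); // prints "3 4"
    }
}
```

In `itemRemoved`, the global counter would then be updated with something like `totalExec.addAndGet(-stats.removeOldest())`. That keeps each category internally consistent, but the global total and a category can still be observed briefly out of step unless one lock covers both.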