Java Deadlock during a synchronized on a local resource?

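I have found an issue where multiple threads deadlock on the same line of code. I can't reproduce it locally or in any test, but thread dumps from production show the problem very clearly.

I don't understand why the threads would block on the synchronized line below, since nothing else in the call stack or in any other thread synchronizes on that object. Does anyone know what is going on, or how I could even reproduce this? (I'm currently trying 15 threads calling trim() in a loop while processing 2000 tasks through my queue, but I can't reproduce it.)

In the thread dump below, I think the multiple threads shown as 'locked' may be a manifestation of a Java bug: http://bugs.java.com/view_bug.do?bug_id=8047816 (JStack reports threads in the wrong state). I'm using JDK version 1.7.0_51.

Cheers!

Here is the view of the threads from the thread dump...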

"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
    - <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
    - <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
    - <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Here is the extracted Java code, showing where the error occurs...

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class Deadlock {
    final Deque<Object> delegate = new ArrayDeque<>();
    final long maxSize = Long.MAX_VALUE;

    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new HashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new HashMap<>();

    public void trim() {
        //Possible optimization is evicting in chunks, segmenting by arrival time
        while (this.totalExec.longValue() > this.maxSize) {
            final Object t = this.delegate.peek();
            final Deque<Long> execTime = this.execTimes.get(t);
            final Long exec = execTime.peek();
            if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
                //If Job Started Inside of Window, remove and re-loop
                remove();
            }
            else {
                //Otherwise exit the loop
                break;
            }
        }
    }

    public Object remove() {
        Object removed;
        synchronized (this.delegate) { //4 Threads deadlocking on this line !
            removed = this.delegate.pollFirst();
        }
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    public void itemRemoved(final Object t) {
        //Decrement Total & Queue
        final AtomicLong catTotal = this.totals.get(t);
        if (catTotal != null) {
            if (!this.execTimes.get(t).isEmpty()) {
                final Long exec = this.execTimes.get(t).pollFirst();
                if (exec != null) {
                    catTotal.addAndGet(-exec);
                    this.totalExec.addAndGet(-exec);
                }
            }
        }
    }
}

From the documentation for HashMap:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

(emphasis theirs)

You are reading from and writing to a Map without any synchronization.

I see no reason to assume your code is thread-safe.
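To make "synchronized externally" concrete: every read and every write of the map has to happen while holding the same lock, not just the writes. A minimal sketch, assuming Java 8+ (on 1.7 the lambda would have to become an anonymous Runnable); GuardedMapDemo, add and get are hypothetical names, not part of the code above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class: a HashMap that is only ever touched while holding `lock`.
final class GuardedMapDemo {
    private final Object lock = new Object();
    private final Map<String, Long> totals = new HashMap<>();

    /** Adds delta to key's running total; the get-then-put pair is atomic under the lock. */
    void add(String key, long delta) {
        synchronized (lock) {
            Long current = totals.get(key);
            totals.put(key, current == null ? delta : current + delta);
        }
    }

    /** Reads take the same lock, so they never observe a half-finished update. */
    long get(String key) {
        synchronized (lock) {
            Long current = totals.get(key);
            return current == null ? 0L : current;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedMapDemo demo = new GuardedMapDemo();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    demo.add("job-" + (j % 16), 1);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        long sum = 0;
        for (int k = 0; k < 16; k++) {
            sum += demo.get("job-" + k);
        }
        System.out.println(sum); // prints 80000
    }
}
```

With every access guarded, all 80,000 increments from the eight threads survive. In the posted code only the delegate.pollFirst() call sits inside a synchronized block; the accesses to totals and execTimes in trim() and itemRemoved() do not, so they can still race with whatever code writes to those maps.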

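I suspect you have an infinite loop in trim, caused by this lack of thread safety.

Entering a synchronized block that other threads are contending for is relatively slow, so a thread dump taken under load is likely to always show at least a few threads waiting to acquire the lock.

Your first thread holds the lock while it waits for pollFirst to return.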

"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)

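The other threads are waiting to acquire the lock. You will need to provide the whole thread dump to determine which thread holds the lock on <0x00002aae6465a650> (the ArrayDeque); that thread is what is preventing your pollFirst call from returning.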
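From inside the JVM, the standard java.lang.management API can report the owner directly: ThreadInfo.getLockOwnerName() names the thread that owns the monitor a blocked thread is waiting for. A minimal sketch, assuming Java 8+; LockOwnerDemo, MONITOR and the thread names "holder" and "waiter" are hypothetical stand-ins for the queue and its ArrayDeque:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical example: MONITOR stands in for the shared ArrayDeque in CustomQueue.
final class LockOwnerDemo {
    private static final Object MONITOR = new Object();

    /** Name of the thread owning the monitor that `blocked` is waiting for, or null if none. */
    static String ownerOf(Thread blocked) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(blocked.getId());
        return info == null ? null : info.getLockOwnerName();
    }

    /** Parks "holder" inside MONITOR, lets "waiter" block on it, and reports the owner. */
    static String demo() {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (MONITOR) {
                held.countDown();
                try {
                    release.await(); // keep owning MONITOR until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (MONITOR) {
                // nothing to do; we only care that this thread blocks on entry
            }
        }, "waiter");
        try {
            holder.start();
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            return ownerOf(waiter);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints holder
    }
}
```

Applied to the real queue, the same call on one of the BLOCKED "xxx>Job Read-N" threads would name the thread that currently owns the ArrayDeque monitor, which is the thread worth inspecting.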

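For a deadlock you need at least two threads, each holding one lock while trying to acquire another, and the code you posted doesn't appear to do that. The bug you point to may apply, but as I read it, it is cosmetic: the threads are not actually 'locked', they are waiting to acquire the lock on the object in question (the ArrayDeque). If there were a deadlock, you should see it reported explicitly: the HotSpot thread dump includes a "Found one Java-level deadlock" section that calls out the threads blocking each other.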
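The difference between contention and deadlock can also be checked programmatically. ThreadMXBean.findDeadlockedThreads() only reports threads caught in a cycle of lock ownership, so a queue of threads waiting on a single monitor (as in the posted dump) does not qualify, while two threads taking the same two locks in opposite order does. A minimal sketch, assuming Java 8+ on HotSpot; DeadlockCheck and its methods are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical example contrasting lock contention with a genuine lock-ordering deadlock.
final class DeadlockCheck {
    /** True if the JVM's deadlock detector lists thread t. */
    static boolean isDeadlocked(Thread t) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return false;
        }
        for (long id : ids) {
            if (id == t.getId()) {
                return true;
            }
        }
        return false;
    }

    static void awaitBlocked(Thread t) {
        while (t.getState() != Thread.State.BLOCKED) {
            Thread.yield();
        }
    }

    /** Like the posted dump: several threads BLOCKED on one monitor whose owner is not stuck. */
    static boolean contentionLooksDeadlocked() {
        Object deque = new Object();
        List<Thread> waiters = new ArrayList<>();
        boolean any = false;
        synchronized (deque) { // this thread owns the monitor while the others pile up
            for (int i = 0; i < 4; i++) {
                Thread w = new Thread(() -> {
                    synchronized (deque) { }
                });
                w.start();
                waiters.add(w);
            }
            for (Thread w : waiters) {
                awaitBlocked(w);
                any |= isDeadlocked(w);
            }
        }
        return any; // the waiters proceed once the block above exits
    }

    /** Two threads taking the same two locks in opposite order: a real deadlock. */
    static boolean opposingOrderIsDeadlocked() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = lockBoth(a, b, bothHoldFirst);
        Thread t2 = lockBoth(b, a, bothHoldFirst);
        awaitBlocked(t1);
        awaitBlocked(t2);
        return isDeadlocked(t1) && isDeadlocked(t2);
    }

    private static Thread lockBoth(Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // make sure the partner holds its first lock too
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { }
            }
        });
        t.setDaemon(true); // these threads never finish, so don't let them keep the JVM alive
        t.start();
        return t;
    }

    public static void main(String[] args) {
        System.out.println(contentionLooksDeadlocked()); // prints false
        System.out.println(opposingOrderIsDeadlocked()); // prints true
    }
}
```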

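I don't believe the thread dump says there is a deadlock. It just tells you how many threads were waiting on the monitor at the moment the dump was taken. Since only one thread can own a monitor at any given moment, that is not surprising.

What behavior are you seeing in your application that makes you think you have a deadlock? A lot is missing from your code, in particular where the objects in the delegate Deque come from. My guess is that you don't have a true deadlock, but some other problem that looks like one.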

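Thanks for the replies here; it is clear the problem is the non-thread-safe use of multiple Collections.

To fix this, I have made the trim method synchronized and replaced HashMap with ConcurrentHashMap and ArrayDeque with LinkedBlockingDeque (concurrent Collections FTW!).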
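A minimal sketch of what that fix could look like, assuming Java 8+. SafeQueue, its constructor and the add method are hypothetical (the question never shows how items enter the queue); trim() keeps the question's eviction rule unchanged, and remove() also synchronizes on `this` so that removals outside trim() see consistent state:

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reconstruction of the fix: one lock (`this`) around every multi-collection update.
final class SafeQueue {
    private final Deque<Object> delegate = new LinkedBlockingDeque<>();
    private final long maxSize;
    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new ConcurrentHashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new ConcurrentHashMap<>();

    SafeQueue(long maxSize) {
        this.maxSize = maxSize;
    }

    /** Hypothetical producer side: enqueue t and record its execution time. */
    synchronized void add(Object t, long exec) {
        delegate.addLast(t);
        execTimes.computeIfAbsent(t, k -> new LinkedBlockingDeque<>()).addLast(exec);
        totals.computeIfAbsent(t, k -> new AtomicLong()).addAndGet(exec);
        totalExec.addAndGet(exec);
    }

    /** Same eviction rule as the question's trim(), now holding the lock for the whole loop. */
    synchronized void trim() {
        while (totalExec.get() > maxSize) {
            Object t = delegate.peek();
            Deque<Long> times = (t == null) ? null : execTimes.get(t);
            Long exec = (times == null) ? null : times.peek();
            if (exec != null && totalExec.get() - exec > maxSize) {
                remove(); // re-entrant: this thread already owns the monitor
            } else {
                break;
            }
        }
    }

    synchronized Object remove() {
        Object removed = delegate.pollFirst();
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    private void itemRemoved(Object t) { // only ever called with the lock held
        AtomicLong catTotal = totals.get(t);
        Deque<Long> times = execTimes.get(t);
        if (catTotal != null && times != null) {
            Long exec = times.pollFirst();
            if (exec != null) {
                catTotal.addAndGet(-exec);
                totalExec.addAndGet(-exec);
            }
        }
    }

    long totalExec() {
        return totalExec.get();
    }

    int size() {
        return delegate.size();
    }

    public static void main(String[] args) {
        SafeQueue q = new SafeQueue(10);
        q.add("a", 8);
        q.add("b", 8);
        q.add("c", 8);
        q.trim();
        System.out.println(q.totalExec() + " " + q.size()); // prints 16 2
    }
}
```

In this sketch it is the shared lock on `this` that keeps delegate, totals and execTimes in step with each other; the concurrent collections additionally make any reads performed outside that lock safe.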

A further planned improvement is to replace the 2 separate maps with a single map holding a custom Object, so that the operation in itemRemoved stays atomic.
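One way the single-map idea could look, as a sketch assuming Java 8+: a hypothetical CategoryStats value holds both the per-category total and its queue of execution times, and updates them together under its own lock. StatsByCategory, CategoryStats, record and evictOldest are illustrative names, not the poster's code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one map whose values hold both the per-category total and its exec times.
final class StatsByCategory {
    /** Replaces the separate `totals` and `execTimes` entries for one category. */
    static final class CategoryStats {
        private long total;
        private final Deque<Long> execTimes = new ArrayDeque<>();

        synchronized void record(long exec) {
            execTimes.addLast(exec);
            total += exec;
        }

        /** Drops the oldest exec time and subtracts it from the total in one atomic step. */
        synchronized Long evictOldest() {
            Long exec = execTimes.pollFirst();
            if (exec != null) {
                total -= exec;
            }
            return exec;
        }

        synchronized long total() {
            return total;
        }

        synchronized int count() {
            return execTimes.size();
        }
    }

    private final Map<Object, CategoryStats> stats = new ConcurrentHashMap<>();
    private final AtomicLong totalExec = new AtomicLong();

    void record(Object category, long exec) {
        stats.computeIfAbsent(category, k -> new CategoryStats()).record(exec);
        totalExec.addAndGet(exec);
    }

    /** Counterpart of the question's itemRemoved(): one lookup, one atomic per-category update. */
    void itemRemoved(Object category) {
        CategoryStats s = stats.get(category);
        if (s == null) {
            return;
        }
        Long exec = s.evictOldest();
        if (exec != null) {
            totalExec.addAndGet(-exec); // the global total is still a separate (atomic) step
        }
    }

    CategoryStats statsFor(Object category) {
        return stats.get(category);
    }

    long totalExec() {
        return totalExec.get();
    }

    public static void main(String[] args) {
        StatsByCategory s = new StatsByCategory();
        s.record("a", 5);
        s.record("a", 3);
        s.record("b", 2);
        s.itemRemoved("a");
        System.out.println(s.statsFor("a").total() + " " + s.totalExec()); // prints 3 5
    }
}
```

Here the per-category total and its exec-time queue can no longer drift apart, but the global totalExec is still updated in a separate step, so a concurrent reader may briefly see it out of step with the per-category figures; if that matters, it would need to move under the same lock.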