High frequency of CMS tenured phases even when Old/Young gen have ample space
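I admit upfront that this question is very similar to both:
high-number-of-cms-mark/remark-pauses-even-though-old-gen-is-not-half-full and tenured-collection-starting-for-no-apparent-reason. I'm posting because 1. those threads are over a year old, and 2. I'm hoping to learn how to find the root cause of this kind of behavior.
We have an OAS/OC4J (that's not our fault!) 24/7 Java application server running on RHEL5/Redhat 5.11, Java 6. It had been stable memory-wise for years, until recently we started seeing high CPU utilization due to frequent CMS tenured space cycles. This happens even when there is more than ample space in both the young and tenured spaces. My reading on the subject suggests that a CMS tenured cycle normally starts when the tenured (old gen) space is at roughly 92% of capacity. But we are seeing this happen repeatedly at 30% of capacity or even less. I should also mention that it happens when the total heap in use appears to be below the default value of 45% for overall heap usage, aka InitiatingHeapOccupancyPercent.
We are still reviewing recent code changes and have tried a few things, but the problem persists. So far, although work in the dev/qa environments is ongoing, we have been unable to reproduce it outside the production servers.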
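I guess I have three main questions here:
- What might be triggering the frequent (premature?) initial mark phase of the CMS cycle? And how can we verify or investigate this? For example, by examining the various parts of the current memory allocation (eden, survivor, old-gen) for huge objects, etc.?
- I've read about using -XX:+UseCMSInitiatingOccupancyOnly and -XX:CMSInitiatingOccupancyFraction=NN (for example, in the articles referenced above). What might be a reasonable (== safe) value for NN, and what are the risks of overriding the default CMS ergonomics in this way?
- Is there anything else we should be considering or investigating?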
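Here are some details about our problem:
- So far, we have been unable to reproduce this outside of production. Hence, debugging or tweaking is not an option
- We use a nightly cron job to force a Full GC via jmap -histo:live pid to reduce fragmentation
- Our JVM command-line arguments with respect to memory are as follows: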
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:-TraceClassUnloading
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExplicitGCInvokesConcurrent
-XX:+UseCMSCompactAtFullCollection
-Xms10g
-Xmx10g
-Xmn3g
-XX:SurvivorRatio=6
-XX:PermSize=256m
-XX:MaxPermSize=256m
-XX:TargetSurvivorRatio=80
-XX:ParallelGCThreads=8
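Note: we recently tried bumping the young generation up to 3.5g as a somewhat desperate experiment. (In production!) No really discernible difference was observed.
Output of jmap -heap. Note: From Space appears to always be 100% occupied. Is this normal, or does it indicate something?:
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC</p>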
<pre><code>Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 10737418240 (10240.0MB)
NewSize = 3758096384 (3584.0MB)
MaxNewSize = 3758096384 (3584.0MB)
OldSize = 5439488 (5.1875MB)
NewRatio = 2
SurvivorRatio = 6
PermSize = 268435456 (256.0MB)
MaxPermSize = 268435456 (256.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 3288334336 (3136.0MB)
used = 1048575408 (999.9994354248047MB)
free = 2239758928 (2136.0005645751953MB)
31.88773709900525% used
Eden Space:
capacity = 2818572288 (2688.0MB)
used = 578813360 (551.9994354248047MB)
free = 2239758928 (2136.0005645751953MB)
20.535693282172794% used
From Space:
capacity = 469762048 (448.0MB)
used = 469762048 (448.0MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 469762048 (448.0MB)
used = 0 (0.0MB)
free = 469762048 (448.0MB)
0.0% used
concurrent mark-sweep generation:
capacity = 6979321856 (6656.0MB)
used = 1592989856 (1519.1935119628906MB)
free = 5386332000 (5136.806488037109MB)
22.82442175425016% used
Perm Generation:
capacity = 268435456 (256.0MB)
used = 249858712 (238.2838363647461MB)
free = 18576744 (17.716163635253906MB)
93.07962357997894% used
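- Output from an in-house GC log parser, showing the frequent initial mark (IM)/remark (RM) cycles and the low young/tenured occupancy. You can see the Young gen occupancy slowly grow to 98.30%, shortly after which we hit the expected ParNew (PN) Young GC: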
| | PN: |YHeapB4|YHeapAf|YHeapDt|
| |------|--------|--------|--------|
| |PF,CF,| | | |
| | SY: |OHeapB4|OHeapAf|OHeapDt|
| |------|--------|--------|--------|
Date |Time |IM,RM:|Y Occ% |OHeap |O Occ% |Duration|THeapB4|THeapAf|THeapDt|Promoted|% Garbage|Interval|Alloc |AllocRate(MB/s)|PromoRate(MB/s)
---------------------------------------------- ---------------------------------------------- ---------------------------------------------- ------
2016-12-05|14:16:59.455| RM | 15.11|1620287| 23.77| 0.18
2016-12-05|14:17:03.057| IM | 16.16|1615358| 23.70| 0.66
2016-12-05|14:17:13.444| RM | 17.70|1615358| 23.70| 0.23
2016-12-05|14:17:17.227| IM | 18.82|1513691| 22.21| 0.70
2016-12-05|14:17:27.887| RM | 28.54|1513691| 22.21| 0.33
2016-12-05|14:17:30.390| IM | 29.45|1513667| 22.21| 1.02
2016-12-05|14:17:41.326| RM | 32.90|1513667| 22.21| 0.66
2016-12-05|14:17:44.290| IM | 34.86|1513666| 22.21| 1.23
...[deleted for brevity]...
2016-12-05|14:37:28.024| IM | 95.88|1377444| 20.21| 2.93
2016-12-05|14:37:40.601| RM | 95.89|1377444| 20.21| 2.15
2016-12-05|14:37:46.032| IM | 95.89|1377443| 20.21| 2.83
2016-12-05|14:37:58.557| RM | 98.30|1377443| 20.21| 2.21
2016-12-05|14:38:03.988| IM | 98.30|1377307| 20.21| 2.90
2016-12-05|14:38:15.638| PN |3211264| 458752|2752512| 0.77|4588571|1942900|2645671| 106841| 96.12
2016-12-05|14:38:18.193| RM | 18.04|1484148| 21.78| 0.24
2016-12-05|14:38:21.813| IM | 18.04|1480802| 21.73| 0.75
2016-12-05|14:38:31.822| RM | 19.05|1480802| 21.73| 0.34
...[and so on]...</p>
<p></pre>
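The Y Occ% and O Occ% columns above can be cross-checked against a raw CMS-initial-mark log line. The following is only a minimal sketch, not the in-house parser itself: the function name and regular expression are hypothetical, and it assumes the young gen figures are simply the whole-heap figures minus the old gen figures.

```python
import re

# Captures old-gen used/capacity and whole-heap used/capacity (all in K)
# from a CMS-initial-mark line in the -XX:+PrintGCDetails format shown in this post.
INITIAL_MARK = re.compile(
    r"CMS-initial-mark: (\d+)K\((\d+)K\)\] (\d+)K\((\d+)K\)"
)

def occupancy_from_initial_mark(line):
    """Return (young_pct, old_pct) for one CMS-initial-mark log line."""
    m = INITIAL_MARK.search(line)
    if m is None:
        raise ValueError("not a CMS-initial-mark line")
    old_used, old_cap, heap_used, heap_cap = map(int, m.groups())
    # Assumption: young gen = whole heap minus old gen.
    young_used = heap_used - old_used
    young_cap = heap_cap - old_cap
    return 100.0 * young_used / young_cap, 100.0 * old_used / old_cap

line = ("2016-12-05T14:17:03.057-0800: [GC [1 CMS-initial-mark: "
        "1615358K(6815744K)] 2134211K(10027008K), 0.6538170 secs]")
young_pct, old_pct = occupancy_from_initial_mark(line)
print("young %.2f%%, old %.2f%%" % (young_pct, old_pct))
# prints: young 16.16%, old 23.70%
```

These values agree with the 14:17:03.057 IM row of the table, which confirms that the cycle started with the old gen less than a quarter full.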
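- Actual GC log output, starting from the first initial mark (IM) at 14:17:03.057 in the output above. Truncated similarly to the above, but I do show the ParNew Young GC for completeness: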
2016-12-05T14:17:03.057-0800: [GC [1 CMS-initial-mark: 1615358K(6815744K)] 2134211K(10027008K), 0.6538170 secs] [Times: user=0.65 sys=0.00, real=0.66 secs]
2016-12-05T14:17:06.178-0800: [CMS-concurrent-mark: 2.463/2.467 secs] [Times: user=5.04 sys=0.01, real=2.46 secs]
2016-12-05T14:17:06.251-0800: [CMS-concurrent-preclean: 0.072/0.073 secs] [Times: user=0.07 sys=0.00, real=0.08 secs]
CMS: abort preclean due to time 2016-12-05T14:17:13.442-0800: [CMS-concurrent-abortable-preclean: 7.189/7.192 secs] [Times: user=7.46 sys=0.02, real=7.19 secs]
2016-12-05T14:17:13.444-0800: [GC[YG occupancy: 568459 K (3211264 K)][Rescan (parallel) , 0.1020240 secs][weak refs processing, 0.0312140 secs][class unloading, 0.0396040 secs][scrub symbol & string tables, 0.0368990 secs] [1 CMS-remark: 1615358K(6815744K)] 2183818K(10027008K), 0.2344980 secs] [Times: user=0.89 sys=0.00, real=0.23 secs]
2016-12-05T14:17:15.212-0800: [CMS-concurrent-sweep: 1.533/1.533 secs] [Times: user=1.54 sys=0.00, real=1.54 secs]
2016-12-05T14:17:15.225-0800: [CMS-concurrent-reset: 0.013/0.013 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2016-12-05T14:17:17.227-0800: [GC [1 CMS-initial-mark: 1513691K(6815744K)] 2118034K(10027008K), 0.7036950 secs] [Times: user=0.71 sys=0.00, real=0.70 secs]
2016-12-05T14:17:20.548-0800: [CMS-concurrent-mark: 2.613/2.617 secs] [Times: user=5.62 sys=0.03, real=2.62 secs]
2016-12-05T14:17:20.667-0800: [CMS-concurrent-preclean: 0.113/0.119 secs] [Times: user=0.23 sys=0.00, real=0.12 secs]
CMS: abort preclean due to time 2016-12-05T14:17:27.886-0800: [CMS-concurrent-abortable-preclean: 7.217/7.219 secs] [Times: user=8.54 sys=0.07, real=7.22 secs]
2016-12-05T14:17:27.887-0800: [GC[YG occupancy: 916526 K (3211264 K)][Rescan (parallel) , 0.2159770 secs][weak refs processing, 0.0000180 secs][class unloading, 0.0460640 secs][scrub symbol & string tables, 0.0404060 secs] [1 CMS-remark: 1513691K(6815744K)] 2430218K(10027008K), 0.3276590 secs] [Times: user=1.59 sys=0.02, real=0.33 secs]
2016-12-05T14:17:29.611-0800: [CMS-concurrent-sweep: 1.396/1.396 secs] [Times: user=1.40 sys=0.00, real=1.39 secs]
...[And So On]...
2016-12-05T14:38:03.988-0800: [GC [1 CMS-initial-mark: 1377307K(6815744K)] 4534072K(10027008K), 2.9013180 secs] [Times: user=2.90 sys=0.00, real=2.90 secs]
2016-12-05T14:38:09.403-0800: [CMS-concurrent-mark: 2.507/2.514 secs] [Times: user=5.03 sys=0.03, real=2.51 secs]
2016-12-05T14:38:09.462-0800: [CMS-concurrent-preclean: 0.058/0.058 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2016-12-05T14:38:15.638-0800: [GC [ParNew
Desired survivor size 375809632 bytes, new threshold 4 (max 15)
- age 1: 115976192 bytes, 115976192 total
- age 2: 104282224 bytes, 220258416 total
- age 3: 85871464 bytes, 306129880 total
- age 4: 98122648 bytes, 404252528 total
: 3211264K->458752K(3211264K), 0.7731320 secs] 4588571K->1942900K(10027008K), 0.7732860 secs] [Times: user=3.15 sys=0.00, real=0.77 secs]
CMS: abort preclean due to time 2016-12-05T14:38:18.192-0800: [CMS-concurrent-abortable-preclean: 7.842/8.730 secs] [Times: user=12.50 sys=0.07, real=8.73 secs]
2016-12-05T14:38:18.193-0800: [GC[YG occupancy: 579220 K (3211264 K)][Rescan (parallel) , 0.1208810 secs][weak refs processing, 0.0008320 secs][class unloading, 0.0483220 secs][scrub symbol & string tables, 0.0414970 secs] [1 CMS-remark: 1484148K(6815744K)] 2063368K(10027008K), 0.2376050 secs] [Times: user=1.07 sys=0.00, real=0.24 secs]
2016-12-05T14:38:19.798-0800: [CMS-concurrent-sweep: 1.366/1.366 secs] [Times: user=1.40 sys=0.00, real=1.37 secs]
2016-12-05T14:38:19.811-0800: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2016-12-05T14:38:21.813-0800: [GC [1 CMS-initial-mark: 1480802K(6815744K)] 2060239K(10027008K), 0.7487000 secs] [Times: user=0.75 sys=0.00, real=0.75 secs]
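On the side question about From Space, the tenuring distribution in the ParNew entry can be reproduced by hand. This is a rough sketch based on my understanding of how HotSpot sizes survivors and picks the tenuring threshold; the function names are hypothetical and the 8-byte word size is an assumption for a 64-bit JVM.

```python
def desired_survivor_bytes(survivor_capacity_bytes, target_survivor_ratio,
                           word_size=8):
    """Desired survivor occupancy, computed in heap words and truncated
    (assumption: this mirrors HotSpot's own rounding)."""
    words = survivor_capacity_bytes // word_size
    return (words * target_survivor_ratio // 100) * word_size

def tenuring_threshold(bytes_per_age, desired_bytes, max_threshold=15):
    """First age at which the running total exceeds the desired size."""
    total = 0
    for age, size in enumerate(bytes_per_age, start=1):
        total += size
        if total > desired_bytes:
            return min(age, max_threshold)
    return max_threshold

# Survivor capacity from jmap -heap, TargetSurvivorRatio=80 from the JVM flags.
desired = desired_survivor_bytes(469762048, 80)
# Per-age byte counts from the 14:38:15.638 ParNew entry.
ages = [115976192, 104282224, 85871464, 98122648]
threshold = tenuring_threshold(ages, desired)
print(desired, threshold)
# prints: 375809632 4
```

Both numbers match the "Desired survivor size 375809632 bytes, new threshold 4" line in the log. Note also that the young gen after that collection is 458752K, which is exactly the 448 MB survivor capacity, so the survivor space may be overflowing and some objects may be promoted earlier than the threshold suggests.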
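Following Alexey's excellent observation and suggestion, we are going to try increasing the perm generation in production (I will report back). But as a preliminary validation of his guess, I surveyed perm gen usage for all of the container JVMs on one of our hosts, and it seems very plausible. In the snippet below, PID=2979 (at 92% of perm gen capacity) is the one exhibiting the constant CMS collection behavior.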
[oracle@ikm-oasb-3 bin]$ for p in `opmnctl status | grep OC4JG | awk '{print }'`; do echo -n "PID=$p "; jmap -heap $p | grep -A4 'Perm Gen' | egrep '%'; done 2> /dev/null
PID=8456 89.31778371334076% used
PID=8455 89.03931379318237% used
PID=8454 91.1630779504776% used
PID=8453 89.17466700077057% used
PID=8452 87.69496977329254% used
PID=2979 92.2750473022461% used
PID=1884 90.25585949420929% used
PID=785 76.16643011569977% used
PID=607 89.06879723072052% used
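To make the comparison explicit, the survey output can be filtered against a perm gen trigger threshold. This is an illustrative sketch only: the 92% threshold is an assumption taken from the default mentioned earlier for the tenured space, and the helper name is hypothetical.

```python
import re

SURVEY = """\
PID=8456 89.31778371334076% used
PID=8455 89.03931379318237% used
PID=8454 91.1630779504776% used
PID=8453 89.17466700077057% used
PID=8452 87.69496977329254% used
PID=2979 92.2750473022461% used
PID=1884 90.25585949420929% used
PID=785 76.16643011569977% used
PID=607 89.06879723072052% used
"""

def over_threshold(text, threshold_pct):
    """Return (pid, pct) pairs whose perm gen usage exceeds threshold_pct."""
    hits = []
    for m in re.finditer(r"PID=(\d+)\s+([\d.]+)% used", text):
        pid, pct = int(m.group(1)), float(m.group(2))
        if pct > threshold_pct:
            hits.append((pid, pct))
    return hits

# Assumed threshold of 92%; adjust if the JVM is configured differently.
hits = over_threshold(SURVEY, 92.0)
print(hits)
# prints: [(2979, 92.2750473022461)]
```

Under that assumption, the only JVM above the threshold is the one showing back-to-back CMS cycles, which fits the perm gen explanation.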
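A CMS old space collection cycle is triggered either when the old space reaches its occupancy threshold or when the permanent space reaches its threshold.
Prior to Java 8, the permanent space was part of the garbage-collected heap and fell within the scope of the CMS algorithm.
In your case, perm is at 93%: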
Perm Generation:
capacity = 268435456 (256.0MB)
used = 249858712 (238.2838363647461MB)
free = 18576744 (17.716163635253906MB)
93.07962357997894% used
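This article describes a similar case.
I would suggest that you either increase the perm space or use the -XX:CMSInitiatingPermOccupancyFraction=95 option to configure a separate occupancy threshold for the perm space.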
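As a worked check of the numbers, the sketch below derives the default trigger percentage and compares both generations against it. The formula and the trigger-ratio default of 80 are my recollection of HotSpot's CMS ergonomics rather than something stated in this thread, so treat them as assumptions; MinHeapFreeRatio = 40 and the byte counts come from the jmap -heap output in the question.

```python
def default_initiating_pct(min_heap_free_ratio=40, trigger_ratio=80):
    """Assumed HotSpot default when no explicit occupancy fraction is set:
    (100 - MinHeapFreeRatio) + trigger_ratio% of MinHeapFreeRatio."""
    return (100 - min_heap_free_ratio) + trigger_ratio * min_heap_free_ratio / 100.0

def occupancy_pct(used_bytes, capacity_bytes):
    return 100.0 * used_bytes / capacity_bytes

threshold = default_initiating_pct()            # 92.0 under the assumed defaults
old_pct = occupancy_pct(1592989856, 6979321856)  # concurrent mark-sweep generation
perm_pct = occupancy_pct(249858712, 268435456)   # Perm Generation

print("threshold %.1f%%" % threshold)
print("old  %.2f%% -> triggers: %s" % (old_pct, old_pct > threshold))
print("perm %.2f%% -> triggers: %s" % (perm_pct, perm_pct > threshold))
```

With these inputs the old gen sits far below the threshold while perm gen is just above it, which would explain cycles starting with the tenured space mostly empty. Raising the perm threshold to 95 would put it above the current 93% occupancy, but only until perm usage grows further, so increasing the perm size is likely the more durable fix.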