告警
GC日志分析
日志
2022-11-17T14:28:40.315+0800: 1956232.826: [GC (Allocation Failure) 2022-11-17T14:28:40.315+0800: 1956232.826: [ParNew:
1576103K->2817K(1769472K), 0.0241066 secs] 4197176K->2624616K(5046272K), 0.0243910 secs] [Times: user=0.04 sys=0.00, rea
l=0.03 secs]
2022-11-17T14:28:40.347+0800: 1956232.859: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2621798K(3276800K)] 2648559K(5046
272K), 0.0110744 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
2022-11-17T14:28:40.359+0800: 1956232.870: [CMS-concurrent-mark-start]
2022-11-17T14:28:45.647+0800: 1956238.159: [CMS-concurrent-mark: 5.282/5.289 secs] [Times: user=1.63 sys=0.41, real=5.28
secs]
2022-11-17T14:28:45.648+0800: 1956238.159: [CMS-concurrent-preclean-start]
2022-11-17T14:28:45.671+0800: 1956238.182: [CMS-concurrent-preclean: 0.023/0.023 secs] [Times: user=0.03 sys=0.00, real=
0.03 secs]
2022-11-17T14:28:45.671+0800: 1956238.182: [CMS-concurrent-abortable-preclean-start]
2022-11-17T14:28:50.526+0800: 1956243.037: [GC (Allocation Failure) 2022-11-17T14:28:50.526+0800: 1956243.037: [ParNew2022-11-17T14:28:50.548+0800: 1956243.060: [CMS-concurrent-abortable-preclean: 1.472/4.877 secs] [Times: user=2.33 sys=0.03, real=4.87 secs]
: 1575681K->33553K(1769472K), 0.0281517 secs] 4197480K->2655355K(5046272K), 0.0284249 secs] [Times: user=0.05 sys=0.01, real=0.03 secs]
2022-11-17T14:28:50.572+0800: 1956243.083: [GC (CMS Final Remark) [YG occupancy: 78279 K (1769472 K)]2022-11-17T14:28:50.572+0800: 1956243.083: [Rescan (parallel) , 0.0115711 secs]2022-11-17T14:28:50.584+0800: 1956243.095: [weak refs processing, 0.8799103 secs]2022-11-17T14:28:51.464+0800: 1956243.975: [class unloading, 3.4124151 secs]2022-11-17T14:28:54.876+0800: 1956247.387: [scrub symbol table, 0.6587619 secs]2022-11-17T14:28:55.535+0800: 1956248.046: [scrub string table, 0.0028382 secs][1 CMS-remark: 2621801K(3276800K)] 2700081K(5046272K), 4.9762131 secs] [Times: user=0.29 sys=0.13, real=4.97 secs]
2022-11-17T14:28:55.550+0800: 1956248.061: [CMS-concurrent-sweep-start]
2022-11-17T14:28:57.959+0800: 1956250.470: [CMS-concurrent-sweep: 2.396/2.409 secs] [Times: user=2.31 sys=0.27, real=2.4
1 secs]
2022-11-17T14:28:57.959+0800: 1956250.470: [CMS-concurrent-reset-start]
2022-11-17T14:28:57.970+0800: 1956250.481: [CMS-concurrent-reset: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
CMS过程耗时分析
- 初始标记Initial-Mark:此阶段的目标就是标记 GC Root 能直接关联到的对象,或者年轻代对老年代对象的引用,速度很快,但是会停止用户线程,耗时0.02s
2022-11-17T14:28:40.347+0800: 1956232.859: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2621798K(3276800K)] 2648559K(5046
272K), 0.0110744 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
- 并发标记Concurrent-mark:从上个阶段找到的 GC Roots 开始遍历整个对象图的过程,一般这个阶段耗时比较长,但是不需要停顿用户线程,那也就是说在标记的过程中对象图还在动态变化,因为用户线程还在执行。耗时5.28s
2022-11-17T14:28:40.359+0800: 1956232.870: [CMS-concurrent-mark-start]
2022-11-17T14:28:45.647+0800: 1956238.159: [CMS-concurrent-mark: 5.282/5.289 secs] [Times: user=1.63 sys=0.41, real=5.28
secs]
- 并发预清理Concurrent pre clean:这也是与用户线程并发的一个阶段,前一个标记阶段中与用户线程并发执行,一些对象的引用发生了更改,每当这种情况发生,JVM 都会将变化对象的对区域(称为“卡”)标记为“脏卡”,这个阶段会将“脏卡”标记,并将它们可达的对象也标记,完成后清洁“脏卡”。耗时0.03s
2022-11-17T14:28:45.648+0800: 1956238.159: [CMS-concurrent-preclean-start]
2022-11-17T14:28:45.671+0800: 1956238.182: [CMS-concurrent-preclean: 0.023/0.023 secs] [Times: user=0.03 sys=0.00, real=
0.03 secs]
- 并发可中断的预清理Concurrent abortable pre clean:这个阶段尝试着去承担下一个阶段Final Remark阶段足够多的工作。这个阶段持续的时间依赖很多的因素,由于这个阶段是重复的做相同的事情直到发生aboart的条件(比如:重复的次数、多少量的工作、持续的时间等等)之一才会停止。此阶段最大持续时间为5秒,之所以可以持续5秒,另外一个原因也是为了期待这5秒内能够发生一次ygc,清理年轻带的引用,是的下个阶段的重新标记阶段,扫描年轻带指向老年代的引用的时间减少。耗时0.03s
2022-11-17T14:28:45.671+0800: 1956238.182: [CMS-concurrent-abortable-preclean-start]
2022-11-17T14:28:50.526+0800: 1956243.037: [GC (Allocation Failure) 2022-11-17T14:28:50.526+0800: 1956243.037: [ParNew2022-11-17T14:28:50.548+0800: 1956243.060: [CMS-concurrent-abortable-preclean: 1.472/4.877 secs] [Times: user=2.33 sys=0.03, real=4.87 secs]
: 1575681K->33553K(1769472K), 0.0281517 secs] 4197480K->2655355K(5046272K), 0.0284249 secs] [Times: user=0.05 sys=0.01, real=0.03 secs]
- 重新标记Final remark:该阶段为了修正并发标记期间,因用户程序持续运行导致的对象变化。会停止用户线程。耗时4.97s
2022-11-17T14:28:50.572+0800: 1956243.083: [GC (CMS Final Remark) [YG occupancy: 78279 K (1769472 K)]2022-11-17T14:28:50.572+0800: 1956243.083: [Rescan (parallel) , 0.0115711 secs]2022-11-17T14:28:50.584+0800: 1956243.095: [weak refs processing, 0.8799103 secs]2022-11-17T14:28:51.464+0800: 1956243.975: [class unloading, 3.4124151 secs]2022-11-17T14:28:54.876+0800: 1956247.387: [scrub symbol table, 0.6587619 secs]2022-11-17T14:28:55.535+0800: 1956248.046: [scrub string table, 0.0028382 secs][1 CMS-remark: 2621801K(3276800K)] 2700081K(5046272K), 4.9762131 secs] [Times: user=0.29 sys=0.13, real=4.97 secs]
- 并发清除Concurrent sweep:这个阶段的目的就是移除那些不用的对象,回收他们占用的空间并且为将来使用。耗时2.41s
2022-11-17T14:28:55.550+0800: 1956248.061: [CMS-concurrent-sweep-start]
2022-11-17T14:28:57.959+0800: 1956250.470: [CMS-concurrent-sweep: 2.396/2.409 secs] [Times: user=2.31 sys=0.27, real=2.4
1 secs]
- 并发重置Concurrent reset:这个阶段并发执行,重新设置CMS算法内部的数据结构,准备下一个CMS生命周期的使用。耗时0.01s
2022-11-17T14:28:57.959+0800: 1956250.470: [CMS-concurrent-reset-start]
2022-11-17T14:28:57.970+0800: 1956250.481: [CMS-concurrent-reset: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
从日志中发现class-unloading也太久了??
GC过程中,长暂停主要是在【重新标记】阶段,日志为
2022-11-17T14:28:50.572+0800: 1956243.083: [GC (CMS Final Remark) [YG occupancy: 78279 K (1769472 K)]2022-11-17T14:28:50.572+0800: 1956243.083: [Rescan (parallel) , 0.0115711 secs]2022-11-17T14:28:50.584+0800: 1956243.095: [weak refs processing, 0.8799103 secs]2022-11-17T14:28:51.464+0800: 1956243.975: [class unloading, 3.4124151 secs]2022-11-17T14:28:54.876+0800: 1956247.387: [scrub symbol table, 0.6587619 secs]2022-11-17T14:28:55.535+0800: 1956248.046: [scrub string table, 0.0028382 secs][1 CMS-remark: 2621801K(3276800K)] 2700081K(5046272K), 4.9762131 secs] [Times: user=0.29 sys=0.13, real=4.97 secs]
发现,class unloading耗时较久为3.412s![image.png](https://img-blog.csdnimg.cn/img_convert/f45bcf0da1699c238c7b792a8a919d65.png#averageHue=#e5e5e5&clientId=u6880d548-dd21-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=25&id=u7631848b&margin=[object Object]&name=image.png&originHeight=50&originWidth=582&originalType=binary&ratio=1&rotation=0&showTitle=false&size=8927&status=done&style=stroke&taskId=ua0f06c69-0e35-44c8-a545-21fd4847d9a&title=&width=291)
查资料,发现类卸载阶段产生了很多的页活动,导致把cpu时间从GC线程抢走,内存页交换没有单独的监控,只能从实例io监控看了
![image.png](https://img-blog.csdnimg.cn/img_convert/7c70810ce5c41a38278e1c9699569384.png#averageHue=#fafafa&clientId=u6880d548-dd21-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=133&id=ub95b4666&margin=[object Object]&name=image.png&originHeight=265&originWidth=538&originalType=binary&ratio=1&rotation=0&showTitle=false&size=33100&status=done&style=stroke&taskId=u776a3268-83d3-449e-84f8-80201f332d9&title=&width=269)
从上面可以看到,发生GC时,实例io确实有突增
内存页交换机制在生产环境一般是禁用的,因此优化手段是关闭内存交换机制
参考资料:
解决方案
- 关闭搜索服务的swap机制,2022-11-17日晚执行