Alert
GC log analysis for this incident
2022-11-17T17:58:50.518+0800: 1217960.132: [GC (Allocation Failure) 2022-11-17T17:58:50.518+0800: 1217960.132: [ParNew: 1382400K->153600K(1382400K), 0.5626158 secs] 3419277K->2410488K(4040704K), 0.5628652 secs] [Times: user=1.07 sys=0.00, real=0.56 secs]
2022-11-17T17:58:51.084+0800: 1217960.698: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2256888K(2658304K)] 2419662K(4040704K), 0.0349632 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
2022-11-17T17:58:51.119+0800: 1217960.734: [CMS-concurrent-mark-start]
2022-11-17T17:58:51.626+0800: 1217961.240: [CMS-concurrent-mark: 0.506/0.506 secs] [Times: user=1.03 sys=0.00, real=0.51 secs]
2022-11-17T17:58:51.626+0800: 1217961.240: [CMS-concurrent-preclean-start]
2022-11-17T17:58:51.634+0800: 1217961.248: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2022-11-17T17:58:51.634+0800: 1217961.248: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2022-11-17T17:58:56.994+0800: 1217966.608: [CMS-concurrent-abortable-preclean: 4.854/5.360 secs] [Times: user=5.29 sys=0.03, real=5.36 secs]
2022-11-17T17:58:56.997+0800: 1217966.611: [GC (CMS Final Remark) [YG occupancy: 629022 K (1382400 K)]2022-11-17T17:58:56.997+0800: 1217966.611: [Rescan (parallel) , 0.0937759 secs]2022-11-17T17:58:57.091+0800: 1217966.705: [weak refs processing, 0.0002312 secs]2022-11-17T17:58:57.091+0800: 1217966.705: [class unloading, 0.0450819 secs]2022-11-17T17:58:57.136+0800: 1217966.750: [scrub symbol table, 0.0198218 secs]2022-11-17T17:58:57.156+0800: 1217966.770: [scrub string table, 0.0021679 secs][1 CMS-remark: 2256888K(2658304K)] 2885910K(4040704K), 0.1613101 secs] [Times: user=0.26 sys=0.00, real=0.16 secs]
2022-11-17T17:58:57.159+0800: 1217966.773: [CMS-concurrent-sweep-start]
2022-11-17T17:58:58.567+0800: 1217968.181: [CMS-concurrent-sweep: 1.398/1.409 secs] [Times: user=1.44 sys=0.00, real=1.41 secs]
2022-11-17T17:58:58.567+0800: 1217968.182: [CMS-concurrent-reset-start]
2022-11-17T17:58:58.576+0800: 1217968.190: [CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
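The CMS cycle has two STW (stop-the-world) phases, 0.19s in total
- Initial mark (CMS Initial Mark): 0.03s
- Final remark (CMS Final Remark): 0.16s
Concurrent phases and their durations
- Concurrent mark (CMS-concurrent-mark): 0.51s
- Concurrent preclean (CMS-concurrent-preclean): 0.01s
- Concurrent abortable preclean (CMS-concurrent-abortable-preclean): 5.36s
- Concurrent sweep (CMS-concurrent-sweep): 1.41s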
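The breakdown above can be reproduced with a small script. This is a minimal sketch, not a general GC log parser: it assumes the ParNew + CMS log format shown above, reads the `real=` value of each record, and works on an abridged copy of the excerpt (timestamps and the remark sub-phases are trimmed). The function name `summarize` and the phase keys are illustrative. It also picks up the 0.56s ParNew pause and the 0.01s concurrent reset phase, which are in the log but not in the lists above.

```python
import re

# Abridged copy of the log excerpt: timestamps and remark sub-phases removed.
LOG = """\
[GC (Allocation Failure) [ParNew: 1382400K->153600K(1382400K), 0.5626158 secs] 3419277K->2410488K(4040704K), 0.5628652 secs] [Times: user=1.07 sys=0.00, real=0.56 secs]
[GC (CMS Initial Mark) [1 CMS-initial-mark: 2256888K(2658304K)] 2419662K(4040704K), 0.0349632 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
[CMS-concurrent-mark: 0.506/0.506 secs] [Times: user=1.03 sys=0.00, real=0.51 secs]
[CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[CMS-concurrent-abortable-preclean: 4.854/5.360 secs] [Times: user=5.29 sys=0.03, real=5.36 secs]
[GC (CMS Final Remark) [YG occupancy: 629022 K (1382400 K)][1 CMS-remark: 2256888K(2658304K)] 2885910K(4040704K), 0.1613101 secs] [Times: user=0.26 sys=0.00, real=0.16 secs]
[CMS-concurrent-sweep: 1.398/1.409 secs] [Times: user=1.44 sys=0.00, real=1.41 secs]
[CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
"""

REAL = re.compile(r"real=(\d+\.\d+) secs")
CONCURRENT = re.compile(r"\[CMS-concurrent-([a-z-]+):")


def summarize(log):
    """Split wall-clock ("real") times into STW pauses and concurrent phases."""
    stw, concurrent = {}, {}
    for line in log.splitlines():
        real = REAL.search(line)
        if not real:
            continue  # "-start" markers carry no timing
        secs = float(real.group(1))
        phase = CONCURRENT.search(line)
        if phase:
            concurrent[phase.group(1)] = secs
        elif "CMS Initial Mark" in line:
            stw["initial-mark"] = secs
        elif "CMS Final Remark" in line:
            stw["remark"] = secs
        elif "ParNew" in line:
            stw["young-gc"] = secs  # young collection, also a pause
    return stw, concurrent


stw, concurrent = summarize(LOG)
cms_pause = stw["initial-mark"] + stw["remark"]
print(f"CMS STW pauses:    {cms_pause:.2f}s")
print(f"young GC pause:    {stw['young-gc']:.2f}s")
print(f"concurrent phases: {sum(concurrent.values()):.2f}s")
```

Using the rounded `real=` values keeps the totals consistent with the figures quoted in the lists; the unrounded per-record timings in the log would give slightly different sums.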
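Problems with the alert and the monitoring
Analysis of the GC log shows that the actual pause time does not match the alert content; the monitoring dashboard shows the same discrepancy.
After checking with ops, the metric behind both the monitoring and the alert turns out to be total GC duration, not GC pause time.
Ops therefore needs to correct the descriptions attached to the alert and the monitoring.
Ideally the monitoring would also add a pause-duration metric. @ops folks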
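To get a feel for how far the two metrics can diverge, the sketch below compares the wall-clock length of the CMS cycle in the log with the time the application was actually paused during it. It assumes that "total GC duration" means the span from the start of initial mark to the end of concurrent reset; how the monitoring system really computes its metric is not stated here, so treat this as an illustration only. The function name `cycle_breakdown` is made up for this example.

```python
# JVM-uptime stamps (seconds) copied from the GC log.
CYCLE_START = 1217960.698  # CMS Initial Mark begins
CYCLE_END = 1217968.190    # CMS-concurrent-reset finishes

# real= values of the two STW phases in the same cycle.
PAUSES = {"initial-mark": 0.03, "remark": 0.16}


def cycle_breakdown(start, end, pauses):
    """Return (cycle wall time, paused time, paused share of the cycle)."""
    total = end - start
    paused = sum(pauses.values())
    return total, paused, paused / total


total, paused, share = cycle_breakdown(CYCLE_START, CYCLE_END, PAUSES)
print(f"cycle wall time: {total:.2f}s, paused: {paused:.2f}s ({share:.1%})")
```

Under that assumption the cycle lasts about 7.5s while the application is paused for only about 0.19s of it, which is why an alert on total duration overstates the impact on request latency.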
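Signs of insufficient memory
The heap monitoring shows:
- The eden and survivor spaces are frequently filled up, so objects get promoted into the old generation
- This makes full GC fairly frequent
- There is no sign of a memory leak (no steady growth that cannot be reclaimed), so no further heap dump analysis is needed
The old generation is large enough, but the young generation is too small.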
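The ParNew record in the log supports this reading. After the young collection the young generation still holds 153600K, and the amount promoted into the old generation can be derived from the two before/after pairs on that line. The sketch below does the arithmetic; `promoted_kb` is an illustrative helper, and it assumes the first `before->after` pair is the young generation and the second is the whole heap, as in the ParNew format shown in the log.

```python
import re

# Sizes from the ParNew record in the GC log (timestamps removed).
PARNEW_LINE = (
    "[ParNew: 1382400K->153600K(1382400K), 0.5626158 secs] "
    "3419277K->2410488K(4040704K), 0.5628652 secs]"
)


def promoted_kb(line):
    """Return (promoted, old_before, old_after) in KB for one ParNew record."""
    # First pair: young generation; second pair: whole heap.
    (y_before, y_after), (h_before, h_after) = [
        (int(a), int(b)) for a, b in re.findall(r"(\d+)K->(\d+)K", line)
    ]
    old_before = h_before - y_before
    old_after = h_after - y_after
    return old_after - old_before, old_before, old_after


promoted, old_before, old_after = promoted_kb(PARNEW_LINE)
print(f"old gen: {old_before}K -> {old_after}K")
print(f"promoted in this young GC: {promoted}K ({promoted / 1024:.1f} MB)")
```

The derived old-generation occupancy after the collection, 2256888K, matches the value printed by the CMS-initial-mark record that follows, so the numbers are consistent: roughly 215 MB moved into the old generation in a single young collection.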
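Conclusion and plan
- GC pause time is not long; it is the GC cycle as a whole that takes long
- The time in the alert is not GC pause time. Ops needs to adjust the alert content and the monitored metric, and add finer-grained monitoring of pause duration @ops folks
- Heap: the old generation is large enough but the young generation is too small. Recommend keeping the old generation unchanged and giving the young generation more memory
- Currently the heap is 4GB in total, with a 1500m young generation
- Increase the young generation to 2500m (with the old generation unchanged, the total heap grows by 1000m), then keep observing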
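The sizing can be sanity-checked against the log. The sketch below assumes that "4GB" means 4096m, that the young generation uses HotSpot's default `-XX:SurvivorRatio=8` (eden is 8 times one survivor space), and that `-Xms` is set equal to `-Xmx`; none of these settings is stated in this note, so verify them against the actual JVM options before applying anything. `young_layout` is an illustrative helper.

```python
def young_layout(xmn_mb, survivor_ratio=8):
    """Split a young generation into (eden, one survivor space), in MB."""
    survivor = xmn_mb / (survivor_ratio + 2)  # eden plus two survivor spaces
    eden = survivor * survivor_ratio
    return eden, survivor


HEAP_MB = 4096   # assumption: "4GB in total" taken as 4096m
OLD_XMN = 1500   # current young generation
NEW_XMN = 2500   # proposed young generation

old_gen_mb = HEAP_MB - OLD_XMN          # stays unchanged in the proposal
eden, survivor = young_layout(OLD_XMN)
new_eden, new_survivor = young_layout(NEW_XMN)
new_heap = old_gen_mb + NEW_XMN

# Assumed flag set: -Xms equal to -Xmx is a common choice, not taken from this note.
flags = f"-Xms{new_heap}m -Xmx{new_heap}m -Xmn{NEW_XMN}m"

print(f"current:  eden {eden:.0f}m, survivor {survivor:.0f}m, old {old_gen_mb}m")
print(f"proposed: eden {new_eden:.0f}m, survivor {new_survivor:.0f}m, old {old_gen_mb}m")
print(flags)
```

With these assumptions the current layout reproduces the capacities in the log: eden plus one survivor space is 1350m (1382400K, the ParNew capacity) and the old generation is 2596m (2658304K). The 153600K left in the young generation after the collection equals one full 150m survivor space, which fits the observation that the survivor spaces are being filled up.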