前言
这是之前 我们测试环境出现的一个问题
一个项目, 代码调整了之后, 发布到测试环境 之后, 几分钟之后 整个系统访问这个服务 出现了 "503 Service Unavailable", 然后 当时的处理方式为 临时重启服务
但是过了一会儿之后 同样的问题还是会出现, 导致 前端服务调试不了
这个我在调试的时候, 没有怎么查看日志, 因为一些信息已经还是比较明显了, 这里会贴出 具体的日志
假设这个服务为 ServiceA
jstat 的日志
这是主要的出问题的日志
从日志上来看, 短时间内 频繁的出现了, 多次 full gc
从情况上来看, 是真实业务所需要的内存大于 jvm 的内存, 然后 导致 jvm 内存撑满了之后, 持续的进行 gc, 但是 多次 gc 之后, 仍然没有 释放出需要的空间
然后 进而 恶性循环, 此时 当前 jvm, 不断地进行 full gc, 然后 处理不了业务代码
导致程序 hang 住, 不能处理 外部的请求
78.13 0.00 45.42 91.30 93.67 91.72 428 3.121 17 3.977 7.098 Allocation Failure No GC
78.13 0.00 96.36 91.30 93.67 91.72 428 3.121 17 3.977 7.098 Allocation Failure No GC
0.00 70.00 9.08 91.30 93.68 91.72 429 3.132 17 3.977 7.110 Allocation Failure No GC
27.50 0.00 56.40 91.31 93.68 91.72 430 3.143 17 3.977 7.120 Allocation Failure No GC
27.50 0.00 56.40 91.31 93.68 91.72 430 3.143 17 3.977 7.120 Allocation Failure No GC
0.00 76.25 60.94 91.32 93.68 91.72 431 3.150 17 3.977 7.127 Allocation Failure No GC
74.38 0.00 22.85 91.33 93.68 91.72 432 3.163 17 3.977 7.140 Allocation Failure No GC
74.38 0.00 43.55 91.33 93.68 91.72 432 3.163 17 3.977 7.140 Allocation Failure No GC
0.00 67.50 93.22 91.34 93.68 91.72 433 3.173 17 3.977 7.151 Allocation Failure No GC
0.00 67.50 93.22 91.34 93.68 91.72 433 3.173 17 3.977 7.151 Allocation Failure No GC
0.00 99.35 0.00 98.05 93.68 91.72 435 3.198 18 3.977 7.175 Allocation Failure Ergonomics
99.20 0.00 12.25 88.01 92.90 90.64 436 3.238 18 4.396 7.634 Allocation Failure No GC
99.20 0.00 12.25 88.01 92.90 90.64 436 3.238 18 4.396 7.634 Allocation Failure No GC
0.00 0.00 100.00 99.79 92.90 90.65 437 3.260 25 6.356 9.616 Allocation Failure Ergonomics
0.00 0.00 47.14 99.99 92.90 90.65 437 3.260 27 7.279 10.540 Allocation Failure No GC
0.00 0.00 100.00 99.97 92.90 90.65 437 3.260 34 9.008 12.268 Allocation Failure Ergonomics
0.00 0.00 66.12 75.01 92.90 90.65 437 3.260 34 9.229 12.489 Allocation Failure No GC
0.00 99.99 26.89 80.82 92.90 90.65 439 3.298 34 9.229 12.527 Allocation Failure No GC
51.31 0.00 100.00 99.93 92.90 90.65 446 3.497 36 9.706 13.203 Allocation Failure Ergonomics
0.00 0.00 100.00 99.97 92.90 90.65 446 3.497 47 12.625 16.123 Allocation Failure Ergonomics
0.00 0.00 100.00 99.97 92.90 90.65 446 3.497 58 15.427 18.924 Allocation Failure Ergonomics
0.00 0.00 100.00 99.95 92.90 90.65 446 3.497 69 18.251 21.748 Allocation Failure Ergonomics
0.00 0.00 100.00 99.88 92.90 90.65 446 3.497 80 21.010 24.507 Allocation Failure Ergonomics
0.00 0.00 100.00 99.87 92.90 90.65 446 3.497 91 23.817 27.314 GCLocker Initiated GC Ergonomics
0.00 0.00 100.00 99.90 92.90 90.65 446 3.497 102 26.689 30.186 Allocation Failure Ergonomics
0.00 0.00 100.00 100.00 92.90 90.65 446 3.497 113 29.501 32.998 GCLocker Initiated GC Ergonomics
0.00 0.00 100.00 99.85 92.90 90.65 446 3.497 125 32.499 35.996 GCLocker Initiated GC Ergonomics
0.00 0.00 100.00 99.87 92.91 90.65 446 3.497 135 35.293 38.790 Allocation Failure Ergonomics
0.00 0.00 98.12 99.90 92.57 90.24 446 3.497 142 38.310 41.807 Allocation Failure No GC
0.00 0.00 100.00 99.97 92.57 90.24 446 3.497 142 38.310 41.807 Allocation Failure No GC
0.00 0.00 100.00 99.94 92.57 90.24 446 3.497 146 39.412 42.909 Allocation Failure Ergonomics
0.00 0.00 100.00 99.94 92.57 90.23 446 3.497 153 41.940 45.437 Allocation Failure No GC
0.00 0.00 100.00 99.94 92.57 90.23 446 3.497 153 41.940 45.437 Allocation Failure No GC
0.00 0.00 100.00 99.95 92.58 90.24 446 3.497 161 43.736 47.233 Allocation Failure Ergonomics
0.00 0.00 100.00 99.95 92.58 90.24 446 3.497 167 45.457 48.954 Allocation Failure No GC
0.00 0.00 100.00 99.95 92.58 90.24 446 3.497 167 45.457 48.954 Allocation Failure No GC
0.00 0.00 100.00 99.97 92.58 90.24 446 3.497 178 48.021 51.518 Allocation Failure Ergonomics
0.00 0.00 100.00 99.97 92.58 90.24 446 3.497 189 50.866 54.363 Allocation Failure Ergonomics
0.00 0.00 100.00 99.97 92.58 90.24 446 3.497 189 51.111 54.608 Allocation Failure No GC
0.00 0.00 100.00 99.97 92.58 90.24 446 3.497 194 52.124 55.621 Allocation Failure Ergonomics
0.00 0.00 100.00 99.96 92.58 90.24 446 3.497 205 55.105 58.603 Allocation Failure Ergonomics
0.00 0.00 100.00 99.96 92.58 90.24 446 3.497 210 56.688 60.185 Allocation Failure No GC
0.00 0.00 100.00 99.96 92.58 90.24 446 3.497 210 56.688 60.185 Allocation Failure No GC
0.00 0.00 100.00 99.95 92.58 90.24 446 3.497 220 59.279 62.776 Allocation Failure Ergonomics
0.00 0.00 100.00 100.00 92.58 90.25 446 3.497 229 62.057 65.554 GCLocker Initiated GC Ergonomics
0.00 0.00 99.92 99.95 92.58 90.25 446 3.497 229 62.307 65.804 GCLocker Initiated GC No GC
0.00 0.00 100.00 99.94 92.58 90.25 446 3.497 234 63.286 66.784 GCLocker Initiated GC Ergonomics
0.00 0.00 100.00 99.94 92.58 90.25 446 3.497 246 66.412 69.909 Allocation Failure Ergonomics
0.00 0.00 15.63 74.89 92.58 90.25 446 3.497 249 67.372 70.870 Allocation Failure No GC
0.00 0.00 15.63 74.89 92.58 90.25 446 3.497 249 67.372 70.870 Allocation Failure No GC
0.00 1.23 8.18 78.34 92.69 90.38 449 3.516 249 67.372 70.888 GCLocker Initiated GC No GC
0.00 1.23 17.33 78.34 92.69 90.38 449 3.516 249 67.372 70.888 GCLocker Initiated GC No GC
业务代码的输出日志
一个关键的说明问题的字样是 "GC overhead limit exceeded"
这个同样能够推导出上面的 "jstat -gccause" 的推导的相关
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.baomidou.mybatisplus.core.toolkit.SerializationUtils.serialize(SerializationUtils.java:61)
at com.baomidou.mybatisplus.core.toolkit.support.SerializedLambda.resolve(SerializedLambda.java:58)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils.lambda$resolve$0(LambdaUtils.java:64)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils$$Lambda$1195/322612414.get(Unknown Source)
at java.util.Optional.orElseGet(Optional.java:267)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils.resolve(LambdaUtils.java:63)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper.columnToString(AbstractLambdaWrapper.java:62)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper.lambda$columnsToString$0(AbstractLambdaWrapper.java:53)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper$$Lambda$1193/71871969.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.NoClassDefFoundError: Could not initialize class org.apache.ibatis.ognl.OgnlException
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1055)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:909)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:665)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1055)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:909)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:665)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
// 省略部分日志
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.baomidou.mybatisplus.core.toolkit.SerializationUtils.serialize(SerializationUtils.java:61)
at com.baomidou.mybatisplus.core.toolkit.support.SerializedLambda.resolve(SerializedLambda.java:58)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils.lambda$resolve$0(LambdaUtils.java:64)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils$$Lambda$1195/322612414.get(Unknown Source)
at java.util.Optional.orElseGet(Optional.java:267)
at com.baomidou.mybatisplus.core.toolkit.LambdaUtils.resolve(LambdaUtils.java:63)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper.columnToString(AbstractLambdaWrapper.java:62)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper.lambda$columnsToString$0(AbstractLambdaWrapper.java:53)
at com.baomidou.mybatisplus.core.conditions.AbstractLambdaWrapper$$Lambda$1193/71871969.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
heap 中的对象分析
我们拿到一份 运行时的 dump 文件, 然后使用最简单的 jhat 进行分析
我们看一下 其中比较占用内存的部分对象, 这一部分是有 业务关联的
首先是 从业务代码相关的实体开始看
一个 entity.RkHeatMapRs 会对应于一个 Integer, 两个 Double
我们路由到 entity.RkHeatMapRs 的整个对象链路
ResultSetImpl
一个对应于 321230 个 ByteArrayRow
一个对应于一个 MysqlTextValueDecoder
一个对应于一个 [[B
一个对应于三个 [B
一个对应于一个 entity.RkHeatMapRs
一个对应于一个 java.lang.Integer
一个对应于两个 java.lang.Double
然后 我们大致统计一下 这里实际的业务场景中多个 entity.RkHeatMapRs 占用的内存相关开销
321230 个 ByteArrayRow = 32 * 321230 = 10M
一个对应于一个 MysqlTextValueDecoder, 16 * 321230 = 5M
一个对应于一个 [[B = 40 * 321233 = 12M
一个对应于三个 [B = 取决于业务数组长度, 24 * 973275 = 23M
一个对应于一个 entity.RkHeatMapRs, 40 * 516380 = 20M
一个对应于一个 java.lang.Integer, 16 * 294759 = 4M
一个对应于两个 java.lang.Double, 24 * 1072068 = 25M
合计 100M 左右
这里是完整的 RkHeatMapRsMapper.query 查询出来的结果
然后 其中的可以回收的对象有 ByteArrayRow, MysqlTextValueDecoder, [B, [[B
所以 经过 gc 之后内存中持续占用的空间有 321230 个 entity.RkHeatMapRs, 和其关联的 321230 个 java.lang.Integer, 和 2 * 321230 个 java.lang.Double, 合计占用空间 50M
然后 当下一次 执行 RkHeatMapRsMapper.query 的时候
内存中合计会占用 原有的 entity.RkHeatMapRs 的相关的 50M
以及新的一个批次查询的 entity.RkHeatMapRs 的相关 50M, 和相关的 ByteArrayRow, MysqlTextValueDecoder, [B, [[B 合计 50M
然后 内存中持有的是 其他的常驻内存的对象, 其他的 staticHolder, 其他的字符串常量, 关联的字节数据 等等
然后 持久需要占用内存 超过了 jvm 可以控制的 new + old
造成了 "GC overhead limit exceeded", 进而 造成了 服务本身的 "503 Service Unavailable"
测试环境的资源情况
出现问题的测试环境资源情况为 容器可用最大内存为 512M, jvm 可用的最大堆内存为 512M
扣除掉 100M 的 metaspace, 堆可用的最大内存为 412M
扣除 case 中硬性需要占用的 entity.RkHeatMapRs 相关的空间 150M
其他可用空间为 262M
其他的业务常驻内存 + 字符串常量常驻内存, 这么一计算 就内存更加匮乏了
为了 更好的复现这个问题, 我将内存更新为了 256M
这样, 复现这个问题 就更加轻松了
问题的解决
增大 容器可用的内存, 以及 jvm 可以使用的最大堆内存
或者 减小一次性加载到内存的业务数据的数量, 比如这里的 entity.RkHeatMapRs
完