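1. Scenario
While deploying a system for a cloud-migration project recently, I found Kafka reporting errors after the system had passed load testing and gone live in production. Further investigation showed that ZooKeeper was reporting errors too, so I suspected the Kafka problem was caused by the ZooKeeper errors. The ZooKeeper log looked like this: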
2022-11-17 06:26:43,052 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn@380] - Close of session 0x0
java.io.IOException: Len error. A message from /172.26.91.147:60500 with advertised length of 1195725856 is either a malformed message or too large to process (length is greater than jute.maxbuffer=1048575)
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:549)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
2022-11-17 06:26:48,687 [myid:] - WARN [NIOWorkerThread-2:NIOServerCnxn@380] - Close of session 0x0
java.io.IOException: Len error. A message from /172.26.91.147:60506 with advertised length of 1195725856 is either a malformed message or too large to process (length is greater than jute.maxbuffer=1048575)
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:549)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
The troubleshooting process was as follows.
2. Cause
The key error message:
java.io.IOException: Len error. A message from /172.26.91.147:60506 with advertised length of 1195725856 is either a malformed message or too large to process (length is greater than jute.maxbuffer=1048575)
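In other words, the message from /172.26.91.147:60506 advertises a length of 1195725856 bytes, and ZooKeeper rejects it as malformed or too large because that length is greater than jute.maxbuffer=1048575.
The packet sent by the client exceeds ZooKeeper's jute.maxbuffer setting, which defaults to 1048575 bytes.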
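As a side check, it can help to see what the advertised length actually encodes. ZooKeeper reads the first four bytes of a request as a big-endian integer length, so a client that is not speaking the ZooKeeper protocol produces a meaningless "length". The sketch below is only an illustration in plain POSIX shell. It converts the number from the log back into its four bytes:

```shell
# Length reported in the ZooKeeper log.
len=1195725856

# Render it as the four big-endian bytes the server read off the socket.
hex=$(printf '%08x' "$len")

# Convert each byte (two hex digits) into its character via an octal escape.
text=""
for i in 1 3 5 7; do
  byte=$(printf '%s' "$hex" | cut -c "$i-$((i + 1))")
  oct=$(printf '%03o' "0x$byte")
  text="$text$(printf "\\$oct")"
done

echo "hex=$hex text='$text'"
# prints: hex=47455420 text='GET '
```

The bytes spell `GET `, which is how an HTTP request begins. That suggests the connection from 172.26.91.147 may be an HTTP client, for example a health check or monitoring probe pointed at the ZooKeeper client port, rather than a ZooKeeper client sending an oversized request. It is worth identifying what runs on that host before relying on a larger buffer alone.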
Fix: raise jute.maxbuffer, which is passed to the JVM as a system property, to 500 MB = 512000 KB = 524288000 bytes (the default is about 1 MB).
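The size arithmetic is easy to get wrong, so here is a quick sanity check. As far as I know, ZooKeeper parses jute.maxbuffer as a Java int, so the value also has to stay at or below 2147483647. Treat that upper bound as an assumption to confirm against your ZooKeeper version.

```shell
# 500 MiB expressed in bytes.
target_bytes=$((500 * 1024 * 1024))

# ZooKeeper's default jute.maxbuffer (0xfffff) and Java's Integer.MAX_VALUE.
default_bytes=$((0xfffff))
java_int_max=2147483647

echo "target=$target_bytes default=$default_bytes"

# A value above Integer.MAX_VALUE cannot be represented as a Java int.
if [ "$target_bytes" -le "$java_int_max" ]; then
  fits="yes"
else
  fits="no"
fi
echo "fits in a Java int: $fits"
```

This confirms that 524288000 bytes is 500 MiB and leaves headroom below the int limit, whereas a value such as 5000000000 would not fit.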
- Modify bin/zkServer.sh or zkEnv.sh:
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=524288000"
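- Because this deployment uses a self-built ARM image, the startup scripts cannot be modified directly. ZooKeeper has built-in support for a startup file that sets JVM parameters; that file simply does not exist by default.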
- Reading the zkEnv.sh script shows that JVM parameters can be customized through a java.env file.
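To make the mechanism concrete, here is a small simulation in a throwaway directory. It assumes zkEnv.sh sources `$ZOOCFGDIR/java.env` when that file exists, which is what the script does in the releases I have looked at. Check your own copy, for example with `grep -n java.env zkEnv.sh`. The flag `-Dexample.flag=1` is a hypothetical placeholder for flags that are already set.

```shell
# Stand-in for the real ZooKeeper conf directory.
ZOOCFGDIR=$(mktemp -d)

# Quoting 'EOF' keeps $JVMFLAGS literal, so it is expanded when the file is sourced.
cat > "$ZOOCFGDIR/java.env" << 'EOF'
export JVMFLAGS="-Djute.maxbuffer=524288000 $JVMFLAGS"
EOF

# Hypothetical flag already present before java.env is read.
JVMFLAGS="-Dexample.flag=1"

# Same pattern as the check in zkEnv.sh: source the file only if it exists.
if [ -f "$ZOOCFGDIR/java.env" ]; then
  . "$ZOOCFGDIR/java.env"
fi

echo "$JVMFLAGS"
rm -rf "$ZOOCFGDIR"
```

After sourcing, the new property is placed in front of whatever JVMFLAGS already held, so existing flags are preserved.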
3. Conclusion
- Add a java.env configuration file and mount it into the relevant directory in the container (in my case /opt/zookeeper/conf/java.env):
cat > java.env << 'EOF'
# jute.maxbuffer must fit in a Java int (at most 2147483647)
export JVMFLAGS="-Xms1024m -Xmx1024m -Djute.maxbuffer=524288000 $JVMFLAGS"
EOF
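- To confirm that the file is actually being read, change the JVM parameters to different values and check whether the running process picks them up.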
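One way to run that check is to inspect the command line of the running server process. The sketch below assumes it is executed inside the container (for example through `docker exec` or `kubectl exec`), that `ps` is available there, and that the server runs the usual `QuorumPeerMain` class. The helper name `has_jvm_flag` is my own.

```shell
# has_jvm_flag CMDLINE FLAG: succeeds when FLAG appears as a whole word in CMDLINE.
has_jvm_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Command line of the running ZooKeeper server (empty when none is found).
zk_cmdline=$(ps -eo args 2>/dev/null | grep '[Q]uorumPeerMain' | head -n 1)

if has_jvm_flag "$zk_cmdline" "-Djute.maxbuffer=524288000"; then
  echo "java.env was applied"
else
  echo "flag not found on the ZooKeeper command line"
fi
```

Changing the value in java.env, restarting the container, and running the check again with the new value shows whether the mounted file is the one being read.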