Problem
A recently deployed Flink job kept crashing shortly after startup. A first look at the logs showed the following error:
WARN org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to commit consumer offsets for checkpoint 1
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.
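The root cause reported is "The coordinator is not available."
Searching for this error online suggests that the partition leader acting as the consumer group coordinator does not exist.
Inspect the Kafka __consumer_offsets topic: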
kafka-topics.sh --bootstrap-server node1:9092,node2:9092,node3:9092,node4:9092,node5:9092 --topic __consumer_offsets --describe
Some partitions do indeed show Leader: none.
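With 50 partitions the describe output is long, so a small filter helps. The following is a minimal sketch, assuming the output contains `Leader: none` for a leaderless partition (some versions print `Leader: -1` instead); the function name `leaderless_partitions` is made up for illustration.

```shell
# Read `kafka-topics.sh --describe` output on stdin and print only the
# partition lines whose leader is reported as none (or -1 on some versions).
leaderless_partitions() {
  grep -E 'Leader:[[:space:]]*(none|-1)([[:space:]]|$)'
}
```

It can be used by piping the describe command above into `leaderless_partitions`. `kafka-topics.sh --describe` also accepts an `--unavailable-partitions` option that should give a similar listing directly.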
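The likely cause: the __consumer_offsets topic has 50 partitions by default but only 1 replica each, while the Kafka cluster has 5 brokers.
A consumer group connects to Kafka and asks a broker to look up its Coordinator, which is the leader of the __consumer_offsets partition that the group maps to. With a single replica, that partition has no leader whenever the broker holding it is unavailable,
and the group then cannot find its Coordinator.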
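To see which partition a given group depends on, the mapping can be reproduced by hand. Kafka picks the partition as the absolute value of the group id's Java `hashCode` modulo the partition count. The sketch below assumes an ASCII-only group id and the default of 50 partitions; the function names are illustrative, not part of any Kafka tool.

```shell
# Java String.hashCode for an ASCII string, as a signed 32-bit value.
# Assumption: the group id is plain ASCII, so bytes equal UTF-16 code units.
java_hash() {
  h=0
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    # Keep the running value in the unsigned 32-bit range, like int overflow.
    h=$(( (31 * h + byte) % 4294967296 ))
  done
  # Reinterpret the unsigned value as a signed 32-bit integer.
  if [ "$h" -ge 2147483648 ]; then
    h=$(( h - 4294967296 ))
  fi
  echo "$h"
}

# Print the __consumer_offsets partition for group id $1.
# $2 is the partition count (defaults to 50).
offsets_partition() {
  hash=$(java_hash "$1")
  partitions=${2:-50}
  # Kafka's Utils.abs maps Integer.MIN_VALUE to 0 and otherwise takes |hash|.
  if [ "$hash" -eq -2147483648 ]; then
    hash=0
  elif [ "$hash" -lt 0 ]; then
    hash=$(( -hash ))
  fi
  echo $(( hash % partitions ))
}

offsets_partition "test"   # prints 48
```

If the partition printed for the Flink job's group id is one of those listed with Leader: none, that would explain why this particular job cannot commit offsets while other groups may be unaffected.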
Check the configured value of offsets.topic.replication.factor:
cat kafka/config/server.properties | grep offsets.topic.replication.factor
The configured value is indeed only 1.
Solution
- 1. Stop Kafka, then add the following to config/server.properties.
The value must not exceed the number of brokers (5 here); 3 is used:
offsets.topic.replication.factor=3
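- 2. Delete the topic's metadata from ZooKeeper. offsets.topic.replication.factor only applies when __consumer_offsets is created, so the existing topic has to be removed for the new value to take effect: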
./bin/zkCli.sh -server 127.0.0.1:2181
- Delete /config/topics/__consumer_offsets
delete /config/topics/__consumer_offsets
- Delete /brokers/topics/__consumer_offsets
deleteall /brokers/topics/__consumer_offsets
- 3. Restart Kafka.
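Once the brokers are back and __consumer_offsets has been recreated, the describe command can be run again to confirm the fix. The following is a rough sketch that flags any partition still lacking a leader or having fewer replicas than expected. It assumes the tab-separated `Partition: / Leader: / Replicas:` fields shown by `kafka-topics.sh --describe`; `check_offsets_topic` is a hypothetical helper name.

```shell
# Read `kafka-topics.sh --describe` output on stdin. Print "partition leader
# replica_count" for every partition that has no leader or fewer than the
# wanted number of replicas ($1); print nothing when the topic looks healthy.
check_offsets_topic() {
  awk -v want="$1" '
    {
      part = ""; leader = ""; n = 0
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
        if ($i == "Replicas:")  n = split($(i + 1), replicas, ",")
      }
      # Skip the summary line, which has no "Partition:" field.
      if (part == "") next
      if (leader == "none" || n < want) print part, leader, n
    }'
}
```

Piping the describe output into `check_offsets_topic 3` should print nothing if every partition has a leader and three replicas.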