-
Consumer groups
A Kafka consumer group contains one or more consumer instances that share a common ID, called the Group ID. A consumer group can subscribe to multiple topics, but within a single group each partition of a topic is consumed by exactly one instance (while one instance may consume several partitions).
-
How many Consumer instances should a group have?
Ideally, the number of Consumer instances equals the total number of partitions across all topics the group subscribes to.
-
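To illustrate the sizing rule above, here is a minimal sketch (hypothetical code, not Kafka's actual assignor) of round-robin-style partition assignment; it shows that any instances beyond the partition count simply sit idle:

```java
import java.util.*;

// Hypothetical sketch, not Kafka's real assignor: spread partitions over
// consumer instances round-robin and count the instances left idle.
public class AssignmentSketch {
    static Map<Integer, List<Integer>> assign(int partitions, int consumers) {
        Map<Integer, List<Integer>> owned = new HashMap<>();
        for (int c = 0; c < consumers; c++) owned.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) owned.get(p % consumers).add(p);
        return owned;
    }

    public static void main(String[] args) {
        // 6 partitions shared by 8 consumers: two consumers own nothing.
        long idle = assign(6, 8).values().stream().filter(List::isEmpty).count();
        System.out.println("idle consumers: " + idle); // prints "idle consumers: 2"
    }
}
```

With consumers equal to the partition total (e.g. 6 and 6), every instance owns exactly one partition and none are wasted.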
When does a Consumer Group rebalance?
(1) The group membership changes, e.g. a new Consumer instance joins the group, an instance leaves, or an instance crashes and is "kicked out" of the group.
(2) The set of subscribed topics changes. A Consumer Group can subscribe to topics by regular expression, e.g. consumer.subscribe(Pattern.compile("t.*c")) subscribes to every topic whose name starts with "t" and ends with "c". If a matching topic is created while the group is running, the group rebalances.
(3) The partition count of a subscribed topic changes. Kafka currently only allows increasing a topic's partition count; when partitions are added, every group subscribed to that topic starts a rebalance.
-
The offsets topic (__consumer_offsets) growing without bound and using too much disk space
Check the state of the Log Cleaner thread; this problem is usually caused by that thread having died.
Kafka runs a dedicated background thread that periodically scans topics subject to compaction and checks whether they contain data eligible for deletion. This background thread is called the Log Cleaner.
-
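The compaction that keeps __consumer_offsets bounded is governed by broker-side settings; a sketch of the relevant properties (the values shown are the defaults in recent Kafka versions):

```properties
# Must remain true: __consumer_offsets relies on compaction to stay bounded
log.cleaner.enable=true
# Number of Log Cleaner threads; can be raised if cleaning falls behind
log.cleaner.threads=1
```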
How do you avoid consumer-side rebalances?
Drawbacks of rebalancing:
(1) A rebalance hurts consumer-side TPS: during a rebalance, every Consumer stops what it is doing and can do nothing else.
(2) Rebalancing is slow. Groups with many members inevitably hit this pain point; in one reported case a group with several hundred Consumer instances took several hours per rebalance, at which point the group's rebalancing was completely out of control.
(3) Rebalancing is inefficient. Kafka's current design requires all members of the group to take part in every rebalance, and it generally ignores locality, even though locality matters greatly for system performance.
-
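Most avoidable rebalances happen because a consumer is wrongly judged dead. A hedged sketch of the consumer-side parameters commonly tuned for this (the values are illustrative, not universal recommendations):

```properties
# How long the coordinator waits for heartbeats before declaring the
# consumer dead and triggering a rebalance
session.timeout.ms=6000
# Should be well below session.timeout.ms so several heartbeats fit
# into one session window
heartbeat.interval.ms=2000
# Max gap allowed between poll() calls; set it above your worst-case
# per-batch processing time so slow processing does not trigger a rebalance
max.poll.interval.ms=300000
```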
Kafka's replica mechanism
In Kafka, replicas exist at the partition level and come in two kinds: the leader replica and follower replicas. Follower replicas do not serve reads or writes; only the leader replica does. Consequently, clients connect to the broker hosting the leader replica for both reads and writes.
Benefits of this replica design:
(1) It makes "Read-your-writes" easy to implement (a message you have just written is immediately visible to you).
(2) It makes monotonic reads easy to implement (a consumer will never see a message appear, then disappear, then appear again across successive reads).
-
ISR (In-sync Replicas)
Replicas in the ISR are all in sync with the leader; follower replicas outside the ISR are considered out of sync. The ISR is not only a set of follower replicas; it also contains the leader replica. In the extreme case, the ISR contains nothing but the leader replica.
-
Kafka's criterion for deciding whether a follower is in sync with the leader:
The criterion is the broker-side parameter replica.lag.time.max.ms.
It specifies the maximum time a follower replica may lag behind the leader replica; the default is 10 seconds.
-
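In broker configuration this criterion looks like the following (10000 ms is the default):

```properties
# A follower that has not caught up with the leader within this interval
# is removed from the ISR
replica.lag.time.max.ms=10000
```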
If the leader replica in the ISR also dies and the ISR becomes empty, how is a new leader elected?
Kafka calls all replicas outside the ISR out-of-sync replicas. Normally an out-of-sync replica's data lags well behind the old leader's, so electing one of them as leader risks losing messages, but it does keep the partition available.
Electing a leader from out-of-sync replicas requires enabling the broker-side parameter unclean.leader.election.enable, which controls whether unclean leader election is allowed.
Enabling it is not recommended; keeping it disabled preserves data consistency.
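In broker configuration this trade-off is expressed as follows (false is both the default and the recommended setting):

```properties
# false: prefer consistency; the partition stays unavailable until an ISR
# replica comes back. true: prefer availability at the risk of message loss.
unclean.leader.election.enable=false
```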
-
The Kafka controller
Role of the controller: with ZooKeeper's help, it manages and coordinates the entire Kafka cluster.
How is the controller elected?
When a Broker starts, it tries to create the /controller node in ZooKeeper. Kafka's current election rule is: the first Broker to successfully create the /controller node becomes the controller.
What does the controller do?
1. Topic management (creating and deleting topics, adding partitions)
2. Partition reassignment (the kafka-reassign-partitions script)
3. Preferred leader election
4. Cluster membership management (new Brokers, graceful Broker shutdowns, Broker crashes)
5. Serving cluster metadata
Controller failover
When Broker 0 crashes, ZooKeeper detects it through its Watch mechanism and deletes the ephemeral /controller node. All surviving Brokers then campaign to become the new controller. Broker 3 wins the election and successfully recreates the /controller node in ZooKeeper, then reads the cluster metadata from ZooKeeper and initializes its own cache with it. At that point the controller failover is complete and the new controller can resume normal duties.
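The current controller can be checked by reading the /controller znode, for example with the zookeeper-shell.sh tool shipped with Kafka (the connection string below is a placeholder; this requires a live cluster):

```shell
# Read the ephemeral /controller znode; its payload names the controller broker
bin/zookeeper-shell.sh zk-host:2181 get /controller
```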
-
Kafka consumer offsets
Strategies for resetting offsets:
(1) By offset
(2) By time
The Earliest strategy is specified directly with --to-earliest:
bin/kafka-consumer-groups.sh --bootstrap-server kafka-host:port --group test-group --reset-offsets --all-topics --to-earliest --execute
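For the time dimension, the same script accepts --to-datetime; a sketch (host, group name, and timestamp are placeholders):

```shell
# Reset the group's offsets to the earliest messages at or after the given time
bin/kafka-consumer-groups.sh --bootstrap-server kafka-host:port --group test-group \
  --reset-offsets --all-topics --to-datetime 2024-06-20T20:00:00.000 --execute
```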