-
Consumer groups
A Kafka consumer group contains one or more consumer instances that share a common ID, called the Group ID. A consumer group can subscribe to multiple topics, but within a single group each partition of a topic is consumed by exactly one instance (while one instance may consume several partitions).
-
How many Consumer instances should a group have?
Ideally, the number of Consumer instances equals the total number of partitions across all topics the group subscribes to.
-
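To illustrate the sizing rule above, here is a minimal sketch (hypothetical code, not Kafka's actual assignor) of round-robin-style partition assignment; it shows that any instances beyond the partition count simply sit idle:

```java
import java.util.*;

// Hypothetical sketch, not Kafka's real assignor: spread partitions over
// consumer instances round-robin and count the instances left idle.
public class AssignmentSketch {
    static Map<Integer, List<Integer>> assign(int partitions, int consumers) {
        Map<Integer, List<Integer>> owned = new HashMap<>();
        for (int c = 0; c < consumers; c++) owned.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) owned.get(p % consumers).add(p);
        return owned;
    }

    public static void main(String[] args) {
        // 6 partitions shared by 8 consumers: two consumers own nothing.
        long idle = assign(6, 8).values().stream().filter(List::isEmpty).count();
        System.out.println("idle consumers: " + idle); // prints "idle consumers: 2"
    }
}
```

With consumers equal to the partition total (e.g. 6 and 6), every instance owns exactly one partition and none are wasted.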
When does a Consumer Group rebalance?
(1) The group membership changes, e.g. a new Consumer instance joins the group, an instance leaves, or an instance crashes and is "kicked out" of the group.
(2) The set of subscribed topics changes. A Consumer Group can subscribe to topics by regular expression, e.g. consumer.subscribe(Pattern.compile("t.*c")) subscribes to every topic whose name starts with "t" and ends with "c". If a matching topic is created while the group is running, the group rebalances.
(3) The partition count of a subscribed topic changes. Kafka currently only allows increasing a topic's partition count; when partitions are added, every group subscribed to that topic starts a rebalance.
-
The offsets topic (__consumer_offsets) growing without bound and using too much disk space
Check the state of the Log Cleaner thread; this problem is usually caused by that thread having died.
Kafka runs a dedicated background thread that periodically scans topics subject to compaction and checks whether they contain data eligible for deletion. This background thread is called the Log Cleaner.
-
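The compaction that keeps __consumer_offsets bounded is governed by broker-side settings; a sketch of the relevant properties (the values shown are the defaults in recent Kafka versions):

```properties
# Must remain true: __consumer_offsets relies on compaction to stay bounded
log.cleaner.enable=true
# Number of Log Cleaner threads; can be raised if cleaning falls behind
log.cleaner.threads=1
```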
How do you avoid consumer-side rebalances?
Drawbacks of rebalancing:
(1) A rebalance hurts consumer-side TPS: during a rebalance, every Consumer stops what it is doing and can do nothing else.
(2) Rebalancing is slow. Groups with many members inevitably hit this pain point; in one reported case a group with several hundred Consumer instances took several hours per rebalance, at which point the group's rebalancing was completely out of control.
(3) Rebalancing is inefficient. Kafka's current design requires all members of the group to take part in every rebalance, and it generally ignores locality, even though locality matters greatly for system performance.
-
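Most avoidable rebalances happen because a consumer is wrongly judged dead. A hedged sketch of the consumer-side parameters commonly tuned for this (the values are illustrative, not universal recommendations):

```properties
# How long the coordinator waits for heartbeats before declaring the
# consumer dead and triggering a rebalance
session.timeout.ms=6000
# Should be well below session.timeout.ms so several heartbeats fit
# into one session window
heartbeat.interval.ms=2000
# Max gap allowed between poll() calls; set it above your worst-case
# per-batch processing time so slow processing does not trigger a rebalance
max.poll.interval.ms=300000
```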
Kafka's replica mechanism
In Kafka, replicas exist at the partition level and come in two kinds: the leader replica and follower replicas. Follower replicas do not serve reads or writes; only the leader replica does. Consequently, clients connect to the broker hosting the leader replica for both reads and writes.
Benefits of this replica design:
(1) It makes "Read-your-writes" easy to implement (a message you have just written is immediately visible to you).
(2) It makes monotonic reads easy to implement (a consumer will never see a message appear, then disappear, then appear again across successive reads).
-
ISR (In-sync Replicas)
Replicas in the ISR are all in sync with the leader; follower replicas outside the ISR are considered out of sync. The ISR is not only a set of follower replicas; it also contains the leader replica. In the extreme case, the ISR contains nothing but the leader replica.
-
Kafka's criterion for deciding whether a follower is in sync with the leader:
The criterion is the broker-side parameter replica.lag.time.max.ms.
It specifies the maximum time a follower replica may lag behind the leader replica; the default is 10 seconds.
-
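In broker configuration this criterion looks like the following (10000 ms is the default):

```properties
# A follower that has not caught up with the leader within this interval
# is removed from the ISR
replica.lag.time.max.ms=10000
```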
If the leader replica in the ISR also dies and the ISR becomes empty, how is a new leader elected?
Kafka calls all replicas outside the ISR out-of-sync replicas. Normally an out-of-sync replica's data lags well behind the old leader's, so electing one of them as leader risks losing messages, but it does keep the partition available.
Electing a leader from out-of-sync replicas requires enabling the broker-side parameter unclean.leader.election.enable, which controls whether unclean leader election is allowed.
Enabling it is not recommended; keeping it disabled preserves data consistency.
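In broker configuration this trade-off is expressed as follows (false is both the default and the recommended setting):

```properties
# false: prefer consistency; the partition stays unavailable until an ISR
# replica comes back. true: prefer availability at the risk of message loss.
unclean.leader.election.enable=false
```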
-
The Kafka controller
Role of the controller: with ZooKeeper's help, it manages and coordinates the entire Kafka cluster.
How is the controller elected?
When a Broker starts, it tries to create the /controller node in ZooKeeper. Kafka's current election rule is: the first Broker to successfully create the /controller node becomes the controller.
What does the controller do?
1. Topic management (creating and deleting topics, adding partitions)
2. Partition reassignment (the kafka-reassign-partitions script)
3. Preferred leader election
4. Cluster membership management (new Brokers, graceful Broker shutdowns, Broker crashes)
5. Serving cluster metadata
Controller failover
When Broker 0 crashes, ZooKeeper detects it through its Watch mechanism and deletes the ephemeral /controller node. All surviving Brokers then campaign to become the new controller. Broker 3 wins the election and successfully recreates the /controller node in ZooKeeper, then reads the cluster metadata from ZooKeeper and initializes its own cache with it. At that point the controller failover is complete and the new controller can resume normal duties.
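The current controller can be checked by reading the /controller znode, for example with the zookeeper-shell.sh tool shipped with Kafka (the connection string below is a placeholder; this requires a live cluster):

```shell
# Read the ephemeral /controller znode; its payload names the controller broker
bin/zookeeper-shell.sh zk-host:2181 get /controller
```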
-
Kafka consumer offsets
Strategies for resetting offsets:
(1) By offset
(2) By time
The Earliest strategy is specified directly with --to-earliest:
bin/kafka-consumer-groups.sh --bootstrap-server kafka-host:port --group test-group --reset-offsets --all-topics --to-earliest --execute
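For the time dimension, the same script accepts --to-datetime; a sketch (host, group name, and timestamp are placeholders):

```shell
# Reset the group's offsets to the earliest messages at or after the given time
bin/kafka-consumer-groups.sh --bootstrap-server kafka-host:port --group test-group \
  --reset-offsets --all-topics --to-datetime 2024-06-20T20:00:00.000 --execute
```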