Kafka paper
1. Kafka Architecture and Design Principles
(Paper, Section 3.) To balance load, a topic is divided into multiple partitions, and each broker stores one or more of those partitions.
1.1 efficiency on a single partition
1.1.1 storage
Each partition of a topic corresponds to a logical log. Physically, a log is implemented as a set of segment files of approximately the same size (e.g., 1GB).
The broker appends each newly produced message to the last segment file. A segment file is flushed to disk in one of two cases:
- after a certain number of messages has been received
- after a certain amount of time has elapsed
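The two flush conditions can be sketched as follows. This is only an illustration of the "count or time" rule: the class `SegmentBuffer`, its parameters, and the in-memory `flushed` list standing in for the disk are all hypothetical names, not anything from the paper.

```python
import time


class SegmentBuffer:
    """Buffers appended messages and flushes on a count or time threshold.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, max_messages=3, max_seconds=1.0, clock=time.monotonic):
        self.max_messages = max_messages
        self.max_seconds = max_seconds
        self.clock = clock
        self.pending = []   # appended but not yet flushed
        self.flushed = []   # stands in for data already on disk
        self.last_flush = clock()

    def append(self, message):
        self.pending.append(message)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if enough messages accumulated or enough time has passed."""
        too_many = len(self.pending) >= self.max_messages
        too_old = self.clock() - self.last_flush >= self.max_seconds
        if self.pending and (too_many or too_old):
            self.flushed.extend(self.pending)
            self.pending.clear()
            self.last_flush = self.clock()
            return True
        return False
```

A real broker would also call something like `maybe_flush` from a timer, so that a quiet partition still gets flushed once the time limit passes.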
Each broker keeps in memory a sorted list of offsets, including the offset of the first message in every segment file.
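Finding the segment file that holds a requested offset is then a binary search over that sorted list. A minimal sketch, assuming the list holds the first offset of each segment; the function name `find_segment` and the sample offsets are made up for illustration.

```python
import bisect

# Hypothetical first offsets of each segment file, kept sorted in memory.
segment_base_offsets = [0, 1000, 2500, 4000]


def find_segment(base_offsets, offset):
    """Return the index of the segment file whose range contains `offset`."""
    if not base_offsets or offset < base_offsets[0]:
        raise ValueError("offset precedes the first retained segment")
    # bisect_right returns the insertion point after any equal entry, so
    # subtracting one gives the last segment whose base offset <= offset.
    return bisect.bisect_right(base_offsets, offset) - 1
```

The sketch does not check whether the offset lies past the end of the log; a real lookup would need to know the current end of the last segment as well.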
1.1.2 file transfer
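A typical approach to sending bytes from a local file to a remote socket involves the following steps: (1) read data from the storage media to the page cache in an OS, (2) copy data in the page cache to an application buffer, (3) copy application buffer to another kernel buffer, (4) send the kernel buffer to the socket. This path includes 4 data copies and 2 system calls. The sendfile API on Linux and other Unix systems transfers bytes directly from a file channel to a socket channel, which avoids the copies and the system call introduced in steps (2) and (3).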
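The difference between the copy-through-user-space path and a kernel-side transfer can be sketched in Python. This is an illustration, not Kafka's code: `send_with_copy`, `send_zero_copy`, and `demo` are hypothetical helpers. `socket.sendfile` is a standard-library method that uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary `send` calls.

```python
import os
import socket
import tempfile


def send_with_copy(sock, path, chunk_size=64 * 1024):
    """Read the file into a user-space buffer, then write it to the socket."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # page cache -> application buffer
            if not chunk:
                break
            sock.sendall(chunk)          # application buffer -> kernel socket buffer
            sent += len(chunk)
    return sent


def send_zero_copy(sock, path):
    """Ask the kernel to move bytes from the file to the socket directly."""
    with open(path, "rb") as f:
        return sock.sendfile(f)          # returns the number of bytes sent


def demo():
    """Send a small file both ways over a local socket pair."""
    payload = b"kafka" * 200
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    left, right = socket.socketpair()
    try:
        n1 = send_with_copy(left, tmp.name)
        n2 = send_zero_copy(left, tmp.name)
        left.close()                     # signal end of stream to the reader
        received = bytearray()
        while chunk := right.recv(4096):
            received.extend(chunk)
        return n1, n2, bytes(received), payload
    finally:
        left.close()
        right.close()
        os.remove(tmp.name)
```

Both functions deliver the same bytes; the difference is only where the copying happens, so the benefit shows up as lower CPU and memory-bandwidth use on large transfers rather than in the result.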
1.1.3 stateless broker
The broker does not track how many messages each consumer has consumed, so it does not know whether a new message has been consumed by every consumer.
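A small sketch of what "stateless" means here: the consumer, not the broker, remembers its position. The classes `Broker` and `Consumer` are hypothetical, and list indices stand in for the real log offsets as a simplification.

```python
class Broker:
    """Holds one partition's log and keeps no per-consumer state."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)

    def fetch(self, offset, max_messages=10):
        # The caller supplies the offset; the broker remembers nothing about it.
        return self.log[offset:offset + max_messages]


class Consumer:
    """Tracks its own position in the partition."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.broker.fetch(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def rewind(self, offset):
        # Because the position lives in the consumer, re-reading old
        # messages only requires resetting a local variable.
        self.offset = offset
```

One consequence of this design is that a consumer can deliberately move back to an older offset and consume the same messages again without any help from the broker.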
1.2 distributed coordination
Each producer can publish a message to
either a randomly selected partition or a partition semantically
determined by a partitioning key and a partitioning function.
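The two publishing modes can be sketched as one function. This is an assumed example: the name `choose_partition` and the use of CRC32 as the partitioning function are illustrative choices, not the paper's.

```python
import random
import zlib


def choose_partition(num_partitions, key=None, rng=random):
    """Pick a partition at random, or deterministically from a key.

    `key` is expected to be bytes. CRC32 is used only because it is stable
    across runs; any deterministic hash would serve the same purpose.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        return rng.randrange(num_partitions)
    return zlib.crc32(key) % num_partitions
```

With a key, every message carrying the same key lands in the same partition, which matters because ordering is only defined within a partition.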
Within a consumer group, each message is delivered to only one of
the consumers in the group.
Our first decision is to make a partition within a topic the smallest
unit of parallelism. This means that at any given time, all
messages from one partition are consumed only by a single
consumer within each consumer group. Had we allowed multiple
consumers to simultaneously consume a single partition, they
would have to coordinate who consumes what messages, which
necessitates locking and state maintenance overhead. In contrast,
in our design consuming processes only need co-ordinate when
the consumers rebalance the load, an infrequent event. In order for
the load to be truly balanced, we require many more partitions in a
topic than the consumers in each group.
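One way to give each partition exactly one owner per group is a range-style split over sorted lists, so that every consumer can compute the same answer independently. The sketch below is an assumption-laden illustration: the function name `assign_partitions` is hypothetical, and giving the leftover partitions to the first consumers is a choice made here, not something stated in the text.

```python
def assign_partitions(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per consumer.

    Returns a dict mapping each consumer id to its list of partitions.
    Every partition appears in exactly one list.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    if not consumers:
        return {}
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition each.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment
```

With fewer partitions than consumers, some consumers receive an empty list and sit idle, which is one reason to want many more partitions than consumers.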
The second decision that we made is to not have a central “master” node, but instead let consumers coordinate among themselves in a decentralized fashion.
Kafka uses Zookeeper for the following tasks: (1) detecting the addition and the removal of brokers and consumers, (2) triggering a rebalance process in each consumer when the above events happen, and (3) maintaining the consumption relationship and keeping track of the consumed offset of each partition.
1.3 delivery guarantees
Kafka guarantees that messages from a single partition are
delivered to a consumer in order. However, there is no guarantee
on the ordering of messages coming from different partitions.