One step ahead of the past

博文

这篇文章主要内容是kafka论文里的设计和实现。 1. Kafka Architecture and Design Principles paper section 3 To balance load, a topic is divided into multiple partitions and each broker stores one or more of those partitions. 1.1 单个partition的效率 1.1.1 存储 Each partition of a topic corresponds to a logical log. Physically, a log is implemented as a set of segment files of approximately the same size (e.g., 1GB). broker将新产生的消息添加到最后的segment file，segment file flush到磁盘的操作在两种情况下进行：接收到一定数量的消息过了一定的时间后消息只有在flush到磁盘后才会暴露给消费者。 Each broker keeps in memory a sorted list of offsets, including the offset of the first message in every segment file. 1.1.2 文件传输文中说对于consumer请求的文件，从磁盘传输到socket要用4个步骤 (1) read data from the storage media to the page cache in an OS, (2) copy data in the page cache to an application buffer, (3) copy application buffer to another kernel buffer, (4) send the kernel buffer to the socket 然后其中包括了四次数据copy和两次系统调用，为了效率更高，...

阅读全文

Kafka Demo Record

十二月 02, 2017

这篇博客用来根据师兄做的一个基于Kafka队列的消息产生、传输、存储模型，记录一些关于Kafka基本架构，命令使用，代码开发的东西。另外，还有使用Gobblin工具从kafka队列获取数据保存在本地或者分布式文件系统HDFS和Hive中的过程。

阅读全文

搜索此博客

One step ahead of the past

博文

Fink：使用Table api和SQL

Flink

Kafka paper

Kafka Demo Record