Today I am looking at why one should be careful when opting for Kafka with Lambda. When building a customer-centric solution, you might consider an alternative that is both better suited and cheaper. Let's first look at the characteristics of Kafka clusters; later, this article will discuss the characteristics of a Kinesis stream.
Characteristics of a Kafka Cluster:
- Kafka clusters are made up of 4 core components: topics, partitions, brokers, and Zookeeper
- Topics are used to group messages of the same type to simplify access by consumers
- Partitions are data stores that hold topic messages. They can be replicated across several brokers
- Brokers are nodes in a Kafka cluster and can hold multiple partitions across several topics
- Zookeeper is an Apache service that Kafka relies on to coordinate Kafka brokers. This includes leader election, coordination between brokers, consumers, and producers, and broker state tracking
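To make the topic/partition relationship concrete, here is a minimal sketch of how a Kafka producer maps a keyed message to a partition. Note this is an illustration only: Kafka's real default partitioner hashes keys with murmur2, while this sketch uses MD5 purely for simplicity.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed message.

    Simplified stand-in for Kafka's default partitioner: hash the
    message key and take it modulo the topic's partition count.
    (Kafka actually uses murmur2; MD5 here is for illustration.)
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land on the same partition,
# which is how Kafka preserves per-key ordering within a topic.
p1 = partition_for(b"order-123", 6)
p2 = partition_for(b"order-123", 6)
print(p1 == p2)  # same key, same partition
```

Because the hash is taken modulo the partition count, increasing the number of partitions later changes where existing keys map, which is one reason Kafka partition counts are usually chosen up front.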
Characteristics of a Kinesis stream:
- A single Kinesis stream is equivalent to a topic in Kafka.
- Each Kinesis stream is made up of a configurable number of shards.
- Shards are equivalent to partitions in Kafka terminology.
- Shards allow a stream to be scaled dynamically in response to demand fluctuations. To understand what a shard is, think of a single Kinesis stream as a highway, and each shard as a lane. A Kinesis stream’s throughput can be increased by adding more shards – similar to how a highway’s throughput can be increased by adding more lanes.
- Producers can attach a partition key to each record sent to Kinesis to group data by shard. This determines how data is routed when shards are added to or removed from a stream.
- The partition key is designed by the stream creator to reflect how the data should be split in case more shards are added.
- It is important to keep in mind that all of this is happening in a single topic/stream. Partition keys are used to determine how data is routed across shards within a single topic/stream.
- Example: A stream has a single shard but 4 producers each attaching their unique partition key to the data when they insert it into the stream. Demand starts low with 1 shard being able to support all 4 producers. When demand increases, three more shards can be added for a total of 4 shards in this stream. Based on the partition key design, Kinesis can map the partition keys to the new shards and each producer will get its own shard.
- A Kinesis stream can have a minimum of 1 shard and a default maximum of 50 (the exact limit is region specific, and is a soft limit that can be raised by request).
- Each shard can support up to 2MB/sec data read.
- Each shard can support up to 1,000 writes per second, for a maximum of 1MB/sec.
- Maximum size of a single data blob in a stream is 1MB.
- Default data retention per stream is 24 hours. Increasing this number will increase the per stream cost.
- The data retention can be increased in hourly increments up to a maximum of 7 days.
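The partition-key routing described above can be sketched in a few lines. Kinesis hashes each record's partition key with MD5 into a 128-bit hash key, and each shard owns a contiguous range of that hash key space; a record goes to whichever shard's range contains its hash. The `make_shards` helper below is an assumption for illustration: it simply splits the hash space evenly, roughly mimicking an even resharding from 1 shard to 4 as in the producer example.

```python
import hashlib

MAX_HASH = 2**128 - 1  # Kinesis hash key space is 0 .. 2^128 - 1

def hash_key(partition_key: str) -> int:
    # Kinesis routes records by the MD5 hash of the partition key,
    # interpreted as a 128-bit integer.
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)

def make_shards(n: int):
    # Illustrative helper: split the hash key space into n even,
    # contiguous ranges, one per shard.
    step = (MAX_HASH + 1) // n
    return [(i * step, MAX_HASH if i == n - 1 else (i + 1) * step - 1)
            for i in range(n)]

def route(partition_key: str, shards) -> int:
    # Return the index of the shard whose range contains the key's hash.
    h = hash_key(partition_key)
    for i, (lo, hi) in enumerate(shards):
        if lo <= h <= hi:
            return i
    raise ValueError("hash outside shard ranges")

keys = ["producer-a", "producer-b", "producer-c", "producer-d"]

# With a single shard, every partition key routes to shard 0.
print([route(k, make_shards(1)) for k in keys])

# After scaling to 4 shards, the same keys spread across the ranges.
print([route(k, make_shards(4)) for k in keys])
```

This is why a well-designed partition key matters: keys that hash evenly across the space will spread across new shards automatically after a reshard, while a single hot key will always land in one shard no matter how many you add.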
Amazon Kinesis Streams is a fully managed service that makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. It enables you to cost effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. Apache Kafka is an open-source streaming data solution that you can run on Amazon EC2 to build real-time applications. (Amazon, https://aws.amazon.com/real-time-data-streaming-on-aws/)