Handle Large Messages in Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Producer → Broker:

    if (message.size > message.max.bytes)
        reject with RecordTooLargeException
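The relevant size limits sit on all three sides. The values below are illustrative, not recommendations; the property names are the standard Kafka configuration keys:

```properties
# Broker (server.properties): largest record batch the broker will accept
message.max.bytes=1000012

# Topic-level override of the broker limit (set via kafka-configs.sh)
# max.message.bytes=1000012

# Producer: largest request the producer will attempt to send
max.request.size=1048576

# Consumer: per-partition fetch size; should cover the largest message
max.partition.fetch.bytes=1048576
```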
Why does Kafka limit the message size?
● Large messages increase memory pressure on the broker.
● Large messages are expensive to handle and can slow down the brokers.
● A reasonable message size limit can handle the vast majority of use cases.
● Good workarounds exist (reference-based messaging).
[Diagram: reference-based messaging — the producer writes the payload ("data") to an external data store and sends only a reference ("Ref.") through Kafka; the consumer reads the reference and fetches the data from the store.]
Works fine as long as the durability of the data store can be guaranteed.
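The pattern above can be sketched as follows. The `BlobStore` interface and its in-memory implementation are hypothetical stand-ins for a real durable store; in production the reference, not the payload, would be the Kafka record value:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of reference-based messaging: the payload goes to an external
// data store and only a small reference travels through Kafka.
public class ReferenceBasedMessaging {

    interface BlobStore {
        String put(byte[] data);      // stores the payload, returns a reference
        byte[] get(String reference); // resolves a reference back to the payload
    }

    // Illustrative in-memory store; a real deployment needs a durable one.
    static class InMemoryBlobStore implements BlobStore {
        private final Map<String, byte[]> blobs = new HashMap<>();
        public String put(byte[] data) {
            String ref = UUID.randomUUID().toString();
            blobs.put(ref, data);
            return ref;
        }
        public byte[] get(String reference) { return blobs.get(reference); }
    }

    // Producer side: store the payload, send only the reference through Kafka.
    static String producePayload(BlobStore store, byte[] largePayload) {
        return store.put(largePayload);
    }

    // Consumer side: resolve the reference received from Kafka.
    static byte[] consumeReference(BlobStore store, String reference) {
        return store.get(reference);
    }
}
```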
                      Reference-based messaging          In-line large message support
End-to-end latency    Depends on the external storage    The latency of Kafka
Client complexity     Need to deal with envelopes        Much more involved (coming soon)
A normal-sized message is sent as a single-segment message.
Client Modules
● Producer
  ○ MessageSplitter, on top of KafkaProducer<byte[], byte[]>
● Consumer
  ○ MessageAssembler
  ○ LargeMessageBufferPool
  ○ DeliveredMessageOffsetTracker
  ○ on top of KafkaConsumer<byte[], byte[]>
● The producer / consumer interface is compatible with open-source Kafka; the modules talk to ordinary Kafka brokers.
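A minimal sketch of the producer-side MessageSplitter: a serialized message is cut into numbered segments that share a messageId. The field names follow the segment layout shown later in the deck; the exact open-source implementation may differ:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Splits one serialized message into fixed-size segments.
public class MessageSplitter {

    static class Segment {
        final UUID messageId;          // shared by all segments of one message
        final int sequenceNumber;      // position of this segment
        final int numberOfSegments;    // total segments in the message
        final int messageSizeInBytes;  // size of the whole original message
        final byte[] payload;          // this segment's slice of the message
        Segment(UUID id, int seq, int total, int size, byte[] payload) {
            this.messageId = id;
            this.sequenceNumber = seq;
            this.numberOfSegments = total;
            this.messageSizeInBytes = size;
            this.payload = payload;
        }
    }

    static List<Segment> split(byte[] message, int maxSegmentBytes) {
        UUID messageId = UUID.randomUUID();
        int numSegments = (message.length + maxSegmentBytes - 1) / maxSegmentBytes;
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < numSegments; i++) {
            int from = i * maxSegmentBytes;
            int to = Math.min(from + maxSegmentBytes, message.length);
            segments.add(new Segment(messageId, i, numSegments,
                                     message.length,
                                     Arrays.copyOfRange(message, from, to)));
        }
        return segments;
    }
}
```

A normal-sized message (message.length <= maxSegmentBytes) comes out as a single-segment message, matching the slide above.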
A closer look at large message handling
● The offset of a large message
● Producer callback
● Offset tracking
● Rebalance and duplicates handling
● Memory management
● Performance overhead
● Compatibility with existing messages
The offset of a large message
● Option 1: use the offset of the first segment
  ○ First seen, first served
  ○ Easy to seek: the consumer can simply seek to the message offset.
  ○ Expensive for in-order delivery: the consumer needs to buffer all the message segments until the current large message is complete.
  ○ Example: with a broker log of [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1], msg1 is complete at offset 2, but its offset (1) comes after msg0's offset (0), so nothing can be delivered until msg0 completes at offset 3.
● Option 2: use the offset of the last segment
  ○ A message is delivered as soon as its last segment arrives: in the example above, msg1 is delivered with offset 2 and msg0 with offset 3.
  ○ This is the approach taken here; it requires extra bookkeeping to make commit() and seek() behave correctly (see offset tracking).
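The last-segment rule can be modeled in a few lines. This is a minimal sketch of the assembly logic, not the open-source MessageAssembler; it counts segments per messageId and reports a completed message at the offset of its last segment:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Buffers segment counts per messageId and delivers a message with the
// offset of its LAST segment.
public class MessageAssembler {

    static class Assembled {
        final long offset;     // offset of the last segment = message offset
        final int totalBytes;  // total payload bytes assembled
        Assembled(long offset, int totalBytes) {
            this.offset = offset;
            this.totalBytes = totalBytes;
        }
    }

    private final Map<UUID, Integer> seenSegments = new HashMap<>();
    private final Map<UUID, Integer> seenBytes = new HashMap<>();

    // Feed one segment; returns the assembled message when complete, else null.
    Assembled onSegment(UUID messageId, long offset, int numberOfSegments, byte[] payload) {
        int seen = seenSegments.merge(messageId, 1, Integer::sum);
        int bytes = seenBytes.merge(messageId, payload.length, Integer::sum);
        if (seen == numberOfSegments) {
            seenSegments.remove(messageId);
            seenBytes.remove(messageId);
            return new Assembled(offset, bytes);
        }
        return null;
    }
}
```

Feeding the deck's example log (msg0 segments at offsets 0 and 3, msg1 segments at 1 and 2) yields msg1 at offset 2 and msg0 at offset 3.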
Offset tracking
● Example state: the broker log is [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...]. The consumer has read up to offset 2 and delivered msg1 (offset 2) to the user; msg0 is still incomplete.
● commit( Map{(tp->2)} )
  ○ We cannot commit offset 2 because msg0-seg0 hasn't been delivered to the user.
  ○ We should commit offset 0 instead, so there is no message loss.
● Once msg0 (offset 3) has also been delivered, it becomes safe to commit past offset 3.
● seek(tp, 2)
  ○ Seek to msg1's first segment, i.e. offset 1, instead of offset 2, so msg1 can be re-assembled.
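The bookkeeping above can be sketched as a small tracker. This is an illustrative model of the DeliveredMessageOffsetTracker's job, not the open-source class: here the safe commit offset is supplied by the caller at delivery time (in reality the assembler knows which messages are still incomplete):

```java
import java.util.HashMap;
import java.util.Map;

// For each delivered large message, remember (a) the offset of its first
// segment, so seek() can rewind far enough, and (b) the "safe" offset to
// commit: the smallest segment offset of any not-yet-delivered message.
public class DeliveredMessageOffsetTracker {

    // delivered message offset (last segment) -> offset of its first segment
    private final Map<Long, Long> startingOffsets = new HashMap<>();
    // delivered message offset -> safe offset to commit at that point
    private final Map<Long, Long> safeOffsets = new HashMap<>();

    void track(long messageOffset, long firstSegmentOffset, long safeOffsetToCommit) {
        startingOffsets.put(messageOffset, firstSegmentOffset);
        safeOffsets.put(messageOffset, safeOffsetToCommit);
    }

    // commit(tp -> deliveredOffset) must be translated to the safe offset.
    long safeOffsetToCommit(long deliveredOffset) {
        return safeOffsets.get(deliveredOffset);
    }

    // seek(tp, messageOffset) must rewind to the message's first segment.
    long offsetToSeekTo(long messageOffset) {
        return startingOffsets.get(messageOffset);
    }
}
```

With the deck's example: delivering msg1 (offset 2, first segment at 1, msg0-seg0 at 0 still pending) maps commit(2) to 0 and seek(2) to 1; after msg0 (offset 3) is delivered, commit can advance to 4.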
Producer callback
● All the segments of a large message are sent to the same partition.
● The user callback must fire only after all the segments have been acked. Per large message, the producer tracks:
  { numSegments = 3; ackedSegments = 1 → 2 → 3; userCallback; }
● When ackedSegments reaches numSegments, the user callback is invoked.
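The ack-counting above can be sketched as follows. This is a hypothetical minimal model; the real client also aggregates the RecordMetadata and the first exception across segments:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Fires the user callback exactly once, after ALL segments are acked.
public class LargeMessageCallback {

    private final int numSegments;
    private final AtomicInteger ackedSegments = new AtomicInteger();
    private final Runnable userCallback;

    LargeMessageCallback(int numSegments, Runnable userCallback) {
        this.numSegments = numSegments;
        this.userCallback = userCallback;
    }

    // Invoked once per acknowledged segment (in the real client, from the
    // per-segment org.apache.kafka.clients.producer.Callback).
    void onSegmentAck() {
        if (ackedSegments.incrementAndGet() == numSegments) {
            userCallback.run();
        }
    }
}
```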
Rebalance and duplicates handling
● Consumer 0 has delivered msg1 (offset 2, so delivered = 2) and committed the safe offset 0 (msg0-seg0); then a rebalance happens.
● The new owner, consumer 1, resumes reading from the committed offset, i.e. msg0-seg0 at offset 0, knowing delivered = 2.
● Consumer 1 re-reads msg1-seg0 and msg1-seg1 and re-assembles msg1 — a duplicate.
● Because msg1.offset <= delivered, consumer 1 will NOT deliver msg1 again to the user.
● The first message consumer 1 delivers to the user is msg0, whose offset is 3.
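The filtering rule above fits in a few lines. A minimal sketch, assuming the highest delivered message offset is recovered along with the committed offset after a rebalance:

```java
// Drops any re-assembled message the previous owner already delivered:
// a message is delivered only if its offset is beyond `delivered`.
public class DuplicateFilter {

    private long delivered; // highest message offset handed to the user

    DuplicateFilter(long deliveredSoFar) { this.delivered = deliveredSoFar; }

    // Returns true if the assembled message should go to the user.
    boolean shouldDeliver(long messageOffset) {
        if (messageOffset <= delivered) {
            return false; // already delivered before the rebalance
        }
        delivered = messageOffset;
        return true;
    }
}
```

With the deck's example (delivered = 2 at takeover): the re-assembled msg1 at offset 2 is dropped, and msg0 at offset 3 is the first message delivered.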
Memory management
● Producer
  ○ No material change to memory overhead except splitting and copying the message.
● Consumer
  ○ buffer.capacity
    ■ The maximum number of bytes used to buffer segments. If the buffer is full, the consumer evicts the oldest incomplete message.
  ○ expiration.offset.gap
    ■ Suppose a message has starting offset X and the consumer is now consuming from offset Y.
    ■ The message is removed from the buffer if Y - X is greater than expiration.offset.gap, i.e. a "timeout" measured in offsets.
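The buffer.capacity policy can be sketched as below. Names are illustrative, not the open-source API; an insertion-ordered map makes "oldest incomplete message" easy to find:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Byte-budgeted buffer for segments of incomplete messages; evicts the
// OLDEST incomplete message when the budget is exceeded.
public class LargeMessageBufferPool {

    private final long capacityBytes;
    private long usedBytes = 0;
    // insertion-ordered: the first entry is the oldest incomplete message
    private final LinkedHashMap<UUID, Integer> bufferedBytes = new LinkedHashMap<>();

    LargeMessageBufferPool(long capacityBytes) { this.capacityBytes = capacityBytes; }

    void addSegment(UUID messageId, int payloadBytes) {
        bufferedBytes.merge(messageId, payloadBytes, Integer::sum);
        usedBytes += payloadBytes;
        while (usedBytes > capacityBytes && !bufferedBytes.isEmpty()) {
            evictOldest();
        }
    }

    private void evictOldest() {
        Map.Entry<UUID, Integer> oldest = bufferedBytes.entrySet().iterator().next();
        usedBytes -= oldest.getValue();
        bufferedBytes.remove(oldest.getKey());
    }

    Set<UUID> bufferedMessages() { return bufferedBytes.keySet(); }
}
```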
Performance Overhead
● Potentially additional segment serialization/deserialization cost
  ○ The default segment serde is cheap:

    // segment fields
    public final UUID messageId;
    public final int sequenceNumber;
    public final int numberOfSegments;
    public final int messageSizeInBytes;
    public final ByteBuffer payload;

● Pipeline: ProducerRecord → value serializer → maybe split → segment serialization → Kafka → segment deserialization → re-assemble → value deserializer
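A cheap segment serde can be a fixed-size header followed by the payload. The byte layout below is illustrative; the open-source serde may order fields differently:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Fixed header (UUID + 3 ints) followed by the raw payload bytes.
public class SegmentSerde {

    // 16-byte UUID + sequenceNumber + numberOfSegments + messageSizeInBytes
    static final int HEADER_BYTES = 16 + 3 * Integer.BYTES;

    static byte[] serialize(UUID messageId, int sequenceNumber,
                            int numberOfSegments, int messageSizeInBytes,
                            byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putLong(messageId.getMostSignificantBits());
        buf.putLong(messageId.getLeastSignificantBits());
        buf.putInt(sequenceNumber);
        buf.putInt(numberOfSegments);
        buf.putInt(messageSizeInBytes);
        buf.put(payload);
        return buf.array();
    }

    // Returns {messageId, sequenceNumber, numberOfSegments,
    //          messageSizeInBytes, payload} in order.
    static Object[] deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        UUID messageId = new UUID(buf.getLong(), buf.getLong());
        int sequenceNumber = buf.getInt();
        int numberOfSegments = buf.getInt();
        int messageSizeInBytes = buf.getInt();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new Object[]{messageId, sequenceNumber, numberOfSegments,
                            messageSizeInBytes, payload};
    }
}
```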
Performance Overhead
● Additional memory footprint in consumers
  ○ Buffer for segments of incomplete large messages
  ○ Additional memory is needed to track the message offsets:
    ■ 24 bytes per message; it takes 12 MB to track the most recent 5,000 messages from 100 partitions (24 B × 5,000 × 100).
    ■ We can choose to track only large messages if users are trustworthy.
A closer look at large message handling
● The offset of a large message
● Producer callback
● Offset tracking
● Rebalance and duplicates handling
● Memory management
● Performance overhead
● Compatibility with existing messages
Compatibility with existing messages
● A message that was not produced through the splitter cannot be parsed as a segment: the consumer raises NotLargeMessageSegmentException for it.
● Segments are only assembled together when their messageId matches; e.g. msg0-seg1 and msg1-seg0 are never combined, even if they are adjacent in the log.
● Failure scenario: log compaction. Segments of different messages can carry the same segment key (e.g. both m1 and m0 have a segment with key "k0-1"), so compaction can remove segments that are still needed — this scheme doesn't work on compacted topics.
Summary
● Reference-based messaging works in most cases.
● Sometimes it is handy to have in-line support for large messages:
  ○ Sporadic large messages
  ○ Low latency
  ○ A small number of interleaved large messages
  ○ Cost savings
Acknowledgements
Thanks for the great help and support from
Dong Lin
Joel Koshy
Kartik Paramasivam
Onur Karaman
Yi Pan
LinkedIn Espresso and Datastream team
Q&A