Handle Large Messages in Apache Kafka

The document discusses handling large messages in Apache Kafka. It summarizes that Kafka limits message sizes to avoid increasing broker memory pressure and slowing performance. It presents reference-based messaging as a workaround, where the producer stores the large message in an external data store and sends a reference to Kafka instead. The document also proposes a solution to support inline large messages by splitting them into chunks and reassembling them at the consumer, while maintaining Kafka's ordering guarantees and offsets.


Handle Large Messages in Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Kafka Meetup - Feb 23, 2016


What is a “large message”?
● Kafka has a limit on the maximum size of a single message
○ Enforced on the compressed wrapper message if compression is used

The broker rejects any message that exceeds the limit:

    if (message.size > message.max.bytes)
        reject with RecordTooLargeException
Why does Kafka limit the message size?
● Large messages increase the memory pressure in the broker.
● Large messages are expensive to handle and could slow down the brokers.
● A reasonable message size limit can handle the vast majority of the use cases.
● Good workarounds exist (Reference Based Messaging):
○ The producer writes the large payload (“data”) to an external data store and sends only a reference (“ref”) to Kafka.
○ The consumer reads the reference from Kafka and fetches the payload from the data store.

Producer --data--> Data Store
Producer --ref--> Kafka --ref--> Consumer <--data-- Data Store
Reference Based Messaging
● One of our use cases: database replication
○ Unknown maximum row size
○ Strict no data loss
○ Strict message order guarantee

Works fine as long as the durability of the data store can be guaranteed.

Reference Based Messaging
● One of our use cases: database replication
○ Replicates a data store by using another data store...
○ Sporadic large messages
■ Option 1: Send all the messages using references and take unnecessary overhead.
■ Option 2: Only send large messages using references and live with low storage utilization.
○ Low end-to-end latency is hard to achieve
■ There are more round trips in the system.
■ Need to make sure the data store is fast.

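The reference-based workaround above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory key-value map standing in for the external data store; a real deployment would use a durable blob store, and the class and method names here are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ReferenceMessaging {
    static final int MAX_MESSAGE_BYTES = 1_000_000; // stand-in for message.max.bytes

    // Hypothetical external data store; real systems would use a durable blob store.
    static final Map<String, byte[]> DATA_STORE = new HashMap<>();

    /** Returns the bytes to actually send to Kafka: the payload itself, or a reference. */
    static byte[] toKafkaValue(byte[] payload) {
        if (payload.length <= MAX_MESSAGE_BYTES) {
            return payload; // small enough: send inline
        }
        String ref = "ref:" + UUID.randomUUID();
        DATA_STORE.put(ref, payload);                  // producer writes to the data store
        return ref.getBytes(StandardCharsets.UTF_8);   // only the reference goes to Kafka
    }

    /** Consumer side: resolve a reference back to the payload if needed.
     *  (Sketch only: a real envelope format would tag references unambiguously.) */
    static byte[] fromKafkaValue(byte[] value) {
        String s = new String(value, StandardCharsets.UTF_8);
        return s.startsWith("ref:") ? DATA_STORE.get(s) : value;
    }
}
```

The extra round trip through the data store is exactly where the latency and consistency drawbacks listed above come from.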
In-line Large Message Support

                         Reference Based Messaging          In-line large message support
Operational complexity   Two systems to maintain            Only maintain Kafka
System stability         Depends on:                        Only depends on Kafka
                         ● the consistency between Kafka
                           and the external storage
                         ● the durability of the external
                           storage
Cost to serve            Kafka + external storage           Only Kafka
End-to-end latency       Depends on the external storage    The latency of Kafka
Client complexity        Need to deal with envelopes        Much more involved (coming soon)
Functional limitations   Almost none                        Some limitations


Our solution - chunk and re-assemble
● A large message is split into segments at the producer and re-assembled at the consumer.
● A normal-sized message is sent as a single-segment message.

Client Modules (interface compatible with the open source Kafka producer / consumer)
● Producer: MessageSplitter -> KafkaProducer<byte[], byte[]> -> Kafka brokers
● Consumer: Kafka brokers -> KafkaConsumer<byte[], byte[]> -> MessageAssembler
○ The MessageAssembler uses a LargeMessageBufferPool and a DeliveredMessageOffsetTracker
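The splitting step can be sketched as below. This is a simplified stand-in for the MessageSplitter module, assuming a minimal Segment holder with the fields the deck describes later (messageId, sequenceNumber, numberOfSegments); the real implementation carries more metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class MessageSplitter {
    static final class Segment {
        final UUID messageId;        // ties segments of one large message together
        final int sequenceNumber;    // position of this segment within the message
        final int numberOfSegments;  // total segments of the message
        final byte[] payload;
        Segment(UUID id, int seq, int total, byte[] payload) {
            this.messageId = id; this.sequenceNumber = seq;
            this.numberOfSegments = total; this.payload = payload;
        }
    }

    /** Splits a message into segments of at most maxSegmentBytes each.
     *  A normal-sized message comes out as a single-segment message. */
    static List<Segment> split(byte[] message, int maxSegmentBytes) {
        UUID id = UUID.randomUUID();
        int total = Math.max(1, (message.length + maxSegmentBytes - 1) / maxSegmentBytes);
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * maxSegmentBytes;
            int to = Math.min(message.length, from + maxSegmentBytes);
            segments.add(new Segment(id, i, total, Arrays.copyOfRange(message, from, to)));
        }
        return segments;
    }
}
```

All segments of one message share a messageId, which is what later lets the consumer re-assemble interleaved messages without mixing them up.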
A closer look at large message handling
● The offset of a large message
● Offset tracking
● Producer callback
● Rebalance and duplicates handling
● Memory management
● Performance overhead
● Compatibility with existing messages
The offset of a large message
● Use the offset of the first segment?
○ First seen first serve
○ Easy to seek - the consumer can simply seek to the message offset (seek to 0 for msg0, seek to 1 for msg1).
○ Expensive for in-order delivery (need to buffer all the message segments until the current large message is complete)
■ Cannot deliver msg1 until msg0 is delivered, so the consumer has to buffer msg1.
■ Difficult to handle partially sent messages.

Example (msg0 occupies offsets 0 and 3, msg1 occupies offsets 1 and 2):

Broker          Consumer        User
0: msg0-seg0    0: msg0-seg0    0: msg0
1: msg1-seg0    1: msg1-seg0    1: msg1
2: msg1-seg1    2: msg1-seg1
3: msg0-seg1    3: msg0-seg1


The offset of a large message
● Use the offset of the last segment?
○ First completed first serve - deliver a message as soon as it completes (msg1 is delivered once its last segment at offset 2 arrives).
○ Needs additional work for seek (more details in offset tracking)
○ Least memory needed for in-order delivery
● We chose the offset of the last segment
○ Less memory consumption
○ Better tolerance for partially sent large messages

Example (same log; msg1 completes first and is delivered with offset 2):

Broker          Consumer        User
0: msg0-seg0    0: msg0-seg0
1: msg1-seg0    1: msg1-seg0
2: msg1-seg1    2: msg1-seg1    2: msg1
3: msg0-seg1    3: msg0-seg1    3: msg0
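The "first completed first serve" behavior can be sketched with a toy assembler. This is an illustration only, assuming the same minimal Segment fields as before; a delivered message carries the offset of its last segment, which is what the deck's example shows (msg1 at offset 2, msg0 at offset 3).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class MessageAssembler {
    static final class Segment {
        final UUID messageId; final int sequenceNumber; final int numberOfSegments; final byte[] payload;
        Segment(UUID id, int seq, int total, byte[] p) {
            messageId = id; sequenceNumber = seq; numberOfSegments = total; payload = p;
        }
    }
    static final class Delivered {
        final long offset; final byte[] message;
        Delivered(long offset, byte[] message) { this.offset = offset; this.message = message; }
    }

    private final Map<UUID, Segment[]> buffer = new HashMap<>();

    /** Feed one segment with its Kafka offset; returns the completed message or null. */
    Delivered accept(long offset, Segment seg) {
        Segment[] parts = buffer.computeIfAbsent(seg.messageId, id -> new Segment[seg.numberOfSegments]);
        parts[seg.sequenceNumber] = seg;
        for (Segment p : parts) if (p == null) return null; // still incomplete, keep buffering
        buffer.remove(seg.messageId);
        int size = 0;
        for (Segment p : parts) size += p.payload.length;
        byte[] whole = new byte[size];
        int pos = 0;
        for (Segment p : parts) {
            System.arraycopy(p.payload, 0, whole, pos, p.payload.length);
            pos += p.payload.length;
        }
        // The offset of the LAST segment becomes the offset of the large message.
        return new Delivered(offset, whole);
    }
}
```

Only incomplete messages are buffered, which is why this choice needs the least memory for in-order delivery.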


Offset tracking
Broker log:
0: msg0-seg0
1: msg1-seg0
2: msg1-seg1
3: msg0-seg1
4: msg2-seg0
5: msg3-seg0
...

● Committing: suppose msg1 (offset 2) has been delivered but msg0 is still incomplete.
○ On commit( Map{(tp->2)} ) we cannot commit offset 2, because msg0-seg0 hasn’t been delivered to the user.
○ We should commit offset 0 so there is no message loss.
● Seeking: after msg0 (offset 3) has also been delivered, seek(tp, 2) should seek to msg1-seg0, i.e. offset 1, instead of offset 2.
● The consumer keeps an offset tracker map to answer both questions:
{
  (2 -> start=1, safe=0),
  (3 -> start=0, safe=4),
}
● Safe offset - the offset that can be committed without message loss
● Starting offset - the starting offset of a large message
Offset tracking
● Limitations
○ Consumers can only track the messages they have already seen.
■ When the user seeks forward, the consumer does not check whether the user is seeking to a message boundary.
○ Consumers cannot keep track of all the messages they have ever seen.
■ Consumers only track a configured number of recently delivered messages for each partition, e.g. 5,000.
○ After a rebalance, the new owner of a partition will not have any tracked messages for the newly assigned partitions.
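The tracker map above can be sketched as a small bounded structure. This is an illustrative stand-in for the DeliveredMessageOffsetTracker module, assuming the assembler reports each delivered message's last-segment offset, starting offset, and safe offset; names and signatures here are invented for the sketch.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DeliveredMessageOffsetTracker {
    static final class Entry {
        final long start; final long safe;
        Entry(long start, long safe) { this.start = start; this.safe = safe; }
    }

    private final NavigableMap<Long, Entry> tracked = new TreeMap<>();
    private final int maxTracked; // e.g. 5,000 recently delivered messages per partition

    DeliveredMessageOffsetTracker(int maxTracked) { this.maxTracked = maxTracked; }

    /** Record a delivered message: delivered (last-segment) offset -> (start, safe). */
    void track(long deliveredOffset, long startOffset, long safeOffset) {
        tracked.put(deliveredOffset, new Entry(startOffset, safeOffset));
        if (tracked.size() > maxTracked) tracked.pollFirstEntry(); // forget the oldest
    }

    /** Offset to actually commit for a user commit at `offset`, so no message is lost. */
    long safeOffsetToCommit(long offset) {
        Map.Entry<Long, Entry> e = tracked.floorEntry(offset);
        return e == null ? offset : e.getValue().safe;
    }

    /** Offset to actually seek to for a user seek to `offset` (the first segment).
     *  Untracked (e.g. forward) seeks fall through unchecked, as in the limitations above. */
    long startingOffset(long offset) {
        Entry e = tracked.get(offset);
        return e == null ? offset : e.start;
    }
}
```

With the deck's example, a user commit at msg1 (offset 2) maps to safe offset 0, and a seek to offset 2 maps to starting offset 1.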
Producer Callback
● All the segments of a large message are sent to the same partition.
● The producer tracks per-message state for the segments in flight:
{
  numSegments=3
  ackedSegments;
  userCallback;
}
● The user callback is not fired until all segments are acked (ackedSegments goes 1, 2, then 3; only then fire the user callback).
● The offset of the last segment is passed to the user callback.
● The first exception received is passed to the user callback.
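The aggregation above can be sketched as a small wrapper callback. This is an illustration, assuming a hypothetical (offset, exception) user callback shape; the real producer wires this into the Kafka client's per-record callbacks.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

public class LargeMessageCallback {
    private final int numSegments;
    private final BiConsumer<Long, Exception> userCallback;
    private final AtomicInteger ackedSegments = new AtomicInteger();
    private volatile long lastOffset = -1L;
    private volatile Exception firstException;

    LargeMessageCallback(int numSegments, BiConsumer<Long, Exception> userCallback) {
        this.numSegments = numSegments;
        this.userCallback = userCallback;
    }

    /** Invoked once per segment ack (or failure). Fires the user callback exactly once,
     *  after all segments are acked, with the last segment's offset and the first exception. */
    void onCompletion(long offset, Exception exception) {
        if (exception != null && firstException == null) firstException = exception;
        if (offset > lastOffset) lastOffset = offset;
        if (ackedSegments.incrementAndGet() == numSegments) {
            userCallback.accept(lastOffset, firstException);
        }
    }
}
```

From the user's point of view the large message still gets a single callback, just as a normal-sized message would.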
Rebalance and duplicates handling
● Committing only the safe offset causes duplicates after a rebalance:
○ Consumer 0 has delivered msg1 (offset 2) to the user; its offset tracker map contains (2 -> start=1, safe=0).
○ A consumer rebalance occurs and consumer 0 commits the safe offset, offset 0.
○ The new owner, consumer 1, resumes reading from msg0-seg0 (offset 0), re-reads msg1-seg0 and msg1-seg1, and re-assembles msg1.
○ Consumer 1 will deliver msg1 again to the user - a duplicate, since the user has already seen msg1.
Rebalance and duplicates handling
● Solution: commit the safe offset together with metadata recording the delivered offset.
○ Before the rebalance, consumer 0 commits offset 0 with metadata {delivered=2} (the user has already seen msg1, whose offset is 2).
○ The new owner, consumer 1, resumes reading from msg0-seg0 (offset 0) and receives the committed metadata {delivered=2}.
○ Consumer 1 re-assembles msg1, but msg1.offset <= delivered, so consumer 1 will NOT deliver msg1 again to the user.
○ The first message delivered to the user will be msg0, whose offset is 3.
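The duplicate check above amounts to a watermark test. A minimal sketch, assuming the new owner has read the committed metadata {delivered=<offset>} from the previous owner; class and method names are invented for the example.

```java
public class DuplicateFilter {
    private final long delivered; // highest delivered offset, from the committed metadata

    DuplicateFilter(long deliveredFromMetadata) {
        this.delivered = deliveredFromMetadata;
    }

    /** A re-assembled message reaches the user only if its (last-segment) offset
     *  is beyond what the previous owner already delivered. */
    boolean shouldDeliver(long messageOffset) {
        return messageOffset > delivered;
    }
}
```

In the deck's example the filter drops msg1 (offset 2 <= delivered=2) and passes msg0 (offset 3).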
A closer look at large message handling
● The offset of a large message
● Producer callback
● Offset tracking
● Rebalance and duplicates handling
● Memory management
● Performance overhead
● Compatibility with existing messages
Memory management
● Producer
○ No material change to memory overhead except splitting and copying the message.
● Consumer
○ buffer.capacity
■ The user can set the maximum bytes used to buffer segments. If the buffer is full, the consumer evicts the oldest incomplete message.
○ expiration.offset.gap
■ Suppose a message has starting offset X and the consumer is now consuming from offset Y. The message is removed from the buffer if Y - X is greater than expiration.offset.gap, i.e. a “timeout”.
Performance Overhead
● Potentially additional segment serialization/deserialization cost
○ The default segment serde is cheap:
{
  // segment fields
  public final UUID messageId;
  public final int sequenceNumber;
  public final int numberOfSegments;
  public final int messageSizeInBytes;
  public final ByteBuffer payload;
}

Pipeline:
ProducerRecord -> serializer -> maybe split (segment serialization) -> Kafka
Kafka -> segment deserialization -> re-assemble -> deserializer
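A serde over those fields really is cheap: a fixed-size header plus the payload bytes. A minimal sketch, assuming a 16-byte messageId followed by three 4-byte ints (this wire layout is illustrative, not the library's actual format):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class SegmentSerde {
    static byte[] serialize(UUID messageId, int sequenceNumber, int numberOfSegments,
                            int messageSizeInBytes, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(16 + 4 * 3 + payload.length);
        buf.putLong(messageId.getMostSignificantBits());   // 16-byte messageId
        buf.putLong(messageId.getLeastSignificantBits());
        buf.putInt(sequenceNumber);
        buf.putInt(numberOfSegments);
        buf.putInt(messageSizeInBytes);
        buf.put(payload);
        return buf.array();
    }

    /** Returns {sequenceNumber, numberOfSegments, messageSizeInBytes}. */
    static int[] deserializeHeader(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        buf.getLong(); buf.getLong(); // skip the messageId
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt() };
    }
}
```

The per-segment overhead here is 28 bytes plus a buffer copy, small next to the payload itself.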
Performance Overhead
● Additional memory footprint in consumers
○ Buffer for segments of incomplete large messages
○ Additional memory needed to track the message offsets
■ 24 bytes per message; it takes 12 MB to track the most recent 5,000 messages from 100 partitions.
■ We can choose to only track large messages if users are trustworthy.
Compatibility with existing messages
● A topic can contain a mix of existing messages and segment messages, e.g.:
0: msg0 (existing msg)
1: msg1-seg0
2: msg1-seg1
3: msg2-seg0 (single-seg msg)
● The consumer first applies the segment deserializer; a non-segment message raises NotLargeMessageSegmentException.
● When consumers see NotLargeMessageSegmentException, they assume the message is an existing message and use the value deserializer to handle it.
● The default segment deserializer implementation already handles this.
● In a custom segment deserializer, the user implementation should throw NotLargeMessageSegmentException.
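The fallback path above can be sketched as a try/catch around the two deserializers. This is an illustration with locally defined stand-ins for the exception and deserializer types, not the library's actual classes.

```java
public class CompatibleDeserializer {
    // Stand-in for the library's exception type.
    static class NotLargeMessageSegmentException extends RuntimeException {}

    // Stand-in for a deserializer interface.
    interface Deserializer<T> { T deserialize(byte[] bytes); }

    /** Try the segment deserializer first; on NotLargeMessageSegmentException,
     *  treat the bytes as an existing message and use the value deserializer. */
    static <T> T deserialize(byte[] bytes,
                             Deserializer<T> segmentDeserializer,
                             Deserializer<T> valueDeserializer) {
        try {
            return segmentDeserializer.deserialize(bytes);
        } catch (NotLargeMessageSegmentException e) {
            return valueDeserializer.deserialize(bytes);
        }
    }
}
```

This is what makes rolling out segment-aware consumers against topics full of pre-existing messages safe.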
The answer to a question after the meetup
● Does it work for compacted topics?
○ Add the suffix “-segmentSeq” to the key
■ It works, with a flaw, when large messages with the same key do NOT interleave: compaction can leave a “zombie segment” behind.

Before compaction:            After compaction (scenario 1):   After compaction (scenario 2):
0: m0(key=”k-0”)              1: m0(key=”k-1”)                 2: m0(key=”k-2”)  <- zombie segment
1: m0(key=”k-1”)              2: m0(key=”k-2”)                 ...
2: m0(key=”k-2”)              ...                              5: m1(key=”k-0”)
...                           5: m1(key=”k-0”)                 6: m1(key=”k-1”)
5: m1(key=”k-0”)              6: m1(key=”k-1”)                 ...
6: m1(key=”k-1”)              ...
...

Note that the consumer won’t assemble segments of m0 with segments of m1 together because their messageIds are different.
The answer to a question after the meetup
● Does it work for compacted topics?
○ Add the suffix “-segmentSeq” to the key
■ It does not work when large messages with the same key may interleave.

Before compaction:            Failure scenario after compaction (doesn’t work):
0: m0(key=”k0-0”)             1: m1(key=”k0-0”)
1: m1(key=”k0-0”)             3: m0(key=”k0-1”)
2: m1(key=”k0-1”)             ...
3: m0(key=”k0-1”)
...

Note that the consumer won’t assemble m0-seg1 and m1-seg0 together because their messageIds are different.
Summary
● Reference based messaging works in most cases.
● Sometimes it is handy to have in-line support for large messages:
○ Sporadic large messages
○ Low latency
○ Small number of interleaved large messages
○ Cost savings
Acknowledgements
Thanks for the great help and support from

Dong Lin
Joel Koshy
Kartik Paramasivam
Onur Karaman
Yi Pan
LinkedIn Espresso and Datastream team
Q&A
