Client side consumer recovery #1043

Merged
scottf merged 16 commits into main from robust-simplification
Nov 27, 2023
Conversation

@scottf scottf commented Nov 18, 2023

This PR addresses consumer recovery when the client is disconnected from a server, or when the stream or consumer leader is affected by an outage. At least the following situations are covered:

  1. Durable Push Consumer
  2. Ephemeral Push Consumer with a long enough inactive threshold to survive a server outage or disconnection
  3. Ordered Push Consumer
  4. Durable Pull Consumer when used with endless Simplification API
  5. Ephemeral Pull Consumer with a long enough inactive threshold to survive a server outage or disconnection when used with endless Simplification API
  6. Ordered Pull Consumer with endless Simplification

@scottf scottf changed the title from "Robust simplification" to "Client side consumer recovery" Nov 24, 2023

lastMsgReceived.set(System.currentTimeMillis());
}

class MmTimerTask extends TimerTask {
Contributor Author

@scottf scottf Nov 26, 2023

This was extracted from an anonymous class so the timer could be reused. It reduces the number of objects created during error conditions, since the existing object can be reused instead of a new one being made.

heartbeatTimer.schedule(heartbeatTimerTask, alarmPeriodSetting, alarmPeriodSetting);
updateLastMessageReceived();
Contributor Author

@scottf scottf Nov 26, 2023

above code block reflects extracted MmTimerTask and the ability to reuse existing timer

}
}

protected void shutdownHeartbeatTimer() {
synchronized (stateChangeLock) {
if (heartbeatTimer != null) {
heartbeatTimerTask.shutdown();
heartbeatTimerTask = null;
Contributor Author

if the task is null it won't be re-used.
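
The timer handling discussed in the comments above can be sketched roughly as follows. This is a standalone illustration using only `java.util.Timer`, not the library's actual code: `HeartbeatAlarmSketch`, `AlarmTask`, `isOverdue`, and `timerRunning` are hypothetical names, and the rule that the alarm fires when no message arrived within one alarm period is an assumption based on the snippets shown.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a heartbeat alarm with a named (non-anonymous) timer task.
class HeartbeatAlarmSketch {
    private final Object stateChangeLock = new Object();
    private final AtomicLong lastMsgReceived = new AtomicLong(System.currentTimeMillis());
    private final long alarmPeriodMillis;
    private final Runnable onHeartbeatError;
    private Timer heartbeatTimer;
    private AlarmTask heartbeatTimerTask;

    HeartbeatAlarmSketch(long alarmPeriodMillis, Runnable onHeartbeatError) {
        this.alarmPeriodMillis = alarmPeriodMillis;
        this.onHeartbeatError = onHeartbeatError;
    }

    // Named task class, so the owner can hold a reference and shut it down.
    class AlarmTask extends TimerTask {
        final AtomicBoolean alive = new AtomicBoolean(true);

        @Override
        public void run() {
            if (alive.get() && isOverdue(System.currentTimeMillis())) {
                onHeartbeatError.run();
            }
        }

        void shutdown() {
            alive.set(false);
            cancel();
        }
    }

    // Assumption: "overdue" means more than one alarm period since the last message.
    boolean isOverdue(long nowMillis) {
        return nowMillis - lastMsgReceived.get() > alarmPeriodMillis;
    }

    void messageReceived(long nowMillis) {
        lastMsgReceived.set(nowMillis);
    }

    void initOrResetHeartbeatTimer() {
        synchronized (stateChangeLock) {
            if (heartbeatTimer == null) {
                // Only create the timer and task when none is running.
                heartbeatTimer = new Timer(true);
                heartbeatTimerTask = new AlarmTask();
                heartbeatTimer.schedule(heartbeatTimerTask, alarmPeriodMillis, alarmPeriodMillis);
            }
            // In both cases, restart the "time since last message" window.
            lastMsgReceived.set(System.currentTimeMillis());
        }
    }

    void shutdownHeartbeatTimer() {
        synchronized (stateChangeLock) {
            if (heartbeatTimer != null) {
                heartbeatTimerTask.shutdown();
                heartbeatTimerTask = null; // a null task will not be reused
                heartbeatTimer.cancel();
                heartbeatTimer = null;
            }
        }
    }

    boolean timerRunning() {
        synchronized (stateChangeLock) {
            return heartbeatTimer != null;
        }
    }

    public static void main(String[] args) {
        HeartbeatAlarmSketch hb = new HeartbeatAlarmSketch(1000, () -> System.out.println("heartbeat alarm"));
        hb.messageReceived(5_000);
        System.out.println(hb.isOverdue(5_500)); // false
        System.out.println(hb.isOverdue(6_001)); // true
    }
}
```

Calling `initOrResetHeartbeatTimer()` while a timer is already running only resets the last-message timestamp, which is the reuse the comments describe.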

@@ -230,7 +230,7 @@ public CompletableFuture<Boolean> drain(Duration timeout) throws InterruptedExce
}

/**
- * @return whether or not this consumer is still processing messages. For a
+ * @return whether this consumer is still processing messages. For a
Contributor Author

fixed grammar

@@ -68,7 +68,7 @@ public class NatsConsumerContext implements ConsumerContext, SimplifiedSubscript
.replayPolicy(config.getReplayPolicy())
.headersOnly(config.getHeadersOnly())
.build();
- subscribeSubject = originalOrderedCc.getFilterSubject();
+ subscribeSubject = Validator.validateSubject(originalOrderedCc.getFilterSubject(), false);
Contributor Author

validating in the constructor because it comes through here from a different code path

PullSubscribeOptions pso;
if (ordered) {
if (lastConsumer != null) {
highestSeq = Math.max(highestSeq, lastConsumer.pmm.lastStreamSeq);
}
ConsumerConfiguration cc = lastConsumer == null
? originalOrderedCc
- : streamCtx.js.nextOrderedConsumerConfiguration(originalOrderedCc, highestSeq, null);
+ : streamCtx.js.consumerConfigurationStartAfterLast(originalOrderedCc, highestSeq, null, null);
Contributor Author

more descriptive name
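
As a rough illustration of what "start after last" could mean for an ordered consumer, the sketch below tracks the highest stream sequence seen and derives a replacement configuration from it. Everything here is hypothetical: `OrderedRestartSketch` and `Config` are stand-ins rather than library types, and resuming at the last stream sequence plus one is an assumption, not something the diff confirms.

```java
// Hypothetical sketch: derive a replacement ordered-consumer config after a failure.
class OrderedRestartSketch {

    /** Trimmed-down stand-in for a consumer configuration (not the library type). */
    static final class Config {
        final String filterSubject;
        final String name;         // may be null when the server generates the name
        final long startSequence;  // 0 means "not set"

        Config(String filterSubject, String name, long startSequence) {
            this.filterSubject = filterSubject;
            this.name = name;
            this.startSequence = startSequence;
        }
    }

    // Keep the highest stream sequence seen across all previous consumers.
    static long highestSeq(long highestSoFar, long lastStreamSeq) {
        return Math.max(highestSoFar, lastStreamSeq);
    }

    // Assumption: the replacement consumer starts right after the last message seen.
    static Config startAfterLast(Config original, long lastStreamSeq, String consumerName) {
        long startSeq = Math.max(1, lastStreamSeq + 1);
        return new Config(original.filterSubject, consumerName, startSeq);
    }

    public static void main(String[] args) {
        Config original = new Config("orders.>", null, 0);
        long highest = highestSeq(40, 41);
        Config next = startAfterLast(original, highest, "server-generated-name");
        System.out.println(next.startSequence); // 42
    }
}
```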

public void heartbeatError() {
stopped.set(true);
finished.set(true);
}
Contributor Author

this handles failure in the middle of a fetch.

- boolean isAutoAck
- ) throws IOException, JetStreamApiException {
+ boolean isAutoAck,
+ PullMessageManager pmmInstance) throws IOException, JetStreamApiException {
Contributor Author

@scottf scottf Nov 26, 2023

allows me to use the existing PullMessageManager which has state, instead of making a new one.

ConsumerConfiguration originalCc,
long lastStreamSeq,
- String newDeliverSubject)
+ String newDeliverSubject, String consumerName)
Contributor Author

@scottf scottf Nov 26, 2023

When the ephemeral consumer did not have a name, the server generated one. This now gets passed in, since it's not part of the originalCc. Also renamed the method, hopefully to something more descriptive.

resetTracking();
if (pullManagerObserver != null) {
pullManagerObserver.heartbeatError();
}
Contributor Author

this is just simplified - the manager doesn't really need to do all this stuff like it did with push ordered. Now we just let the observer know there was a problem and it handles it.

@Override
protected Boolean beforeQueueProcessorImpl(NatsMessage msg) {
messageReceived(); // record message time. Used for heartbeat tracking
Contributor Author

@scottf scottf Nov 26, 2023

now handled in trackIncoming

@@ -40,6 +40,7 @@ public class Status {
public static String EXCEEDED_MAX_REQUEST_MAX_BYTES = "Exceeded MaxRequestMaxBytes"; // 409

public static String BATCH_COMPLETED = "Batch Completed"; // 409 informational
public static String SERVER_SHUTDOWN = "Server Shutdown"; // 409 informational with headers
Contributor Author

This is a new one to me. Handling it now.

@scottf scottf requested a review from piotrpio November 26, 2023 23:12

private void setupHbAlarmToTrigger() {
pmm.resetTracking();
pmm.initOrResetHeartbeatTimer();
Contributor Author

Whenever there is some sort of exception, instead of dying, just turn the hb alarm on. It will fire because there are no incoming messages. This is done instead of recovering synchronously, and also so we don't immediately retry the same thing that just failed.
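
The "arm the alarm instead of retrying right away" idea can be sketched in isolation like this. The names `DeferredRetrySketch`, `Resubscriber`, and `alarmsArmed` are hypothetical, and the alarm itself is reduced to a counter so the control flow is easy to follow; it is an illustration of the approach described, not the PR's code.

```java
// Hypothetical sketch: on failure, defer recovery to the heartbeat alarm.
class DeferredRetrySketch {

    /** Stand-in for whatever re-creates the subscription; may fail. */
    interface Resubscriber {
        void resubscribe() throws Exception;
    }

    int alarmsArmed = 0;
    boolean subscribed = false;

    // Called when the heartbeat alarm fires.
    void onHeartbeatError(Resubscriber resubscriber) {
        try {
            resubscriber.resubscribe();
            subscribed = true;
        }
        catch (Exception e) {
            // Do not retry synchronously: the same call just failed.
            // Arm the alarm so recovery is attempted again one period later.
            setupHbAlarmToTrigger();
        }
    }

    // In this sketch the alarm is just counted; a real one would reset
    // tracking and (re)start a timer.
    void setupHbAlarmToTrigger() {
        alarmsArmed++;
    }

    public static void main(String[] args) {
        DeferredRetrySketch sketch = new DeferredRetrySketch();
        sketch.onHeartbeatError(() -> { throw new java.io.IOException("server unavailable"); });
        System.out.println(sketch.subscribed + " " + sketch.alarmsArmed); // false 1
        sketch.onHeartbeatError(() -> { });
        System.out.println(sketch.subscribed + " " + sketch.alarmsArmed); // true 1
    }
}
```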

@@ -218,7 +218,7 @@ public static byte[] makeData(String prefix, int msgSize, boolean verbose, int x
return null;
}

- String text = prefix + "-" + x + ".";
+ String text = prefix + "-" + x;
Contributor Author

Just some example stuff, not production code, always tweaking things because I'm obsessive

* Gets the consumer name that was used to create the context.
* @return the consumer name
*/
String getConsumerName();
Contributor Author

I decided to make the consumer name available for pull ordered consumers. This helps reporting, and underneath I will maintain the name on error. I originally didn't want the name exposed, but it's exposed on push ordered, and of course can be seen by management api/CLI. It's intended for visibility and reporting

* @return a JetStream instance.
* @throws IOException various IO exception such as timeout or interruption
*/
JetStream jetStream() throws IOException;
Contributor Author

I was tired of not being able to get a JetStream context from a JetStreamManagement context.

@@ -35,13 +38,13 @@ public enum ManageResult {MESSAGE, STATUS_HANDLED, STATUS_TERMINUS, STATUS_ERROR

protected long lastStreamSeq;
protected long lastConsumerSeq;
- protected long lastMsgReceived;
+ protected AtomicLong lastMsgReceived;
Contributor Author

Atomic is threadsafe. Might be overkill, but probably not.

PullSubscribeOptions pso;
if (ordered) {
if (lastConsumer != null) {
highestSeq = Math.max(highestSeq, lastConsumer.pmm.lastStreamSeq);
}
ConsumerConfiguration cc = lastConsumer == null
? originalOrderedCc
- : streamCtx.js.nextOrderedConsumerConfiguration(originalOrderedCc, highestSeq, null);
+ : streamCtx.js.consumerConfigurationForOrdered(originalOrderedCc, highestSeq, null, null);
Contributor Author

reflects name and signature (internal) change

pso = new OrderedPullSubscribeOptionsBuilder(streamCtx.streamName, cc).build();
}
else {
pso = unorderedBindPso;
}

if (messageHandler == null) {
- return (NatsJetStreamPullSubscription) streamCtx.js.subscribe(subscribeSubject, pso);
+ return (NatsJetStreamPullSubscription) streamCtx.js.createSubscription(
+ subscribeSubject, null, pso, null, null, null, false, optionalPmm);
Contributor Author

reflects need to pass existing PullMessageManager (pmm) to re-use with new subscriptions since pmm has state

}
else {
mm = pmmInstance;
}
Contributor Author

this is where I reuse the pmm instead of making a new one.
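
A minimal sketch of the reuse being described: the subscription factory accepts an optional, already-populated manager and only creates a new one when none is passed. `ManagerReuseSketch`, `StatefulManager`, and `Subscription` are hypothetical stand-ins, not the library's classes.

```java
// Hypothetical sketch: pass an existing stateful manager into a new subscription.
class ManagerReuseSketch {

    /** Stand-in for a message manager that carries tracking state. */
    static final class StatefulManager {
        long lastStreamSeq;
        long lastConsumerSeq;
    }

    /** Stand-in for a subscription that delegates to a manager. */
    static final class Subscription {
        final StatefulManager manager;

        Subscription(StatefulManager manager) {
            this.manager = manager;
        }
    }

    // A null argument means "brand new subscription", so a fresh manager is made.
    static Subscription createSubscription(StatefulManager existingManager) {
        StatefulManager mm = existingManager == null ? new StatefulManager() : existingManager;
        return new Subscription(mm);
    }

    public static void main(String[] args) {
        Subscription first = createSubscription(null);
        first.manager.lastStreamSeq = 17;

        // After a failure, the replacement subscription keeps the tracked state.
        Subscription second = createSubscription(first.manager);
        System.out.println(second.manager.lastStreamSeq); // 17
    }
}
```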

@@ -565,7 +573,7 @@ private String lookupStreamSubject(String stream) throws IOException, JetStreamA
@Override
public JetStreamSubscription subscribe(String subscribeSubject) throws IOException, JetStreamApiException {
subscribeSubject = validateSubject(subscribeSubject, true);
- return createSubscription(subscribeSubject, null, null, null, null, null, false);
+ return createSubscription(subscribeSubject, null, null, null, null, null, false, null);
Contributor Author

all these method calls reflect internal signature change. These are brand new subscriptions so don't have a pmm

consumerCreate290Available = impl.consumerCreate290Available;
multipleSubjectFilter210Available = impl.multipleSubjectFilter210Available;
}

Contributor Author

@scottf scottf Nov 27, 2023

implementation to make a JetStream context from a JetStreamManagement context

protected final int thresholdMessages;
protected final long thresholdBytes;
protected final SimplifiedSubscriptionMaker subscriptionMaker;
protected final Dispatcher userDispatcher;
protected final MessageHandler userMessageHandler;
Contributor Author

making the PullRequestOptions on the fly, so need to keep the original user options, etc. around.

try {
jsm.deleteConsumer(stream, actualConsumerName);
}
catch (Exception ignore) {}
Contributor Author

So on failure of a (push) ordered consumer, I want to reuse the same consumer name, but it's likely that the original consumer I was using is still around, since I don't know why the ordered error was triggered. Because you cannot just update an existing consumer to change the deliver policy or start sequence, I have to make an entirely new consumer. So I just make sure I delete the old consumer first. When the delete does throw, it will most likely be because the consumer was actually already deleted and deleteConsumer threw a JetStreamApiException.
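
The delete-then-recreate flow in that comment can be illustrated with a small in-memory stand-in for the server. `RecreateConsumerSketch` and its methods are hypothetical, and `IllegalStateException` stands in for the API exception mentioned above; the point is only that a failed delete is ignored so the create can proceed under the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reuse a consumer name by deleting any leftover consumer first.
class RecreateConsumerSketch {
    // In-memory stand-in for server state: consumer name -> start sequence.
    private final Map<String, Long> consumers = new HashMap<>();

    void createConsumer(String name, long startSeq) {
        if (consumers.containsKey(name)) {
            throw new IllegalStateException("consumer already exists: " + name);
        }
        consumers.put(name, startSeq);
    }

    void deleteConsumer(String name) {
        if (consumers.remove(name) == null) {
            throw new IllegalStateException("consumer not found: " + name);
        }
    }

    // An existing consumer cannot simply be updated with a new start sequence
    // in this model, so the old one is removed and a new one is created.
    void recreate(String name, long startSeq) {
        try {
            deleteConsumer(name);
        }
        catch (IllegalStateException ignore) {
            // Most likely the consumer was already gone; nothing to clean up.
        }
        createConsumer(name, startSeq);
    }

    Long startSequenceOf(String name) {
        return consumers.get(name);
    }

    public static void main(String[] args) {
        RecreateConsumerSketch server = new RecreateConsumerSketch();
        server.createConsumer("ordered-1", 1);
        server.recreate("ordered-1", 42);
        System.out.println(server.startSequenceOf("ordered-1")); // 42
    }
}
```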

throw ise;
}
}
}
Contributor Author

This was based on the Push version where I had to maintain the same subscription handle that was given to the user, but it's done differently here and this is just more complicated since the subscription handle is kept from the user because they are using the simplification handle.

@@ -15,4 +15,5 @@

interface PullManagerObserver {
void pendingUpdated();
void heartbeatError();
Contributor Author

needs this because the manager can't be responsible for restarts because the PMM is only managing messages, not the subscriptions / state getting messages.
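
The division of responsibility described here can be sketched as a plain observer: the manager only reports the heartbeat error, and whoever owns the subscription decides what to do. The interface mirrors the two methods shown in the diff, but `ManagerObserverSketch`, `MessageManagerSketch`, and `FetchConsumerSketch` are hypothetical names, and the fetch behavior (marking itself stopped and finished) is taken from the `heartbeatError()` snippet earlier in this conversation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical observer mirroring the two callbacks in the diff.
interface ManagerObserverSketch {
    void pendingUpdated();
    void heartbeatError();
}

// The manager tracks messages only; it does not restart subscriptions itself.
class MessageManagerSketch {
    private final ManagerObserverSketch observer; // may be null
    long lastStreamSeq;
    long lastConsumerSeq;

    MessageManagerSketch(ManagerObserverSketch observer) {
        this.observer = observer;
    }

    void resetTracking() {
        lastStreamSeq = 0;
        lastConsumerSeq = 0;
    }

    void handleHeartbeatError() {
        resetTracking();
        if (observer != null) {
            observer.heartbeatError(); // let the owner decide how to recover
        }
    }
}

// A fetch is bounded, so on a heartbeat error it simply ends.
class FetchConsumerSketch implements ManagerObserverSketch {
    final AtomicBoolean stopped = new AtomicBoolean(false);
    final AtomicBoolean finished = new AtomicBoolean(false);

    @Override
    public void pendingUpdated() {
        // nothing to do in this sketch
    }

    @Override
    public void heartbeatError() {
        stopped.set(true);
        finished.set(true);
    }
}
```

An endless consumer would implement the same interface but respond by creating a replacement subscription rather than finishing.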

Member

@wallyqs wallyqs left a comment

LGTM

@scottf scottf merged commit a4b50a9 into main Nov 27, 2023
@scottf scottf deleted the robust-simplification branch November 27, 2023 23:45