Client side consumer recovery #1043
lastMsgReceived.set(System.currentTimeMillis());
}

class MmTimerTask extends TimerTask {
This was extracted from an anonymous class so the timer could be re-used. It reduces the number of objects created during error conditions, since the existing object can be reused instead of making a new one.
heartbeatTimer.schedule(heartbeatTimerTask, alarmPeriodSetting, alarmPeriodSetting);
updateLastMessageReceived();
The code block above reflects the extracted MmTimerTask and the ability to reuse the existing timer.
    }
}

protected void shutdownHeartbeatTimer() {
    synchronized (stateChangeLock) {
        if (heartbeatTimer != null) {
            heartbeatTimerTask.shutdown();
            heartbeatTimerTask = null;
if the task is null it won't be re-used.
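A minimal sketch of the heartbeat alarm discussed in the comments above, using a simplified stand-in class. `HeartbeatTracker`, `AlarmTask`, `isOverdue` and the `Runnable` callback are hypothetical names, not the jnats implementation. It shows a named `TimerTask` subclass that the owner can hold and shut down, with the last-message time kept in an `AtomicLong`. One deliberate difference: this sketch builds a fresh task on every reset, because a `java.util.TimerTask` that has been cancelled cannot be scheduled again. How the PR reuses its `MmTimerTask` is not visible in the excerpt.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the manager's heartbeat tracking; not the jnats class.
class HeartbeatTracker {
    private final Object stateChangeLock = new Object();
    private final AtomicLong lastMsgReceived = new AtomicLong(System.currentTimeMillis());
    private final long alarmPeriodMillis;
    private final Runnable onHeartbeatError; // assumed callback, e.g. an observer hook
    private Timer heartbeatTimer;
    private AlarmTask heartbeatTimerTask;

    HeartbeatTracker(long alarmPeriodMillis, Runnable onHeartbeatError) {
        this.alarmPeriodMillis = alarmPeriodMillis;
        this.onHeartbeatError = onHeartbeatError;
    }

    // Named rather than anonymous, so the tracker can keep a reference and stop it.
    class AlarmTask extends TimerTask {
        private volatile boolean alive = true;

        void shutdown() {
            alive = false;
            cancel();
        }

        @Override
        public void run() {
            if (alive && isOverdue(System.currentTimeMillis())) {
                onHeartbeatError.run();
            }
        }
    }

    void messageReceived() {
        lastMsgReceived.set(System.currentTimeMillis());
    }

    // True when no message has arrived for longer than the alarm period.
    boolean isOverdue(long nowMillis) {
        return nowMillis - lastMsgReceived.get() > alarmPeriodMillis;
    }

    void initOrResetHeartbeatTimer() {
        synchronized (stateChangeLock) {
            shutdownHeartbeatTimer(); // the lock is reentrant, so this nested call is safe
            heartbeatTimer = new Timer(true); // daemon thread
            heartbeatTimerTask = new AlarmTask();
            heartbeatTimer.schedule(heartbeatTimerTask, alarmPeriodMillis, alarmPeriodMillis);
            messageReceived();
        }
    }

    void shutdownHeartbeatTimer() {
        synchronized (stateChangeLock) {
            if (heartbeatTimer != null) {
                heartbeatTimerTask.shutdown();
                heartbeatTimerTask = null;
                heartbeatTimer.cancel();
                heartbeatTimer = null;
            }
        }
    }
}
```

Calling `initOrResetHeartbeatTimer()` repeatedly is safe in this sketch, since each call first tears down whatever timer is running.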
@@ -230,7 +230,7 @@ public CompletableFuture<Boolean> drain(Duration timeout) throws InterruptedExce
  }

  /**
- * @return whether or not this consumer is still processing messages. For a
+ * @return whether this consumer is still processing messages. For a
fixed grammar
@@ -68,7 +68,7 @@ public class NatsConsumerContext implements ConsumerContext, SimplifiedSubscript
  .replayPolicy(config.getReplayPolicy())
  .headersOnly(config.getHeadersOnly())
  .build();
- subscribeSubject = originalOrderedCc.getFilterSubject();
+ subscribeSubject = Validator.validateSubject(originalOrderedCc.getFilterSubject(), false);
validating in the constructor because it comes through here from a different code path
  PullSubscribeOptions pso;
  if (ordered) {
      if (lastConsumer != null) {
          highestSeq = Math.max(highestSeq, lastConsumer.pmm.lastStreamSeq);
      }
      ConsumerConfiguration cc = lastConsumer == null
          ? originalOrderedCc
-         : streamCtx.js.nextOrderedConsumerConfiguration(originalOrderedCc, highestSeq, null);
+         : streamCtx.js.consumerConfigurationStartAfterLast(originalOrderedCc, highestSeq, null, null);
more descriptive name
public void heartbeatError() {
    stopped.set(true);
    finished.set(true);
}
this handles failure in the middle of a fetch.
- boolean isAutoAck
- ) throws IOException, JetStreamApiException {
+ boolean isAutoAck,
+ PullMessageManager pmmInstance) throws IOException, JetStreamApiException {
allows me to use the existing PullMessageManager which has state, instead of making a new one.
  ConsumerConfiguration originalCc,
  long lastStreamSeq,
- String newDeliverSubject)
+ String newDeliverSubject, String consumerName)
when the ephemeral consumer did not have a name, the server generated one. This gets passed in now since it's not part of the originalCc. Also renamed, hopefully more descriptive
resetTracking();
if (pullManagerObserver != null) {
    pullManagerObserver.heartbeatError();
}
this is just simplified - the manager doesn't really need to do all this stuff like it did with push ordered. Now we just let the observer know there was a problem and it handles it.
@Override
protected Boolean beforeQueueProcessorImpl(NatsMessage msg) {
    messageReceived(); // record message time. Used for heartbeat tracking
now handled in trackIncoming
@@ -40,6 +40,7 @@ public class Status {
  public static String EXCEEDED_MAX_REQUEST_MAX_BYTES = "Exceeded MaxRequestMaxBytes"; // 409

  public static String BATCH_COMPLETED = "Batch Completed"; // 409 informational
+ public static String SERVER_SHUTDOWN = "Server Shutdown"; // 409 informational with headers
This is a new one to me. Handling it now.
private void setupHbAlarmToTrigger() {
    pmm.resetTracking();
    pmm.initOrResetHeartbeatTimer();
Whenever there is some sort of exception, instead of dying, just turn the hb alarm on. It will fire because there are no incoming messages. Do this instead of recovering synchronously, so that we don't immediately retry the same thing that just failed.
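The idea in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `DeferredRecovery`, `RecoveryStep` and `fireNextAlarm` are hypothetical, and a plain queue stands in for the heartbeat timer so the flow can be followed without real timing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical recovery step, e.g. "resubscribe"; may fail with any exception.
interface RecoveryStep {
    void run() throws Exception;
}

// Hypothetical sketch: on failure, arm an alarm instead of retrying inline.
class DeferredRecovery {
    private final RecoveryStep step;
    // Stands in for the heartbeat timer; each entry is one armed alarm.
    private final Deque<Runnable> armedAlarms = new ArrayDeque<>();
    int attempts = 0;
    boolean recovered = false;

    DeferredRecovery(RecoveryStep step) {
        this.step = step;
    }

    void attempt() {
        attempts++;
        try {
            step.run();
            recovered = true;
        }
        catch (Exception e) {
            // Do not retry here; let the alarm trigger the next attempt later.
            setupHbAlarmToTrigger();
        }
    }

    private void setupHbAlarmToTrigger() {
        armedAlarms.add(this::attempt);
    }

    // In a real client a timer thread would do this once the alarm period
    // passes with no incoming messages. Returns false if nothing was armed.
    boolean fireNextAlarm() {
        Runnable alarm = armedAlarms.poll();
        if (alarm == null) {
            return false;
        }
        alarm.run();
        return true;
    }
}
```

Because the retry only happens when the alarm fires, a step that fails immediately is not hammered in a tight loop.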
@@ -218,7 +218,7 @@ public static byte[] makeData(String prefix, int msgSize, boolean verbose, int x
  return null;
  }

- String text = prefix + "-" + x + ".";
+ String text = prefix + "-" + x;
Just some example stuff, not production code, always tweaking things because I'm obsessive
 * Gets the consumer name that was used to create the context.
 * @return the consumer name
 */
String getConsumerName();
I decided to make the consumer name available for pull ordered consumers. This helps reporting, and underneath I will maintain the name on error. I originally didn't want the name exposed, but it's exposed on push ordered, and of course can be seen by management api/CLI. It's intended for visibility and reporting
 * @return a JetStream instance.
 * @throws IOException various IO exception such as timeout or interruption
 */
JetStream jetStream() throws IOException;
I was tired of not being able to get a JetStream context from a JetStreamManagement context.
@@ -35,13 +38,13 @@ public enum ManageResult {MESSAGE, STATUS_HANDLED, STATUS_TERMINUS, STATUS_ERROR

  protected long lastStreamSeq;
  protected long lastConsumerSeq;
- protected long lastMsgReceived;
+ protected AtomicLong lastMsgReceived;
Atomic is threadsafe. Might be overkill, but probably not.
  PullSubscribeOptions pso;
  if (ordered) {
      if (lastConsumer != null) {
          highestSeq = Math.max(highestSeq, lastConsumer.pmm.lastStreamSeq);
      }
      ConsumerConfiguration cc = lastConsumer == null
          ? originalOrderedCc
-         : streamCtx.js.nextOrderedConsumerConfiguration(originalOrderedCc, highestSeq, null);
+         : streamCtx.js.consumerConfigurationForOrdered(originalOrderedCc, highestSeq, null, null);
reflects name and signature (internal) change
      pso = new OrderedPullSubscribeOptionsBuilder(streamCtx.streamName, cc).build();
  }
  else {
      pso = unorderedBindPso;
  }

  if (messageHandler == null) {
-     return (NatsJetStreamPullSubscription) streamCtx.js.subscribe(subscribeSubject, pso);
+     return (NatsJetStreamPullSubscription) streamCtx.js.createSubscription(
+         subscribeSubject, null, pso, null, null, null, false, optionalPmm);
reflects need to pass existing PullMessageManager (pmm) to re-use with new subscriptions since pmm has state
}
else {
    mm = pmmInstance;
}
this is where I reuse the pmm instead of making a new one.
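To illustrate why the existing manager is passed through, here is a small sketch with hypothetical stand-in types (`SketchPullManager`, `SketchSubscription` and `SketchSubscriptionFactory` are invented for this example; the real jnats classes carry far more state). A `null` manager means a brand new subscription, while a non-null one is reused so that tracked state such as `lastStreamSeq` survives a resubscribe.

```java
// Hypothetical, simplified stand-in for a stateful pull message manager.
class SketchPullManager {
    long lastStreamSeq; // state that must survive a resubscribe
}

// Hypothetical subscription that simply holds on to its manager.
class SketchSubscription {
    final String subject;
    final SketchPullManager mm;

    SketchSubscription(String subject, SketchPullManager mm) {
        this.subject = subject;
        this.mm = mm;
    }
}

class SketchSubscriptionFactory {
    // pmmInstance is null for brand new subscriptions and non-null on recovery.
    static SketchSubscription createSubscription(String subject, SketchPullManager pmmInstance) {
        SketchPullManager mm;
        if (pmmInstance == null) {
            mm = new SketchPullManager();
        }
        else {
            mm = pmmInstance;
        }
        return new SketchSubscription(subject, mm);
    }
}
```

A recovery path would call `createSubscription(subject, oldSubscription.mm)`, whereas ordinary callers pass `null`.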
@@ -565,7 +573,7 @@ private String lookupStreamSubject(String stream) throws IOException, JetStreamA
  @Override
  public JetStreamSubscription subscribe(String subscribeSubject) throws IOException, JetStreamApiException {
  subscribeSubject = validateSubject(subscribeSubject, true);
- return createSubscription(subscribeSubject, null, null, null, null, null, false);
+ return createSubscription(subscribeSubject, null, null, null, null, null, false, null);
all these method calls reflect internal signature change. These are brand new subscriptions so don't have a pmm
consumerCreate290Available = impl.consumerCreate290Available;
multipleSubjectFilter210Available = impl.multipleSubjectFilter210Available;
}
implementation to make a JetStream context from a JetStreamManagement context
protected final int thresholdMessages;
protected final long thresholdBytes;
protected final SimplifiedSubscriptionMaker subscriptionMaker;
protected final Dispatcher userDispatcher;
protected final MessageHandler userMessageHandler;
making the PullRequestOptions on the fly, so need to keep the original user options, etc. around.
try {
    jsm.deleteConsumer(stream, actualConsumerName);
}
catch (Exception ignore) {}
On failure of a (push) ordered consumer, I want to re-use the same consumer name, but the original consumer is likely still around, since I don't know why the ordered error was triggered. Because you cannot update an existing consumer to change its deliver policy or start sequence, I have to make an entirely new consumer, so I make sure I delete the old one first. When the delete does throw, it will most likely be because the consumer was already deleted and deleteConsumer threw a JetStreamApiException.
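A rough sketch of the delete-then-recreate step described here, using a hypothetical in-memory admin interface rather than the real management API. `ConsumerAdmin`, `InMemoryConsumerAdmin` and `OrderedRecreate` are invented for illustration, and starting the new consumer at `lastStreamSeq + 1` is an assumption about what "start after last" means.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical admin interface; not the jnats JetStreamManagement API.
interface ConsumerAdmin {
    void deleteConsumer(String stream, String name);
    void createConsumer(String stream, String name, long startSeq);
}

// Hypothetical in-memory implementation so the flow can be exercised locally.
class InMemoryConsumerAdmin implements ConsumerAdmin {
    private final Map<String, Long> consumers = new HashMap<>();

    private static String key(String stream, String name) {
        return stream + "/" + name;
    }

    @Override
    public void deleteConsumer(String stream, String name) {
        if (consumers.remove(key(stream, name)) == null) {
            throw new IllegalStateException("consumer not found");
        }
    }

    @Override
    public void createConsumer(String stream, String name, long startSeq) {
        if (consumers.containsKey(key(stream, name))) {
            throw new IllegalStateException("consumer already exists");
        }
        consumers.put(key(stream, name), startSeq);
    }

    Long startSeqOf(String stream, String name) {
        return consumers.get(key(stream, name));
    }
}

class OrderedRecreate {
    // Re-creates a consumer under the same name, starting after lastStreamSeq.
    static void recreate(ConsumerAdmin admin, String stream, String name, long lastStreamSeq) {
        try {
            admin.deleteConsumer(stream, name);
        }
        catch (Exception ignore) {
            // Most likely the consumer was already gone; either way, carry on.
        }
        admin.createConsumer(stream, name, lastStreamSeq + 1); // assumed start point
    }
}
```

Swallowing the delete failure is what lets the same call work whether or not the old consumer still exists.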
            throw ise;
        }
    }
}
This was based on the Push version where I had to maintain the same subscription handle that was given to the user, but it's done differently here and this is just more complicated since the subscription handle is kept from the user because they are using the simplification handle.
@@ -15,4 +15,5 @@

  interface PullManagerObserver {
  void pendingUpdated();
+ void heartbeatError();
needs this because the manager can't be responsible for restarts because the PMM is only managing messages, not the subscriptions / state getting messages.
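The split of responsibilities can be pictured with a small sketch. The interface mirrors the two callbacks in the diff; `SketchMessageManager` and `SketchFetchConsumer` are hypothetical stand-ins, not the jnats classes. The manager only resets its own tracking and reports the problem, and the observer decides what to do with its subscription state.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Mirrors the two callbacks shown in the diff.
interface PullManagerObserver {
    void pendingUpdated();
    void heartbeatError();
}

// Hypothetical manager: tracks messages only and reports problems upward.
class SketchMessageManager {
    private final PullManagerObserver pullManagerObserver;
    int trackingResets = 0;

    SketchMessageManager(PullManagerObserver pullManagerObserver) {
        this.pullManagerObserver = pullManagerObserver;
    }

    void resetTracking() {
        trackingResets++;
    }

    // Called when the heartbeat alarm fires.
    void handleHeartbeatError() {
        resetTracking();
        if (pullManagerObserver != null) {
            pullManagerObserver.heartbeatError();
        }
    }
}

// Hypothetical observer that owns the fetch state, as a fetch consumer would.
class SketchFetchConsumer implements PullManagerObserver {
    final AtomicBoolean stopped = new AtomicBoolean(false);
    final AtomicBoolean finished = new AtomicBoolean(false);

    @Override
    public void pendingUpdated() {
        // Nothing to do in this sketch.
    }

    @Override
    public void heartbeatError() {
        // A failure in the middle of a fetch ends the fetch.
        stopped.set(true);
        finished.set(true);
    }
}
```

A different observer could restart its subscription instead, without any change to the manager.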
… for ease of recognition
LGTM
The PR addresses consumer recovery when the client is disconnected from a server or when the stream or consumer leader is affected by an outage. At least the following situations are covered