Reconcile QQ node dead during delete and redeclare #14241
Co-authored-by: Péter Gömöri <gomoripeti@users.noreply.github.com>
Thanks @LoisSotoLopez, this looks like it will do what we discussed a while back. I feel unsure about adding another key to the queue type state for this, mainly because we'd have to keep…
I don't like having two keys with similar information (nodes) that will need to be kept in sync. I think we need to move the current `nodes` list value to the `#{node() => uid()}` map format and handle the two formats in the relevant places, mostly in `rabbit_queue_type`. We do need a list of the member nodes in a few places, but we could add a convenience function, `rabbit_queue_type:nodes/1`, that takes a queue record and returns a list of member nodes. Internally it could just call `rabbit_queue_type:info(Q, [members])` and extract the result from that. Then we can update all places where we explicitly use `get_type_state` to extract the member nodes.
In addition, I think we need to put the use of `nodes` as a map behind a feature flag, to avoid new queue records with `nodes` map values being created in a mixed-versions cluster.
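To make the two-format handling concrete, here is a minimal sketch modeled in Python for illustration only; the real code is Erlang and would live in modules such as `rabbit_queue_type`. It assumes the queue type state is a dict whose `"nodes"` entry is either the legacy list of node names or the new map of node name to Ra UId. The helper names `member_nodes`, `member_uid` and `nodes_value`, and the boolean standing in for the feature flag, are hypothetical.

```python
from typing import Dict, List, Optional, Union

# Hypothetical model of the `nodes` entry: legacy list or new node -> Ra UId map.
NodesValue = Union[List[str], Dict[str, str]]


def member_nodes(type_state: dict) -> List[str]:
    """Return member node names, whichever `nodes` format is stored."""
    nodes: NodesValue = type_state["nodes"]
    if isinstance(nodes, dict):
        # New format: the keys are the member nodes.
        return list(nodes.keys())
    # Legacy format: already a plain list of node names.
    return list(nodes)


def member_uid(type_state: dict, node: str) -> Optional[str]:
    """Return the Ra UId recorded for `node`, or None for legacy records."""
    nodes: NodesValue = type_state["nodes"]
    if isinstance(nodes, dict):
        return nodes.get(node)
    # Legacy records carry no UIds at all.
    return None


def nodes_value(uids: Dict[str, str], map_format_enabled: bool) -> NodesValue:
    """Build the `nodes` value for a newly declared queue.

    `map_format_enabled` stands in for a (hypothetical) feature flag: until it
    is enabled, keep writing the legacy list so that older nodes in a
    mixed-versions cluster never read a map they cannot handle.
    """
    if map_format_enabled:
        return dict(uids)
    return list(uids.keys())
```

Callers that only need member nodes go through a single accessor (the role `rabbit_queue_type:nodes/1` would play) instead of reading the type state directly, so the format switch stays in one place.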
We are moving the functionality of getting the nodes/members of an amqqueue from the `amqqueue` module to `rabbit_amqqueue`. This is in line with the work in previous PRs towards reducing direct access to the `QueueTypeState`, such as rabbitmq#13905. Also, we will need to distinguish between the different formats of the `nodes` entry in the `QueueTypeState`, supporting both the previous one (a list of nodes) and the new one (a map of nodes to Ra UIds). Doing so in a module such as `amqqueue`, which is essentially an accessor module around the `amqqueue` record, doesn't feel right.
@kjnilsson Thanks for the suggestions. Just wanted to let you know we are working on this. We had an incident to take care of this week, but I'll be pushing this PR forward next week.
@LoisSotoLopez we have to ask your employer to sign the Broadcom CLA before we can accept this contribution (or its future finished version). It is about one page long, nothing particularly unusual or onerous.
The commit below is just to show current progress. I have been struggling to understand why a few of the remaining tests fail.
Proposed Changes
This PR implements the suggested solution for the issue described in discussion #13131.
Currently, when a QQ is deleted and re-declared while one of its nodes is dead, that node is unable to reconcile with the new queue once it comes back.
In this PR we add the Ra UId of each member node to the queue record, so that when a RabbitMQ node recovers a queue it can detect the situation described above and reconcile properly.
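The detection step can be sketched as a comparison between the UId of the local Ra server being recovered and the UId the queue record holds for this node. This is a Python model for illustration, not the PR's Erlang implementation. It assumes that a UId mismatch means the local data belongs to the queue that was deleted while the node was down, and that such a member should be replaced. The `Action` values and `recovery_action` are hypothetical names, and what "replace" actually does (deleting local data, rejoining the cluster) is left to the real code.

```python
from enum import Enum
from typing import Dict, List, Union


class Action(Enum):
    RECOVER = "recover"              # local member belongs to the current queue
    REPLACE_STALE = "replace_stale"  # queue was deleted and redeclared meanwhile
    NOT_A_MEMBER = "not_a_member"    # record does not list this node


def recovery_action(
    record_nodes: Union[List[str], Dict[str, str]],
    local_node: str,
    local_uid: str,
) -> Action:
    """Decide how a restarting node should treat its local quorum queue member."""
    if isinstance(record_nodes, list):
        # Legacy format has no UIds, so a delete/redeclare cannot be detected;
        # fall back to plain membership.
        if local_node in record_nodes:
            return Action.RECOVER
        return Action.NOT_A_MEMBER
    expected_uid = record_nodes.get(local_node)
    if expected_uid is None:
        return Action.NOT_A_MEMBER
    if local_uid != expected_uid:
        # Same queue name, different Ra UId: the local server is left over
        # from the deleted incarnation of the queue.
        return Action.REPLACE_STALE
    return Action.RECOVER
```

The legacy branch shows why the UIds are needed: with only node names, a node that was down during the delete and redeclare looks like a valid member of the new queue.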