Reconcile QQ node dead during delete and redeclare #14241

Open: LoisSotoLopez wants to merge 3 commits into main from qq_uuid_in_metadata_store

@LoisSotoLopez (Contributor) commented Jul 16, 2025

Proposed Changes

This PR implements the solution suggested in discussion #13131.

Currently, when a quorum queue (QQ) is deleted and redeclared while one of its member nodes is down, that node won't be able to reconcile with the new queue once it comes back.

In this PR we add the list of Ra UIDs for the cluster to each node's queue record, so that when a Rabbit node recovers a queue it can detect the situation described above and reconcile properly.
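
As a rough illustration of the check this enables, the sketch below compares the Ra UID a recovering node finds locally with the UID recorded for that node in the queue record. The module name, function name and return atoms are hypothetical and are not taken from the PR or from Ra; it only assumes the record exposes a map of node to UID.

```erlang
-module(qq_uid_check_sketch).
-export([member_status/3]).

%% Hypothetical helper, not part of RabbitMQ or Ra.
%% Node:     the node that is recovering the queue.
%% LocalUId: the Ra UID of the member found locally, or undefined if none.
%% Expected: map of node() => Ra UID, as stored in the queue record.
-spec member_status(node(), binary() | undefined, #{node() => binary()}) ->
    ok | stale | missing | not_a_member.
member_status(Node, LocalUId, Expected) ->
    case maps:find(Node, Expected) of
        error ->
            %% The redeclared queue has no member on this node.
            not_a_member;
        {ok, _} when LocalUId =:= undefined ->
            %% The record expects a member here but none exists locally.
            missing;
        {ok, LocalUId} ->
            %% LocalUId is already bound, so this clause matches only
            %% when the recorded UID equals the local one.
            ok;
        {ok, _Other} ->
            %% The local member belongs to the deleted queue incarnation.
            stale
    end.
```

A `stale` result corresponds to the case described above: the node still holds a member of the deleted queue, while the record describes the redeclared one.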

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

Co-authored-by: Péter Gömöri <gomoripeti@users.noreply.github.com>

@kjnilsson (Contributor)

Thanks @LoisSotoLopez, this looks like it will do what we discussed a while back. I feel unsure about adding another key to the queue type state for this, mainly because we'd have to keep uids and nodes in sync. It would be nicer if nodes turned from a list into a map, although even this is a bit controversial and could become a source of bugs. Let me consider it for a day or two.

@kjnilsson (Contributor) left a comment

I don't like having two keys with similar information (nodes) that will need to be kept in sync. I think we need to move the current nodes list value to the #{node() => uid()} map format and handle the two formats in the relevant places, mostly in rabbit_queue_type. We do need a list of the member nodes in a few places, but we could add a convenience function, rabbit_queue_type:nodes/1, that takes a queue record and returns a list of member nodes. Internally it could just call rabbit_queue_type:info(Q, [members]) and extract the result from that. We would then update all places where we explicitly use get_type_state to extract the member nodes.
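
To make the "handle the two formats" part concrete, here is a minimal sketch of a helper that accepts the `nodes` value either as the current list or as the proposed map. It is not the proposed rabbit_queue_type:nodes/1, which would take a queue record; the module and function names are hypothetical, and the UID is assumed to be a binary.

```erlang
-module(qq_nodes_sketch).
-export([member_nodes/1]).

%% Hypothetical helper: returns the member nodes for either format of
%% the `nodes` value. The result is sorted and de-duplicated so both
%% formats yield the same list for the same members.
-spec member_nodes([node()] | #{node() => binary()}) -> [node()].
member_nodes(Nodes) when is_list(Nodes) ->
    %% Current format: a plain list of nodes.
    lists:usort(Nodes);
member_nodes(Nodes) when is_map(Nodes) ->
    %% Proposed format: a map of node to Ra UID.
    lists:sort(maps:keys(Nodes)).
```

Callers that only need the member list would then not have to know which format a given queue record uses.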

In addition, I think we need to put the use of nodes as a map behind a feature flag, to avoid new queue records with nodes map values being created in a mixed-version cluster.
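
A sketch of what gating the format could look like, assuming the caller already knows whether the flag is enabled (for example via rabbit_feature_flags:is_enabled/1 with a flag name that this PR does not specify). The module and function names are hypothetical.

```erlang
-module(qq_nodes_format_sketch).
-export([initial_nodes/2]).

%% Hypothetical helper: picks the format of the `nodes` value for a new
%% queue record. FlagEnabled is assumed to be the state of a feature
%% flag that is only enabled once every cluster node understands maps.
-spec initial_nodes(boolean(), [{node(), binary()}]) ->
    [node()] | #{node() => binary()}.
initial_nodes(true, Members) ->
    %% New format: node => Ra UID.
    maps:from_list(Members);
initial_nodes(false, Members) ->
    %% Old format: nodes only, safe for older cluster members.
    [Node || {Node, _UId} <- Members].
```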

We are moving the functionality of getting the nodes/members of an
amqqueue from the `amqqueue` module to `rabbit_amqqueue`. This is in
line with previous PRs that work towards reducing direct access to the
`QueueTypeState`, such as
rabbitmq#13905. Also, we will
need to distinguish between the different formats of the `nodes` entry
in the `QueueTypeState`, to support both the previous one as a list of
nodes and the new one as a map of nodes to Ra UIDs. Doing so in a module
such as `amqqueue`, which feels like an accessor module around the
`amqqueue` record, doesn't feel right.

@LoisSotoLopez (Contributor, Author)

@kjnilsson Thanks for the suggestions. Just wanted to let you know we are working on this. We had an incident to take care of this week, but I'll be pushing this PR forward next week.

@michaelklishin (Collaborator)

@LoisSotoLopez we have to ask your employer to sign the Broadcom CLA before we can accept this contribution (or its future finished version).

It is about one page long, nothing particularly unusual or onerous.

@LoisSotoLopez (Contributor, Author) commented Aug 1, 2025

The commit below is just to show current progress. I have been struggling to understand why a few of the remaining tests fail.

@LoisSotoLopez force-pushed the qq_uuid_in_metadata_store branch from 1074165 to 35ef780 on August 1, 2025 09:50
3 participants