Conversation

@YuryHrytsuk (Collaborator) commented Aug 14, 2025

What do these changes do?

Add a standalone RabbitMQ cluster stack.

Next step:

[image]

FYI: @pcrespov @GitHK

Related issue/s

Related PR/s

Devops Actions ⚠️

  • create new docker swarm overlay network for rabbit
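A minimal sketch of this action (the network name `rabbit` and the `--attachable` flag are assumptions; only `--driver overlay` is essential for a multi-node swarm stack):

```bash
# Create an overlay network for the rabbit cluster stack
# (network name "rabbit" is an assumption; use the deployment's convention)
docker network create --driver overlay --attachable rabbit
```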

Prerequisites

Checklist

  • I tested and it works

New stack

  • The Stack has been included in CI Workflow

New service

  • Service has resource limits and reservations
  • Service has placement constraints or is global
  • Service is restartable --> it is a cluster of nodes (each node is a separate docker service). Nodes can be restarted (1 at a time, or more if the cluster's Raft quorum allows)
  • Service restart is zero-downtime --> a cluster (3+ nodes) can survive a single node restart (a node can be put under maintenance to be on the safe side)
  • Service has >1 replicas in PROD
  • Service has docker healthcheck enabled
  • Service is monitored (via prometheus and grafana) --> to be done in the next PR, when we switch from rabbit (in simcore stack) to the cluster rabbit introduced here
  • Service is not bound to one specific node (e.g. via files or volumes) --> it is bound because of volumes. There is no way around this in our docker swarm setup
  • Relevant OPS E2E Test are added (e2e test rabbit state)
  • Grafana dashboards updated accordingly --> we already have a dashboard that (should) support the cluster. To be tested in another PR, when we switch from simcore rabbit to the clustered rabbit introduced here

If exposed via traefik

  • Service's Public URL is included in maintenance mode --> unrelated
  • Service's Public URL is included in testing mode --> unrelated
  • Service's has Traefik (Service Loadbalancer) Healthcheck enabled --> haproxy healthcheck is monitoring rabbit nodes
  • Credentials page is updated --> to be updated in another PR when we switch traffic to this rabbit cluster
  • Url added to e2e test services (e2e test checking that URL can be accessed) --> to be done when we switch traffic

@YuryHrytsuk self-assigned this Aug 14, 2025
@YuryHrytsuk (Collaborator, Author) commented Aug 14, 2025

TODO

  • document how to put node under maintenance --> readme
  • support single node cluster (for local or tiny deployments) --> done via jinja and iterating over node count
  • document how to update erlang cookie (auth secret to access rabbit nodes with CLI client)
  • document autoscaling (joining nodes dynamically on demand) --> not supported at the moment
  • how to properly add / remove nodes? --> readme
  • test rabbit node count >= 3 --> test repo config values unit test
  • how to apply new settings in rabbitmq.conf on a running cluster --> not supported (more in readme)
    • avoid causing restart of containers because of config sha change --> drop the sha part so that docker fails to update the service on config change (instead of restarting containers)
  • add e2e test monitoring health of the cluster --> ops e2e test added (see the manual health-check sketch after this list)
  • run haproxy highly available --> 2+ replicas running
  • make down (reasonable behaviour) --> simply remove the stack but not volumes. Add extra target to clean volumes
  • applying changes via CI Pipelines --> deploy rabbit job is added
    • start the cluster fresh with empty volumes; add more later if there is a need. Otherwise rely on manual operations (Makefile targets) if it comes to it
  • restarting (rabbitmq node) service --> document behaviour --> we have stacks. Nothing specific to document now
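As referenced in the list above, a minimal sketch of checking cluster health by hand. The `rabbitmq-diagnostics`/`rabbitmqctl` commands are standard; the `rabbit-node01` service-name filter is an assumption about this stack's naming:

```bash
# Pick a container of one rabbit node (service name is an assumption)
CONTAINER="$(docker ps --quiet --filter name=rabbit-node01 | head -n1)"

# Liveness of the local node
docker exec "$CONTAINER" rabbitmq-diagnostics -q ping

# Cluster membership, partitions and maintenance status overview
docker exec "$CONTAINER" rabbitmqctl cluster_status
```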

Clients should properly use HA rabbit

  • configure default replication factor for quorum queues? --> via rabbitmq.conf
  • how to connect to a multi-node cluster --> it is hidden by haproxy (loadbalancer) --> no changes
  • for backenders: make sure clients retry connection on failure

Cluster Formation

Source https://www.rabbitmq.com/docs/clustering


Ways of Forming a Cluster

  • Declaratively by listing cluster nodes in config file <--- we use
  • Declaratively using DNS-based discovery
  • Declaratively using AWS (EC2) instance discovery
  • Declaratively using Kubernetes discovery
  • Declaratively using Consul-based discovery
  • Declaratively using etcd-based discovery

Node Names (Identifiers)

  • must be unique --> achieved via docker service name and env variable
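For illustration, a sketch of how the unique name can be derived. `RABBITMQ_NODENAME` is RabbitMQ's standard environment variable; the `rabbit0<ix>` naming scheme is taken from this PR's templates:

```bash
# Each node's docker service sets its own node name from its index
NODE_INDEX=1
export RABBITMQ_NODENAME="rabbit@rabbit0${NODE_INDEX}"   # -> rabbit@rabbit01
```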

Cluster Formation Requirements

  • every cluster member must be able to resolve hostnames of every other cluster member, its own hostname, as well as machines on which command line tools such as rabbitmqctl might be used --> docker swarm networking

Ports That Must Be Opened for Clustering and Replication --> all works by default in docker swarm (all ports allowed)

  • 4369: epmd, a helper discovery daemon used by RabbitMQ nodes and CLI tools
  • 6000 through 6500: used by RabbitMQ Stream replication
  • 25672: used for inter-node and CLI tools communication and is allocated from a dynamic range (limited to a single port by default, computed as AMQP port + 20000)
  • 35672-35682: used by CLI tools for communication with nodes and is allocated from a dynamic range
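A quick connectivity sketch for these ports, runnable from inside one rabbit container; the peer hostname (taken from the haproxy config below) and the availability of `nc` in the image are assumptions:

```bash
# Probe a peer node's discovery (epmd) and inter-node ports over the overlay network
for port in 4369 25672; do
  nc -z -w 2 rabbit-node02_rabbit02 "$port" \
    && echo "port $port reachable" \
    || echo "port $port NOT reachable"
done
```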

Nodes in a Cluster

  • Nodes are Equal Peers

For two nodes to be able to communicate they must have the same shared secret called the Erlang cookie.

  • Erlang cookie generation should be done at cluster deployment stage ⚠️ --> achieved via common secret
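A minimal sketch of generating that common secret at deployment time; the secret name `rabbitmq_erlang_cookie` is an assumption, the commands are standard:

```bash
# Generate a random Erlang cookie once and store it as a swarm secret
openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
```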

Node Counts and Quorum:

  • Two node clusters are highly recommended against --> added a test to forbid a 2-node cluster configuration
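The repo's actual safeguard is a config-values unit test; a purely illustrative shell equivalent, using the `RABBIT_CLUSTER_NODE_COUNT` variable from the templates:

```bash
# Reject the discouraged two-node topology early
if [ "${RABBIT_CLUSTER_NODE_COUNT}" -eq 2 ]; then
  echo "ERROR: a two-node cluster cannot keep a quorum majority after one node fails" >&2
  exit 1
fi
```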

Clustering and Clients

Messaging Protocols

  • In case of a node failure, clients should be able to reconnect to a different node, recover their topology and continue operation --> Task for backenders
  • Most client libraries accept a list of endpoints --> we use loadbalancer and 1 endpoint

Stream Clients

  • RabbitMQ Stream protocol clients behave differently from messaging protocol clients --> unrelated for us

Queue and Stream Leader Replica Placement

Cleaning volumes

  • Avoid tasks taking unlimited space --> do not retry jobs + always remove stack before starting new tasks
  • Avoid unexpected volume removal
    • Deleting volumes failed but tasks keep running --> do not retry jobs + use timeouts
    • Deleting volumes unrelated to rabbit (safeguards) --> added
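A sketch of the safeguard idea: operate only on volumes matching the rabbit naming scheme and ask before deleting (the `rabbit0` name filter and the confirmation prompt are assumptions):

```bash
# Show only volumes that match the rabbit node naming scheme
docker volume ls --quiet --filter name=rabbit0

# Delete them only after explicit confirmation (xargs -r: no-op on empty list)
read -r -p "Remove the volumes listed above? [y/N] " answer
[ "$answer" = "y" ] && docker volume ls --quiet --filter name=rabbit0 | xargs -r docker volume rm
```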

HA Proxy highly available

  • running 2+ replicas and statistics --> we do not expose / use statistics at the beginning

@YuryHrytsuk changed the title from "Add ha rabbit" to "Add ha rabbit (but not use it)" Aug 28, 2025
@YuryHrytsuk changed the title from "Add ha rabbit (but not use it)" to "Add (ha) rabbit cluster" Aug 28, 2025
@YuryHrytsuk changed the title from "Add (ha) rabbit cluster" to "Add (ha) rabbit cluster (but not use it)" Sep 3, 2025
@YuryHrytsuk marked this pull request as ready for review September 16, 2025 07:06
@matusdrobuliak66 (Contributor) left a comment:

Good job 👍 thanks


validate-NODE_COUNT: guard-NODE_COUNT
@if ! echo "$(NODE_COUNT)" | grep --quiet --extended-regexp '^[1-9]$$'; then \
echo NODE_COUNT must be a positive single digit integer; \
Member:
minor: NODE_COUNT must be a positive single digit integer > 0

fi

validate-node-ix0%: .env
@if ! echo "$*" | grep --quiet --extended-regexp '^[0-9]+$$'; then \
Member:
minor: since you will validate that the integer is >= 1 in a later row, you can also already check that in the regex as such: ^[1-9]+$$

start-cluster: start-all-nodes start-loadbalancer

update-cluster stop-cluster:
@$(error This operation may break cluster. Check README for details.)
Member:
I like this dummy target with an error

envsubst < $< > $@; \
echo NODE_INDEX=$* >> $@

.PRECIOUS: docker-compose.node0%.yml
Member:
PRECIOUS is a new thing to me, reading from https://www.gnu.org/software/make/manual/html_node/Special-Targets.html I think these could actually be "regular" .PHONY targets, or not? 🤔

echo NODE_INDEX=$* >> $@

.PRECIOUS: docker-compose.node0%.yml
docker-compose.node0%.yml: docker-compose.node0x.yml.j2 \
Member:
cool stuff with the %, a bit hard to read if one doesn't know Makefiles, but we are Makefile experts :D

start_interval: 10s

volumes:
rabbit0{{ NODE_INDEX }}_data:
Member:
cool stuff with the looping/templating for multiple nodes.

We used to have a kind-of similar thing for the on-premise minio (was running on dalco-prod to provide on-prem S3); you can compare and crosscheck if you want. Maybe there are some things to find, I don't remember actually: https://github.com/ITISFoundation/osparc-ops-environments/blob/8f22a93acf33ec70b55d889e7dae26a4756accdb/services/minio/docker-compose.yaml.j2

deploy:
placement:
constraints:
- node.labels.rabbit0{{ NODE_INDEX }} == true
Member:
This will require a docker labels change in osparc-ops-deployment-configuration and associated PRs I guess :)
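For reference, a sketch of the label change being referred to; the swarm node hostname is a placeholder and the label key mirrors the constraint above:

```bash
# Pin rabbit node 01 to a specific swarm node via a node label
docker node update --label-add rabbit01=true <swarm-node-hostname>
```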

gid: "999"
volumes:
- rabbit0{{ NODE_INDEX }}_data:/var/lib/rabbitmq
# TODO: sync with existing rabbit attached networks
Member:
minor: not sure what this TODO actually means, I don't fully get it

@@ -0,0 +1,19 @@
{% set NODE_IXS = range(1, (RABBIT_CLUSTER_NODE_COUNT | int) + 1) -%}
Member:
minor: can we sync this with how rabbit is configured in osparc-simcore, so that the backend devs' setup mimics the prod one closely?

# haproxy by default resolves server hostname only once
# this breaks if container restarts. By using resolvers
# we tell haproxy to re-resolve the hostname (so container
# restarts are handled properly)
Member:
makes sense, good find
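For context, a minimal sketch of the resolvers section this refers to; the retry/timeout/hold values are assumptions, and 127.0.0.11 is Docker's embedded DNS server:

```bash
# Illustrative resolvers block appended to the generated haproxy.cfg
cat >> haproxy.cfg <<'EOF'
resolvers dockerdns
    nameserver dns1 127.0.0.11:53
    resolve_retries 3
    timeout resolve 1s
    hold valid 10s
EOF
```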

balance roundrobin

option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
Member:
out of curiosity: is there a reason you remember why this must be set? Due to HAProxy?

{% for ix in NODE_IXS %}
server rabbit0{{ ix }} rabbit-node0{{ ix }}_rabbit0{{ ix }}:{{ RABBIT_MANAGEMENT_PORT }} check resolvers dockerdns init-addr libc,none inter 5s rise 2 fall 3
{%- endfor %}
# keep new line in the end to avoid "Missing LF on last line" error
Member:
lol


Source: https://www.rabbitmq.com/docs/next/configure#config-changes-effects

## Enable node Maintenance mode
Member:
very good readme, can you write one sentence or link to docs that explain what maintenance mode does?
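For context: in maintenance mode a drained node transfers its queue leader replicas away and stops accepting client connections until it is revived. `rabbitmq-upgrade drain`/`revive` are the standard RabbitMQ CLI commands; running them via `docker exec` against these service names is an assumption about this setup:

```bash
# Put a node into maintenance mode before restarting it
docker exec "$(docker ps -q -f name=rabbit-node01)" rabbitmq-upgrade drain

# ...restart / maintain the node, then bring it back into service...
docker exec "$(docker ps -q -f name=rabbit-node01)" rabbitmq-upgrade revive
```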

cpus: "0.1"
memory: "128M"
healthcheck: # https://stackoverflow.com/a/76513320/12124525
test: bash -c 'echo "" > /dev/tcp/127.0.0.1/32087 || exit 1'
Member:
good

@mrnicegyu11 (Member) left a comment:

thanks a lot for the huge effort, this is (by design) working around many limitations of docker swarm, but nevertheless I see that you accounted for many pitfalls and issues. It looks promising and robust. Let me know if you need help during the rollout; I am curious to see if issues pop up or if this "just works" :--)
