CPU Limit Adjustments #1181
…oundation#979) * Introduce longhorn chart * Further longhorn configuration * Longhorn: further settings configuration * Fix longhorn configuration bugs Extra: introduce longhorn pv vales for portainer * Add comment for deletion longhorn * Further longhorn configuration * Add README.md for Longhorn wit FAQ * Update Longhorn readme * Update readme * Futher LH configuration * Update LH's Readme * Update Longhorn Readme * Improve LH's Readme * LH: Reduce reserved default disk space to 5% Since we use a dedicated disk for LH, we can go ahead with 5% * Use values to set Longhorn storage class * Update LH's Readme * LH Readme: add requirements reference * PR Review: bring back portainer s3 pv * LH: decrease portinaer volume size
  resources:
    limits:
      memory: 1G
-     cpus: '2'
+     cpus: '6'
Why this number? What is the reasoning behind it?
Do we take into account the number of CPUs available on the machines?
The number of available CPUs is not taken into account, and afaik this is correct: they do not matter (as much). [talk to me for clarification ;)]
The number is a guess based on Prometheus observations on osparc-master (e.g. PromQL `sum(clamp_max(rate(container_cpu_cfs_throttled_seconds_total{image=~"(registry:.*)|(traefik:.*)|(.*itisfoundation.*)|(.*director:.*)"}[2m]) > 0.2,3)) by (container_label_com_docker_swarm_service_name)`).
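To make the arithmetic of that query easier to follow, here is a minimal Python sketch of what it computes per Swarm service. It assumes the per-container throttling rates (throttled seconds per second over the 2-minute window) have already been obtained; the function name `throttled_by_service` and the sample values are hypothetical and not taken from the deployment.

```python
from collections import defaultdict


def throttled_by_service(rates, threshold=0.2, cap=3.0):
    """Mimic sum(clamp_max(rate(...) > 0.2, 3)) by (service).

    rates: iterable of (service_name, throttled_seconds_per_second) pairs,
           one pair per container.
    """
    totals = defaultdict(float)
    for service, rate in rates:
        # The PromQL `> 0.2` filter drops series at or below the threshold.
        if rate > threshold:
            # clamp_max(..., 3) caps each container's contribution.
            totals[service] += min(rate, cap)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("registry", 0.1),
        ("registry", 5.0),
        ("registry", 1.5),
        ("traefik", 0.05),
    ]
    print(throttled_by_service(sample))
```

Services whose containers all stay at or below the threshold do not appear in the result at all, which matches how the comparison filter behaves in PromQL.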
Nevertheless, @mrnicegyu11, if you set a limit above what is actually available, the container never starts.
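Assuming the behaviour described in this comment holds, a quick sanity check before choosing a limit could look like the sketch below. The helper `nodes_that_fit` and the node names and CPU counts are hypothetical; real values would have to come from the cluster inventory.

```python
def nodes_that_fit(cpu_limit, node_cpus):
    """Return the names of nodes with at least `cpu_limit` CPUs.

    node_cpus: mapping of node name -> number of CPUs on that node.
    """
    return sorted(name for name, cpus in node_cpus.items() if cpus >= cpu_limit)


if __name__ == "__main__":
    # Made-up inventory, for illustration only.
    inventory = {"manager-1": 4, "worker-1": 8, "worker-2": 16}
    print(nodes_that_fit(6, inventory))
    print(nodes_that_fit(32, inventory))
```

An empty result for a candidate limit would mean that no node in the inventory could start the container under that limit.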
OK, then I vote for removing the limit altogether. I can see that it was throttled by >7 (additional) CPUs at a time on master, so 8 CPUs would be the right number. But this is too high to put into the section if what @sanderegg says is correct. @YuryHrytsuk @sanderegg
Optional: Test CPU Limit
Optional: Benchmark with and without registry CPU limit
Thanks.
For the registry: that will not go on worker nodes where we run services in prod, right?
Is there an issue where we track all these efforts so we can have a summary?
@sanderegg Good point, right now they would. Can you provide me with a label that autoscaled machines have (by which they can be identified), so I can exclude at least those? Thx.
@mrnicegyu11 These labels are defined in osparc-config, so I guess it should at least not go anywhere where there are dynamic sidecars.
I have added constraints along those lines.
- node.labels.gpu!=true
- node.labels.dynamicsidecar!=true
Why remove the OPS constraint `node.labels.ops==true` and introduce negative constraints? Is there anything special about this service, so that it cannot follow the general conventions (the ops label in this case)?
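The practical difference between the two styles can be illustrated with a small Python sketch. It is a simplified model, not Swarm's scheduler: it assumes exact string comparison and that a node without the label behaves as if the label were an empty string. The function names and the node inventory are hypothetical.

```python
def matches(constraint, labels):
    """Evaluate one 'node.labels.<name>==<value>' or '!=' constraint."""
    for op in ("==", "!="):
        if op in constraint:
            key, expected = (part.strip() for part in constraint.split(op, 1))
            break
    else:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    prefix = "node.labels."
    if not key.startswith(prefix):
        raise ValueError(f"only node labels are modelled here: {key!r}")
    # Assumption: a missing label is treated as an empty string.
    actual = labels.get(key[len(prefix):], "")
    return actual == expected if op == "==" else actual != expected


def eligible_nodes(constraints, nodes):
    """Names of nodes that satisfy every constraint."""
    return sorted(
        name for name, labels in nodes.items()
        if all(matches(c, labels) for c in constraints)
    )


if __name__ == "__main__":
    # Made-up nodes, for illustration only.
    nodes = {
        "ops-1": {"ops": "true"},
        "gpu-1": {"gpu": "true"},
        "sidecar-1": {"dynamicsidecar": "true"},
        "plain-1": {},
    }
    negative = ["node.labels.gpu!=true", "node.labels.dynamicsidecar!=true"]
    print(eligible_nodes(negative, nodes))
    print(eligible_nodes(["node.labels.ops==true"], nodes))
```

Under these assumptions the negative constraints also admit nodes that carry no labels at all, while the positive `ops` constraint admits only explicitly labelled nodes.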
@@ -115,11 +115,12 @@ services:
        parallelism: 1
      placement:
        constraints:
          - node.labels.ops==true
Chore:
- use `.master.` docker compose file or j2 instead of negative labels
- spread-constraint
- or: consider manager machine on all deployments
optional: test / precommit (minor)
…on#1182) * wip * Add csi-s3 and have portainer use it * Change request @Hrytsuk 1GB max portainer volume size * Arch Linux Certificates Customization * Fix pgsql exporter failure * [Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉 (ITISFoundation#979) * Introduce longhorn chart * Further longhorn configuration * Longhorn: further settings configuration * Fix longhorn configuration bugs Extra: introduce longhorn pv vales for portainer * Add comment for deletion longhorn * Further longhorn configuration * Add README.md for Longhorn wit FAQ * Update Longhorn readme * Update readme * Futher LH configuration * Update LH's Readme * Update Longhorn Readme * Improve LH's Readme * LH: Reduce reserved default disk space to 5% Since we use a dedicated disk for LH, we can go ahead with 5% * Use values to set Longhorn storage class * Update LH's Readme * LH Readme: add requirements reference * PR Review: bring back portainer s3 pv * LH: decrease portinaer volume size * Experimental: Try to add tracing to simcore-traefik on master * Fixes ITISFoundation/osparc-simcore#7363 * Arch Linux Certificates Customization - 2 * Upgrade registry, add tracing * revert accidental commit --------- Co-authored-by: Dustin Kaiser <[email protected]> Co-authored-by: YH <[email protected]>
What do these changes do?
director-v0
registry
Related issue/s
Related PR/s
Checklist