Skip to content

Conversation

jandeschuttere
Copy link
Contributor

Hi 👋

Just a minor change in the interaction with the router, I kept it simple.

Recently we had a bad configuration on our deployment. The proxy was able to prevent any of the bad configuration to be used or to influence any of the running services, yet we noticed the state was updated to reflect the bad configuration.

If there would have been a restart of the proxy for any particular reason this bad configuration might have caused the system to not boot up properly.

@jandeschuttere
Copy link
Contributor Author

@kevinmcconnell just as a friendly reminder, want to be able to close this on my end as well.

@kevinmcconnell
Copy link
Collaborator

Thanks for the ping, @jandeschuttere! I finally got a chance to look into this...

If there are ways to get bad configuration into the loaded service, then I think we need to prevent that happening in the first place, rather than just avoid saving the state at that point. If the configuration change is still loaded, then a subsequent save could cause it to be written out to the state later (for example, a second service being deployed).

This probably means changing how UpdateOptions works so that attempts to initialise before mutating its fields. That way if part of that inialisation fails, it can return before it makes any changes.

Looking at it now, this situation seems like it would apply to cert paths and custom error page paths. Was the bad configuration change you ran into one of those?

@jandeschuttere
Copy link
Contributor Author

jandeschuttere commented Sep 15, 2025

Correct, we accidentally had loaded expired certs on the deployment.

I have not revisited the code right now but the check to identify whether or not the certs are valid was further down the chain. The reason I went for this approach, and not the validation - iirc, was because this might meant a bigger rewrite. I can have a look at it again if you think this would be the path forward?

Do you know if there was a particular reason this was done so late in the "game", when the state already was mutated? Are there code paths that could lead this check to be used on other events (like f.i. a restart of sorts)?

What's the point of view on whether or not a reboot should prevent the service to be reachable in case of an expired certificate? Is it allowed and should there be proper error logging/notifications, or should it be prevented by all means?

@kevinmcconnell
Copy link
Collaborator

The reason I went for this approach, and not the validation - iirc, was because this might meant a bigger rewrite. I can have a look at it again if you think this would be the path forward?

Great, yes I think that would be good. What I had in mind, in the short term, is just reordering what UpdateOptions does, so that if there's an error initializing it will return without mutating the service. So something along the lines of:

func (s *Service) UpdateOptions(options ServiceOptions, targetOptions TargetOptions) error {
	err := s.initialize()
	if err != nil {
		return err
	}

	s.options = options
	s.targetOptions = targetOptions
	return nil
}

(I haven't tested this out, but it seems like something like that should correct the current problem.)

Do you know if there was a particular reason this was done so late in the "game", when the state already was mutated? Are there code paths that could lead this check to be used on other events (like f.i. a restart of sorts)?

I don't think it was intentional. Originally the only part that could fail a deployment was the healthcheck stage, and so the rest of the operation proceeded after that state succeeds.

What's the point of view on whether or not a reboot should prevent the service to be reachable in case of an expired certificate? Is it allowed and should there be proper error logging/notifications, or should it be prevented by all means?

I think in the case of manual certs, we should boot the service as long as we can load the certificate. If it's expired, that's something that should be corrected by the operator, but the proxy shouldn't refuse to boot the service on restart just because of it. However trying to deploy a new service version and referencing an expired custom certificate, we should probably reject that deployment. It seems like a difference between validating the arguments on deploy, and using whatever state has been previously accepted.

What do you think -- does that sound reasonable to you?

@jandeschuttere jandeschuttere force-pushed the prevent-saving-invalid-state-on-deploy branch from 8eee415 to 0fc85a0 Compare September 19, 2025 13:01
return nil
}

func (s *Service) validateOptions(options ServiceOptions) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit sad on having to do this, but I assume the way it was set up was because of a certain desire on how the code is structured. An alternative I'd preferred was passing the options until they have been verified as working, and at that point set up the full state, which feels closer to what I assume the initialize was desired to be doing.

The way that was constructed could lead to the witnessed invalid state, hence the validation which duplicates that logic.

If you feel like there's merit in changing the logic of initialize to do just that, let me know, I am willing to go the extra mile on making those changes. I was hesitant on doing so because I think (assume! - and respect -) this is more of a Ruby type of styling and I don't want to break that convention.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I don't think we need to construct these objects twice.

What I had in mind was what you're describing: perform the initialize step before we change any of the fields on the Service. Only if initialize returns without error would we then update the state of the Service.

So it does mean passing the options into initialize, yes, and also calling it earlier in UpdateOptions. And if it's clearer, have initialize return everything it creates, so that updating the state is all in one place. So perhaps it'd look more like:

certManager, middleware, err := s.initialize(options)
if err != nil {
  return err
}

s.certManager = certManager
s.middleware = middleware
s.options = options
s.targetOptions = targetOptions

It's totally fine to change initialize as necessary to do this -- it's not really trying to follow a particular convention.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great to hear and thank you for bringing clarity.

I've chosen the route of passing the options to the initialize to keep the dependency clear, yet allowing that call to still initialize the state of the instance within the instance itself. I did like the simple return of only the error which was already there and I did not yet see a need myself of passing down that knowledge/responsibility.

what do you think of my last pushed update?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great now! Thanks for taking another pass at it, @jandeschuttere!

I'm thinking we can update the TargetOptions after the initialize as well, but I'll take a look at that bit separately. In any case, this fixes the issue that you had observed.

@jandeschuttere jandeschuttere force-pushed the prevent-saving-invalid-state-on-deploy branch from 0fc85a0 to 28deaa4 Compare September 19, 2025 14:51
@kevinmcconnell kevinmcconnell merged commit b62efc1 into basecamp:main Sep 22, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants