Conversation

nader-ziada
Member

@nader-ziada nader-ziada commented Aug 28, 2025

Fixes #14029

Proposed Changes

  • SecurePodDefaults is now enabled by default, but that still doesn't require a non-root user; a Restricted value is introduced to require that.

Release Note

 - Change SecurePodDefaults default from Disabled to SecureDefaultsOverridable
 - RunAsNonRoot will be set to true by default unless the user sets it to false
 - Enabled will set RunAsNonRoot=true
 - Setting SecurePodDefaults to Disabled keeps the behaviour as is.
 - Sets kubernetes.podspec-securitycontext to "enabled"


knative-prow bot commented Aug 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nader-ziada
Once this PR has been reviewed and has the lgtm label, please assign dprotaso for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 28, 2025
@knative-prow knative-prow bot requested review from dprotaso and skonto August 28, 2025 18:49

codecov bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 55.10204% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.17%. Comparing base (e85e3f6) to head (0601251).

Files with missing lines Patch % Lines
pkg/testing/v1/revision.go 0.00% 20 Missing ⚠️
pkg/apis/serving/v1/revision_defaults.go 86.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16042      +/-   ##
==========================================
+ Coverage   80.14%   80.17%   +0.03%     
==========================================
  Files         214      214              
  Lines       16877    16915      +38     
==========================================
+ Hits        13526    13562      +36     
- Misses       2989     2995       +6     
+ Partials      362      358       -4     

@nader-ziada
Member Author

/test upgrade-tests

  out := new(corev1.PodSecurityContext)

- if config.FromContextOrDefaults(ctx).Features.SecurePodDefaults == config.Enabled {
+ if config.FromContextOrDefaults(ctx).Features.SecurePodDefaults == config.Enabled || config.FromContextOrDefaults(ctx).Features.SecurePodDefaults == config.Restricted {
Contributor

@dsimansk dsimansk Sep 5, 2025

Enabled - enable most options but it still works with non-root images - still warn about nonRoot

Is this reflected in the actual SCC values?
Specifically, there's no distinction between Enabled and Restricted when I look at L685, which adds out.RunAsNonRoot = in.RunAsNonRoot.

Member

@dprotaso dprotaso left a comment

Thanks for the release notes in the PR body.

One expected side-effect of this change is that certain images will break with the new default.

e.g. nginx will no longer run, since it tries to make a chown syscall on startup:

nginx: [emerg] chown("/var/cache/nginx/client_temp", 101) failed (1: Operation not permitted)

caddy (which runs as root) runs on the default profile now. It fails on the restricted one, which is expected.

As a follow-up (separate PR), I noticed the caddy image error doesn't propagate up properly to the Revision. The Pod has:

      state:
        waiting:
          message: 'container has runAsNonRoot and image will run as root (pod: "hello-00001-deployment-5c77b7fb5b-w7r74_default(271b9aaa-340d-4073-914c-2144c8560273)",
            container: user-container)'
          reason: CreateContainerConfigError

@dprotaso
Member

/assign @evankanderson for a security WG review

Member

@evankanderson evankanderson left a comment

I think we should also plan to flag changing any defaults here as a breaking change if we decide to do so, along with clear instructions about what the breakage looks like, and how to switch back to the current behavior.

In particular, I think nginx is probably a great example to show / test with, since many people may use it to serve static content.

From a security point of view, I think getting towards safer defaults without breaking too many things for users is a net win, so I think it makes sense from a security perspective.

Comment on lines 47 to 48
# Enabled - enable most options but it still works with non-root images - still warn about nonRoot
# Restricted - enable all options that the restricted profile requires
Member

Is this changing the meaning of enabled while adding restricted as an in-between layer?

Comment on lines 249 to 265

	if psc.RunAsNonRoot == nil {
		if cfg.Features.SecurePodDefaults == config.Restricted {
			updatedSC.RunAsNonRoot = ptr.Bool(true)
		} else {
			updatedSC.RunAsNonRoot = ptr.Bool(false)
		}
	}
Member

This feels like it's doing something different, since RunAsNonRoot is a tri-state (*bool).

Before, if SecurePodDefaults was enabled, this would set a missing RunAsNonRoot to true. If RunAsNonRoot was set to false, it would leave it as false. The new flag will unconditionally set RunAsNonRoot, which feels like it might be a value which is always wrong.

What about adding a root-okay value between disabled and enabled, and then aiming to change the default to root-okay in an upcoming release (it could be 1.20 or 1.21)?
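
To make the tri-state point concrete, here is a minimal sketch in plain Go. It does not use the Knative or Kubernetes types; `defaultIfUnset`, `overwrite`, and `describe` are hypothetical helpers that only illustrate the difference between filling in a missing value and overwriting whatever the user chose.

```go
package main

import "fmt"

// defaultIfUnset fills in a value only when the field is missing (nil).
// An explicit true or false from the user is returned untouched.
// Hypothetical helper, not part of the PR.
func defaultIfUnset(runAsNonRoot *bool, def bool) *bool {
	if runAsNonRoot != nil {
		return runAsNonRoot
	}
	return &def
}

// overwrite ignores the user's choice and always returns v.
// Hypothetical helper, shown only for contrast.
func overwrite(_ *bool, v bool) *bool {
	return &v
}

// describe renders the three possible states of a *bool.
func describe(b *bool) string {
	if b == nil {
		return "unset"
	}
	return fmt.Sprintf("%t", *b)
}

func main() {
	yes, no := true, false
	for _, in := range []*bool{nil, &yes, &no} {
		fmt.Printf("input=%-5s defaultIfUnset=%-5s overwrite=%s\n",
			describe(in),
			describe(defaultIfUnset(in, true)),
			describe(overwrite(in, true)))
	}
}
```

The only input where the two helpers disagree is an explicit false, which is exactly the case a user relies on to keep a root image running.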

Member Author

Can we make RootAllowed the default? It would work the same as you described: RunAsNonRoot is set to true if empty, and stays false if the user set it to that.
We would keep Restricted but not set it as the default until a later release.

Member Author

it could also be called RootEnabled

Member

Yes, I think RootAllowed is probably a good name for this intermediate setting, and I think it's reasonable to make it the new default in the same release; upgrade instructions should flag that setting the config to disabled prior to upgrading will preserve existing behavior.

Since this is an admission webhook, I think it will only affect new Revisions, correct?

Member Author

@nader-ziada nader-ziada Sep 18, 2025

Everything else in the config is called enabled. Should we use that for the meaning of root-allowed and mention it in the description? Or do you actually want it to be called root-allowed, with enabled removed?

Since this is an admission webhook, I think it will only affect new Revisions, correct?

Yes, I tested with an existing cluster and the existing revisions were fine; the change only applied on create.

Member

I would like the enum values to clearly indicate where on the security continuum the value falls. We currently have:

  • disabled -- least secure
  • enabled -- some security, low chance of incompatibility
  • restricted -- highest security, medium chance of incompatibility

The problem is that on a quick read, it's not clear to me if enabled > restricted or restricted > enabled in terms of security. I'd sort of expect that enabled means as much security as possible, so I'd like things to look like:

  • disabled -- least secure
  • $BETTER -- some security, low chance of incompatibility
  • enabled -- highest security, medium chance of incompatibility

We could call $BETTER something like transition or root-allowed, but it seems odd for enabled to be an intermediate step, and for users who've already turned on enabled in 1.19 (where it means "highest security") to have a security backslide until they change to restricted in 1.20 (which would not be allowed in 1.19). On the other hand, if we add $BETTER in the middle as a default, users already on enabled don't need to change anything, users who want disabled can set it explicitly before upgrade, and users who don't care can simply accept the upgrade (and possibly set the value afterwards).

Sorry for the late reply -- hope this makes sense.
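
One way to picture the proposed ordering is a small ordered enum. This is only a sketch under assumptions: the `Level` type, the `AtLeast` method, and the `root-allowed` name for $BETTER are hypothetical and are not the config package's actual representation.

```go
package main

import "fmt"

// Level is a hypothetical ordered view of the setting, from least to
// most secure. The real config type may be represented differently.
type Level int

const (
	Disabled    Level = iota // least secure
	RootAllowed              // some security, low chance of incompatibility
	Enabled                  // highest security, medium chance of incompatibility
)

// String returns an assumed flag spelling for each level.
func (l Level) String() string {
	switch l {
	case Disabled:
		return "disabled"
	case RootAllowed:
		return "root-allowed"
	case Enabled:
		return "enabled"
	default:
		return "unknown"
	}
}

// AtLeast reports whether l is at least as strict as other.
func (l Level) AtLeast(other Level) bool {
	return l >= other
}

func main() {
	for _, l := range []Level{Disabled, RootAllowed, Enabled} {
		fmt.Printf("%-12s at least root-allowed: %t\n", l, l.AtLeast(RootAllowed))
	}
}
```

With this ordering, a cluster already on enabled never moves down the scale when the default changes to the middle value.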

Member Author

yes, makes perfect sense, thank you for the clarification. Will make the necessary updates and let you know

…rridable

- applies secure defaults but honors RunAsNonRoot
- sets RunAsNonRoot to true if not already specified
- changes 'enabled' to always set RunAsNonRoot to true
- sets kubernetes.podspec-securitycontext to "enabled"
@nader-ziada
Member Author

@evankanderson @dprotaso

here is a summary of the changes:

  • create a new default value for secure-pod-defaults: SecureDefaultsOverridable
  • applies secure defaults but honors RunAsNonRoot
  • sets RunAsNonRoot to true if not already specified
  • when secure-pod-defaults = 'enabled', set RunAsNonRoot to true
  • sets kubernetes.podspec-securitycontext to "enabled"
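
As a rough illustration of the summary above, the sketch below resolves RunAsNonRoot per mode. It is self-contained and does not use the real config package: the `Mode` type, its constants, the flag spellings, and `resolveRunAsNonRoot` are assumptions made for the example, and the other secure defaults are left out.

```go
package main

import "fmt"

// Mode is a hypothetical stand-in for the secure-pod-defaults setting.
type Mode string

const (
	ModeDisabled    Mode = "disabled"
	ModeOverridable Mode = "secure-defaults-overridable" // assumed spelling
	ModeEnabled     Mode = "enabled"
)

// resolveRunAsNonRoot returns the value the defaulting step would leave
// on the security context, given what the user specified (nil = unset).
func resolveRunAsNonRoot(mode Mode, user *bool) *bool {
	yes := true
	switch mode {
	case ModeOverridable:
		// Default to true, but honor an explicit user choice.
		if user == nil {
			return &yes
		}
		return user
	case ModeEnabled:
		// Always require non-root, regardless of the user's value.
		return &yes
	default:
		// Disabled: keep the existing behaviour and touch nothing.
		return user
	}
}

// show renders the three possible states of a *bool.
func show(b *bool) string {
	if b == nil {
		return "unset"
	}
	return fmt.Sprintf("%t", *b)
}

func main() {
	no := false
	for _, m := range []Mode{ModeDisabled, ModeOverridable, ModeEnabled} {
		fmt.Printf("%-28s unset -> %-5s  explicit false -> %s\n",
			m, show(resolveRunAsNonRoot(m, nil)), show(resolveRunAsNonRoot(m, &no)))
	}
}
```

The two non-disabled modes differ only when the user explicitly sets false: the overridable mode keeps it, while enabled replaces it with true.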

@evankanderson
Member

Sorry to be going around on this, but what's the user benefit to setting secure-pod-defaults = enabled instead of using the pod-security.kubernetes.io/enforce: restricted label on a namespace? The latter enforces the settings filled in by secure-pod-defaults, but also applies to non-Knative pods in the same namespace.

The previous purpose of secure-pod-defaults was to prevent users from accidentally wandering into the "less secure" space that Kubernetes uses as a historical default, without requiring every Knative Service manifest to specify a bunch of securityContexts.

runAsNonRoot is a weird, special flag on the PodSpec, because it acts as a "spoiler" on kubelet running the pod after admission and scheduling. A pod may be accepted with runAsNonRoot=true and runAsUser=nil, and then fail to run if the container image has USER 0, USER root, or some other such content. runAsNonRoot=true (but not runAsUser) is required by the Pod Security Standards "Restricted" profile. This is further complicated by history:

  • Docker defaults to building containers with USER 0 if you don't issue a USER directive.
  • Docker supports both USER <uid> and USER <name> formats; the latter requires pulling the image layer contents for the highest layer which contains /etc/passwd.
    • Layers don't declare which files they contain...
  • A number of popular containers (in particular, nginx) run as root and may be used to (for example) serve static content. Knative would naturally be a place for that.

The "correct" solution is to rebuild the container and/or set runAsUser to a non-zero value, possibly also adding the NET_BIND_SERVICE capability. This will fix compatibility for some containers without getting into the "my pod launched, but then doesn't run" failure state that I suspect will lead to user frustration with the feature.
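
The "accepted at admission, rejected at start" behaviour can be modelled roughly as follows. This is a simplified sketch, not the kubelet's code: `verifyNonRoot` is a hypothetical function, the error strings are illustrative, and it assumes the image USER is either empty, a plain numeric UID, or a name (forms such as `uid:gid` are ignored).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// verifyNonRoot is a simplified, hypothetical model of the check made
// when a container starts. imageUser is the USER recorded in the image
// config; an empty string means no USER directive was issued.
func verifyNonRoot(runAsNonRoot *bool, runAsUser *int64, imageUser string) error {
	if runAsNonRoot == nil || !*runAsNonRoot {
		return nil // nothing to enforce
	}
	if runAsUser != nil {
		// An explicit runAsUser takes precedence over the image's USER.
		if *runAsUser == 0 {
			return errors.New("runAsNonRoot is set but runAsUser is 0")
		}
		return nil
	}
	if imageUser == "" {
		// No USER directive: the image runs as root by default.
		return errors.New("image will run as root")
	}
	uid, err := strconv.ParseInt(imageUser, 10, 64)
	if err != nil {
		// A user name cannot be resolved without reading /etc/passwd.
		return fmt.Errorf("image has non-numeric user %q, cannot verify it is non-root", imageUser)
	}
	if uid == 0 {
		return errors.New("image will run as root")
	}
	return nil
}

func main() {
	yes := true
	uid := int64(101)
	fmt.Println("no USER, no runAsUser:  ", verifyNonRoot(&yes, nil, ""))
	fmt.Println("USER nginx, no runAsUser:", verifyNonRoot(&yes, nil, "nginx"))
	fmt.Println("USER root, runAsUser=101:", verifyNonRoot(&yes, &uid, "root"))
}
```

In this model, setting a non-zero runAsUser is what lets a root-by-default image pass the check, which matches the fix described above; whether the process then works as that UID is a separate question.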

My suggestion would be to change secure-defaults-overrideable to no-strict-nonroot, and then change lines 250-264 of revision_defaults.go to:

	// RunAsNonRoot breaks more containers than other settings, see discussion in #16042
	if cfg.Features.SecurePodDefaults == config.Enabled {
		if psc.RunAsNonRoot == nil {
			updatedSC.RunAsNonRoot = ptr.Bool(true)
		}
	}

In either case, it feels like the enforcement of RunAsNonRoot would be better handled using either Pod Security Admission on the application namespaces or ValidatingAdmissionPolicy.

Of course, it's also possible that I'm missing a use-case here, but this was mostly intended to be a "better defaults than Kubernetes" feature in the same way that we do for probes and network setup.

@nader-ziada
Member Author

Apologies if I'm misunderstanding, but if securePodDefaults is set to enabled, do we still not need to set RunAsNonRoot to true? I thought the point of this issue was to eventually make RunAsNonRoot=true the default behaviour. The code snippet above only sets it to true if it's nil, but doesn't that allow the user to set it to false, which is the same as secure-defaults-overrideable or no-strict-nonroot?

Successfully merging this pull request may close these issues.

Set secure-pod-defaults to "enabled" by default
4 participants