Skip to content

Conversation

tkalir
Copy link
Contributor

@tkalir tkalir commented Apr 14, 2025

Related to #8201

Change Description

Automating the documentation for endpoint permissions requires to decouple the permission objects from the endpoint code. This PR extracts the permission objects to a separate file.
Since the permission code is sensitive, I opted for a straightforward refactor as a first step. This also makes it possible to test the next changes in the permission code separately from the endpoint code.
The functions are named after the logAction operation names when available (e.g copy_objects -> CopyObjectsPermissions).

Testing Details

When refactoring I used the IDE's "Extract Method" feature for consistency and minimize room for error. I also ran the tests locally (excluding Hadoop related tests, which I had an issue with running locally).
Let me know if you think additional coverage is required.

@tkalir tkalir marked this pull request as draft April 14, 2025 13:35
@tkalir tkalir marked this pull request as ready for review April 15, 2025 22:04
@arielshaqed arielshaqed self-requested a review April 16, 2025 15:09
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tkalir ,

THANKS! Overall this change looks fine. I am curious how this will continue. At some point you will need to strings from all of these permissions, and you will need something in memory that holds all usable operations, and the knowledge that it matches the list of all your funcs. This will still be an implementation issue.

As an alternative: it seems that you have a small number of truly different "*Permissions" funcs. Could we instead implement something like

func MakeGroupPermissions(action string) func(groupID string) permissions.Node {
	return func(groupID string) permissions.Node {
		return permissions.Node{
			Permission: Permission{
				Action:   action,
				Resource: GroupArn(groupID),
			},
		}
	}
}

// A "few" other similar Make*Permissions, say MakeObjectPermissions, MakePolicyPermissions, and similar.

Now you might store a Make*Permissions func in a map where the index is the operation name. And then you load in controller.go a permissions func, and then call that func. With a few tweaks we will be able to use that map to print documentation.

WDYT?

@@ -0,0 +1,1210 @@
package permissions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment about the filename. Unfortunately filenames can be really tough. We dislike capitals in filenames for two reasons:

  • The existing Go code base uses underscores not uppercase.

    Right now we have only one such file, which is auto-generated by a tool:

    ❯ find ./pkg/ ./cmd/ -name '*[A-Z]*.go'
    ./pkg/metastore/hive/gen-go/hive_metastore/GoUnusedProtection__.go
  • Uppercase can actively be harmful because of bad defaults on MacOS. By default, MacOS creates a case-insensitive but case-preserving filesystem when you set it up. So this file will also be accessible by names operationpermissions.go, opeRATionPermissions.go, and others. But as soon as someone does that by mistake, every other OS fails to find that file. We have had bad experience here. So we try to avoid uppercase where it matters.

BUT please make this change only after we approve all other issues on the review. Otherwise GitHub tends to lose the comments when you rename a file. 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the explanation. I'll rename the file after the rest is fine.

@tkalir
Copy link
Contributor Author

tkalir commented Apr 20, 2025

@arielshaqed thanks for reviewing!

My initial thought was doing something similar to the draft here.
The tldr version is along the lines of mapping each operation name to a permissions object that has the resource template (e.g "group/{groupID}") in the resource field. The doc generation part will use this object as is, and when used in the endpoint flow it will call a "fill" function that substitutes the placeholders with the actual values.

If I understand your suggestion correctly, the doc-gen code is supposed to call the function that MakeGroupPermissions (etc) returns, which means it needs to know how many arguments this function expects. It's not impossible to solve, but I think the template solution is easier in this case.

Unrelated question - there's an automatic check that fails this PR for not closing an issue. I'd rather split the code to multiple PRs (I also want to extract the log_action strings to enums before making the map object, which I don't think belongs here). In that case, this PR won't close the original issue. Should I create another issue for this PR to close?
Thanks

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Firstly, very sorry about dropping this PR. I completely missed it, and that is wrong. I will change my workflow around emails from GitHub in order to see if I can prevent this in future.

Thanks for the detailed work! I find this PR quite hard to review, there are unfortunately many changed lines. And I am certain it was even harder to write. Mistakes here are, by definition, security bugs - so I will have to automate this part. To that end - I see that you changed some operation names. This make automating the check impossible. Could you please ensure that names are kept? I commented in 2 places.

I am also unsure how this helps us auto-generate documentation. That is probably because I do not understand the next step. Is the plan to process permissions/operationPermissions somehow and extract permissions from there?

Resource: permissions.GroupArn(groupID),
},
}) {
if !c.authorize(w, r, permissions.ListGroupUsersPermissions(groupID)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the same names, please?

Resource: permissions.GroupArn(groupID),
},
}) {
if !c.authorize(w, r, permissions.RemoveUserFromGroupPermissions(groupID)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here and elsewhere in the file, of course.

@tkalir
Copy link
Contributor Author

tkalir commented Jun 24, 2025

Hi @arielshaqed thanks for the review.

  1. honestly, I didn't realize that the function names in the controller should match the operationIds, so as I mentioned, I named the functions after the logAction strings. I corrected that now.

  2. regarding how this PR helps: my next step is to construct the operationId -> permission-node map, where the resource parts of the nodes are templates. So the docgen flow can use these nodes as-is, and the controller flow will fill these templates with the request data.

I intend to replace the code in the operationPermissions functions to use and fill the nodes from the map.
I did not include it in this PR because I thought it would be even harder to review and verify.
Additionally, if the current operationPermissions functions are correct, it will be easy to verify that the new versions (that use the map) return the same results.

Part of my assumption was that this PR would be hard to review and verify automatically, so I tried to make the changes as straightforward as possible (extracting code from each controller function to a helper function). I am open to suggestions how to to make such PRs more reviewable in the future.
I am also curious to see how you plan to automate the verification that the pre-pr and post-pr code do the same, I didn't think of a good way to do so myself.

Thanks!

@arielshaqed
Copy link
Contributor

Hi @tkalir ,
Apologies for another delay - I've been unwell. Better now, so will reply by Sunday.

@arielshaqed
Copy link
Contributor

I will write a script to extract permissions used for each call. I think that may give us confidence we're doing the right thing here.

@arielshaqed
Copy link
Contributor

Here is a more concrete proposal. The chief driving force here is reliability: reduce the risk of human error when extracting the correct permissions by automating it a bit. So I wrote a dumb script that just manages to extract the permissions. Because this is a forklift (we replace one thing with another, but we do not plan on doing the same replacement again), it is allowed for the script to fail and say that it failed in some cases. This is better than making a silent error.

After wasting some time and burning the planet using Gemini Code (pro tip: it doesn't work), I hacked together the attached Python script and ran it on controller.go. It manages to extract permissions for most of the API functions - we can do the remaining 4 manually. Because of reasonably strict error checking in the script, I feel that the automation here is trustworthy.

I think next up we can extract permissions. We will need to extract permissions to a more usable object. For instance, if we had an interface Allower with a method Allowed(permissionParams, r http.Request, w *http.Response) then we could have a map element "create_presign_multipart_upload" -> AllowByPath(...).

An implementation of AllowByPath would be something like this:

type AllowByPath struct {
    Action string
}

func (a *AllowByPath) func Allowed(params PermissionParams, r http.Request, w *http.Response) bool {
    return a.Controller.AuthorizeCallback(w, r, permissions.Node{
		Permission: permissions.Permission{
			Action:   a.Action
			Resource: permissions.ObjectArn(params.Repository, params.Path),
		},
	})
}

Once we've done that, we can generate documentation by adding another method to Allower. And we can order the documentation correctly by ordering all the function names output in permissions.out.

WDYT?

permissions.out.gz
extract_permissions.py.gz

@tkalir
Copy link
Contributor Author

tkalir commented Jul 8, 2025

Thanks, this looks like a solid idea.

My main concern with this approach is how to verify the code's correctness after the change.
In the last PR, I relied on the IDE's refactoring tool to reduce the chance of human error when extracting the nodes' code into functions. With this approach, the IDE won't be as much help. While this approach does result in a lot less new functions, I think the new objects + the changes in the calling code still results in many changes that would be hard to check manually.

One option for mitigating that could be verifying that Authorize() receives the same input when the same endpoints are called with the same parameters. To try this, I called LinkPhysicalAddress() with a fake Controller object whose Authorize() logs the permission object, and it worked.
So what I could do is:

  1. create a fake controller that logs Authorize()'s input to a file
  2. call all endpoint handlers
  3. write the new code
  4. call the endpoint handlers again
  5. check that the outputs are the same

I suspect the test code would have to change a little for each endpoint, so the test code won't be small, but it is doable.
It might be an overkill, but since this touches security code along all handlers, it might be justified. What do you think? Is there a simpler way to test this change?

@arielshaqed
Copy link
Contributor

Since we have reproducible automation, I suggest we:

  • Review the Python script (I wrote it, so either you review it or I find a third person).
  • Sample 10 of its outputs.
  • Go on from there.

(Of course change the script if you need to tweak its output).

@tkalir
Copy link
Contributor Author

tkalir commented Jul 17, 2025

I do believe the script works as intended - but my concern is also with the boilerplate that the new code will require, which the script won't help with verifying.
But we can postpone this discussion to after I write the code.
(Or, if you prefer, I can start with doing only the flows involving AllowByPath to make the review more manageable)

@tkalir
Copy link
Contributor Author

tkalir commented Jul 24, 2025

Hi @arielshaqed ,

Working on the code, I noticed that some endpoints pass a callback to AuthorizeCallback, which will need to be passed to Allowers. It can certainly work, but made me wonder if we can simplify the design.

This suggestion assumes that the only real difference between Allowers is the Node object they use, let me know if you think that might change.
Otherwise, I suggest storing in the map objects like these:

type PermissionDescriptor interface {
	Permission(params PermissionParams) permissions.Node
        <doc generation function>
}

type PathPermissionDescriptor struct {
	Action string
}

func (p *PathPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
	return permissions.Node{
		Permission: permissions.Permission{
			Action:   p.Action,
			Resource: permissions.ObjectArn(params.Repository, params.Path),
		},
	}
}

Then, instead of calling authorize() directly, we can wrap it in a function that receives OperationId and all the relevant params, and have it handle the map and call authorize(). This way there's less chance that changes to the flow will ripple to every endpoint.

What do you think?

@arielshaqed
Copy link
Contributor

Hi @tkalir ,

Really glad we're back on this!

Sorry, I am not sure about this part of your suggestion; I am probably missing your point in all of this.

	return permissions.Node{
  	Permission: permissions.Permission{
  		Action:   p.Action,
  		Resource: permissions.ObjectArn(params.Repository, params.Path),
  	},
  }

There are many more types of Nodes, in actual use! I mean, many Nodes refer to other ARNs than object ARNs, and we sometimes use "AND" Nodes.

I think you refer to more than what I read -- and would be glad if you could help me understand.

@tkalir
Copy link
Contributor Author

tkalir commented Jul 24, 2025

Hi, I'll explain a bit more.

This suggestion is similar to your last one - having all objects in the map share an interface (PermissionDescritor), with the same number of implementations. Meaning, every time the permission node is the same ARN structure with a different action, they share an implementation. Times when AND is used may require a different implementation each.

Comparing to the Allowers design, it's just about splitting a bit differently what's in the interface implementation and what's in the calling code. Instead of having each implementation calling AuthorizeCallback with a different node object, I am having each implementation returning a different node, which the calling code passes to Authorize().

Some more examples:

type BranchPermissionDescriptor struct {
	Action string
}

func (b *BranchPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
	return permissions.Node{
		Permission: permissions.Permission{
			Action:   b.Action,
			Resource: permissions.BranchArn(params.Repository, params.Branch),
		},
	}
}

example using AND:

type CreateRepoPermissionDescriptor struct {}

func (c *CreateRepoPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
	return permissions.Node{
		Type: permissions.NodeTypeAnd,
		Nodes: []permissions.Node{
			{
				Permission: permissions.Permission{
					Action:   permissions.CreateRepositoryAction,
					Resource: permissions.RepoArn(params.Repository),
				},
			},
			{
				Permission: permissions.Permission{
					Action:   permissions.AttachStorageNamespaceAction,
					Resource: permissions.StorageNamespace(params.StorageNamespace),
				},
			},
		},
	}
}

(PermissionDescriptor seems too long, if we use this design we can go with "OpPermission", "PermissionSpec" or something similar)
Makes sense?

@arielshaqed
Copy link
Contributor

Thanks for the explanation - this really makes a lot of sense! So most of the logic I thought you needed to put in the PermissionDescriptor, you actually put into the factory (constructor) functions Permission, right?

You've obviously put a great deal of understanding into our requirements, even those I did not phrase formally. And I think this proposal offers a great path forward.

I am asking for some clarifications, to be sure that I understand and to try to avoid some blockers. I really think we can manage all of these, either this way or another way.

IIUC this means PermissionParams will have many more fields, for branches, source objects (for copy), repositories, users, groups, tags, and all that. That sounds great, I had not thought of it like that. If I now understand correctly, my only request is that PermissionParams fields will be pointers to strings not strings: I would like code in controller.go to be very clear about the difference between "I don't have a parameter value to give here" (nil) and "I have a parameter value to give here, and it is the empty string" (a pointer to an empty string). I realize it's probably OK because empty strings probably don't make sense for any of the parameters. But... maybe they do? Maybe an empty path component could make sense in future? If we do it with pointers we will never have to worry about it.

Creating a pointer to a string can be unpleasant in Go, particularly when you have a literal or a computed string. We use swag.String(...) which does that for us; you could use that in your upcoming code too.

@tkalir
Copy link
Contributor Author

tkalir commented Aug 10, 2025

Hi, I was sick for a few days so it got delayed.
Thanks for the feedback!

Regarding where the logic is located, to make sure we are on the same page, my suggestion is to replace Authorize() with a function like this

func (c *Controller) authorizeReq(w http.ResponseWriter, r *http.Request, params PermissionParams, operationId string, opts *AuthorizeOpts) bool {
	desc := GetPermissionDescriptor(operationId)
	callback := writeError
	if opts != nil && opts.callback != nil {
		callback = opts.callback
	}
	return c.authorizeCallback(w, r, desc.Permission(params), callback)
}
// currently custom callback is needed only in ~2 endpoints, but having opts reduces 
// the chance of needing to change the function's signature in the future

GetPermissionDescriptor just looks in the map for a PermissionDescriptor object, using the operationId as key.
The map is populated by (for the examples from my last message)

"CreatePresignMultipartUpload" -> &BranchPermissionDescriptor{action: permissions.WriteObjectAction}
"CreateRepo" -> &CreateRepoPermissionDescriptor{} 

So not a lot of logic overall.

Some additional points:

  1. as you said, PermissionParams will use multiple fields for tags, branches, etc. I think that currently there won't be a use for having nil values there, but future-proofing won't harm us, I'll do as you suggested.

  2. which keys to use in the map - I suggest using the operationId names, like createPresignMultipartUpload. When generating the docs we will need to fetch info from the swagger (like the endpoint url for the operation), and the swagger uses operationIds.
    There are a few possibilities:
    a - using hard coded strings as keys in the map
    b - generate operationId constants from the swagger and use them as keys
    c - generate Operation objects from the swagger, which will include the operationIds and endpoint urls (so parsing the swagger in the doc gen part won't be necessary). If we want to create operation->logAction mapping, the logAction names can be added as custom fields in the swagger, for the times where getting logAction from operationId is not trivial.
    In this option we use <operation>.id as key.

A or B will be easier, as this issue doesn't necessitate dealing with the logaction names, but tell me what you think.

@tkalir tkalir requested a review from arielshaqed August 10, 2025 16:34
@arielshaqed
Copy link
Contributor

Glad to hear you're better!

I'm happy with any of the options. Personally I do not see much benefit in consts / string enums / int enums compared to just copying the operation ID. I mean, it basically means writing stuff like

func GetObject(...params) {
    something.Permissions(GetObjectActionName, ...)
    // ...
}

rather than

func GetObject(...params) {
    something.Permissions("get-object", ...)
    // ...
}

The minor advantage to the first is that it catches some errors at compile-time rather than at run-time. But:

  • If you write "geet-object" with an error, then any test of the code will fail. And I really hope we test our changes by running them at least once 🙀
  • A much more common error for me is to copy-paste and end up with "put-object" instead of "get-object". And then I'd be as likely to write PutObjectActionName.

So no strong feelings here.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(empty comment to make it your turn to review)

@tkalir
Copy link
Contributor Author

tkalir commented Aug 11, 2025

thanks!

I think another benefit of having constant variables is that it makes it easier to find where in the code the string is used (for this flow's purpose). I mean, the operationid might appear in multiple places in the code base, but the constant appears only in this flow.
However, since you don't have a preference and this task is quite big as it is, I think I'll go for the simpler solution.

I think my next step will be doing a PR for migrating the Path-using endpoints to this method, and then we can talk testing with concrete examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants