-
Notifications
You must be signed in to change notification settings - Fork 394
Extract permission node creation 8201 #8963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Extract permission node creation 8201 #8963
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tkalir ,
THANKS! Overall this change looks fine. I am curious how this will continue. At some point you will need to strings from all of these permissions, and you will need something in memory that holds all usable operations, and the knowledge that it matches the list of all your funcs. This will still be an implementation issue.
As an alternative: it seems that you have a small number of truly different "*Permissions" funcs. Could we instead implement something like
func MakeGroupPermissions(action string) func(groupID string) permissions.Node {
return func(groupID string) permissions.Node {
return permissions.Node{
Permission: Permission{
Action: action,
Resource: GroupArn(groupID),
},
}
}
}
// A "few" other similar Make*Permissions, say MakeObjectPermissions, MakePolicyPermissions, and similar.
Now you might store a Make*Permissions func in a map where the index is the operation name. And then you load in controller.go a permissions func, and then call that func. With a few tweaks we will be able to use that map to print documentation.
WDYT?
@@ -0,0 +1,1210 @@ | |||
package permissions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment about the filename. Unfortunately filenames can be really tough. We dislike capitals in filenames for two reasons:
-
The existing Go code base uses underscores not uppercase.
Right now we have only one such file, which is auto-generated by a tool:
❯ find ./pkg/ ./cmd/ -name '*[A-Z]*.go' ./pkg/metastore/hive/gen-go/hive_metastore/GoUnusedProtection__.go
-
Uppercase can actively be harmful because of bad defaults on MacOS. By default, MacOS creates a case-insensitive but case-preserving filesystem when you set it up. So this file will also be accessible by names
operationpermissions.go
,opeRATionPermissions.go
, and others. But as soon as someone does that by mistake, every other OS fails to find that file. We have had bad experience here. So we try to avoid uppercase where it matters.
BUT please make this change only after we approve all other issues on the review. Otherwise GitHub tends to lose the comments when you rename a file. 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the explanation. I'll rename the file after the rest is fine.
@arielshaqed thanks for reviewing! My initial thought was doing something similar to the draft here. If I understand your suggestion correctly, the doc-gen code is supposed to call the function that MakeGroupPermissions (etc) returns, which means it needs to know how many arguments this function expects. It's not impossible to solve, but I think the template solution is easier in this case. Unrelated question - there's an automatic check that fails this PR for not closing an issue. I'd rather split the code to multiple PRs (I also want to extract the log_action strings to enums before making the map object, which I don't think belongs here). In that case, this PR won't close the original issue. Should I create another issue for this PR to close? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Firstly, very sorry about dropping this PR. I completely missed it, and that is wrong. I will change my workflow around emails from GitHub in order to see if I can prevent this in future.
Thanks for the detailed work! I find this PR quite hard to review, there are unfortunately many changed lines. And I am certain it was even harder to write. Mistakes here are, by definition, security bugs - so I will have to automate this part. To that end - I see that you changed some operation names. This make automating the check impossible. Could you please ensure that names are kept? I commented in 2 places.
I am also unsure how this helps us auto-generate documentation. That is probably because I do not understand the next step. Is the plan to process permissions/operationPermissions somehow and extract permissions from there?
pkg/api/controller.go
Outdated
Resource: permissions.GroupArn(groupID), | ||
}, | ||
}) { | ||
if !c.authorize(w, r, permissions.ListGroupUsersPermissions(groupID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the same names, please?
pkg/api/controller.go
Outdated
Resource: permissions.GroupArn(groupID), | ||
}, | ||
}) { | ||
if !c.authorize(w, r, permissions.RemoveUserFromGroupPermissions(groupID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here and elsewhere in the file, of course.
Hi @arielshaqed thanks for the review.
I intend to replace the code in the operationPermissions functions to use and fill the nodes from the map. Part of my assumption was that this PR would be hard to review and verify automatically, so I tried to make the changes as straightforward as possible (extracting code from each controller function to a helper function). I am open to suggestions how to to make such PRs more reviewable in the future. Thanks! |
Hi @tkalir , |
I will write a script to extract permissions used for each call. I think that may give us confidence we're doing the right thing here. |
Here is a more concrete proposal. The chief driving force here is reliability: reduce the risk of human error when extracting the correct permissions by automating it a bit. So I wrote a dumb script that just manages to extract the permissions. Because this is a forklift (we replace one thing with another, but we do not plan on doing the same replacement again), it is allowed for the script to fail and say that it failed in some cases. This is better than making a silent error. After wasting some time and burning the planet using Gemini Code (pro tip: it doesn't work), I hacked together the attached Python script and ran it on controller.go. It manages to extract permissions for most of the API functions - we can do the remaining 4 manually. Because of reasonably strict error checking in the script, I feel that the automation here is trustworthy. I think next up we can extract permissions. We will need to extract permissions to a more usable object. For instance, if we had an interface Allower with a method An implementation of AllowByPath would be something like this: type AllowByPath struct {
Action string
}
func (a *AllowByPath) func Allowed(params PermissionParams, r http.Request, w *http.Response) bool {
return a.Controller.AuthorizeCallback(w, r, permissions.Node{
Permission: permissions.Permission{
Action: a.Action
Resource: permissions.ObjectArn(params.Repository, params.Path),
},
})
} Once we've done that, we can generate documentation by adding another method to Allower. And we can order the documentation correctly by ordering all the function names output in permissions.out. WDYT? |
Thanks, this looks like a solid idea. My main concern with this approach is how to verify the code's correctness after the change. One option for mitigating that could be verifying that Authorize() receives the same input when the same endpoints are called with the same parameters. To try this, I called LinkPhysicalAddress() with a fake Controller object whose Authorize() logs the permission object, and it worked.
I suspect the test code would have to change a little for each endpoint, so the test code won't be small, but it is doable. |
Since we have reproducible automation, I suggest we:
(Of course change the script if you need to tweak its output). |
I do believe the script works as intended - but my concern is also with the boilerplate that the new code will require, which the script won't help with verifying. |
Hi @arielshaqed , Working on the code, I noticed that some endpoints pass a callback to AuthorizeCallback, which will need to be passed to Allowers. It can certainly work, but made me wonder if we can simplify the design. This suggestion assumes that the only real difference between Allowers is the Node object they use, let me know if you think that might change.
Then, instead of calling What do you think? |
Hi @tkalir , Really glad we're back on this! Sorry, I am not sure about this part of your suggestion; I am probably missing your point in all of this.
There are many more types of Nodes, in actual use! I mean, many Nodes refer to other ARNs than object ARNs, and we sometimes use "AND" Nodes. I think you refer to more than what I read -- and would be glad if you could help me understand. |
Hi, I'll explain a bit more. This suggestion is similar to your last one - having all objects in the map share an interface ( Comparing to the Allowers design, it's just about splitting a bit differently what's in the interface implementation and what's in the calling code. Instead of having each implementation calling AuthorizeCallback with a different node object, I am having each implementation returning a different node, which the calling code passes to Authorize(). Some more examples:
example using AND:
(PermissionDescriptor seems too long, if we use this design we can go with "OpPermission", "PermissionSpec" or something similar) |
Thanks for the explanation - this really makes a lot of sense! So most of the logic I thought you needed to put in the PermissionDescriptor, you actually put into the factory (constructor) functions Permission, right? You've obviously put a great deal of understanding into our requirements, even those I did not phrase formally. And I think this proposal offers a great path forward. I am asking for some clarifications, to be sure that I understand and to try to avoid some blockers. I really think we can manage all of these, either this way or another way. IIUC this means PermissionParams will have many more fields, for branches, source objects (for copy), repositories, users, groups, tags, and all that. That sounds great, I had not thought of it like that. If I now understand correctly, my only request is that PermissionParams fields will be pointers to strings not strings: I would like code in controller.go to be very clear about the difference between "I don't have a parameter value to give here" (nil) and "I have a parameter value to give here, and it is the empty string" (a pointer to an empty string). I realize it's probably OK because empty strings probably don't make sense for any of the parameters. But... maybe they do? Maybe an empty path component could make sense in future? If we do it with pointers we will never have to worry about it. Creating a pointer to a string can be unpleasant in Go, particularly when you have a literal or a computed string. We use swag.String(...) which does that for us; you could use that in your upcoming code too. |
Hi, I was sick for a few days so it got delayed. Regarding where the logic is located, to make sure we are on the same page, my suggestion is to replace
So not a lot of logic overall. Some additional points:
A or B will be easier, as this issue doesn't necessitate dealing with the logaction names, but tell me what you think. |
Glad to hear you're better! I'm happy with any of the options. Personally I do not see much benefit in consts / string enums / int enums compared to just copying the operation ID. I mean, it basically means writing stuff like func GetObject(...params) {
something.Permissions(GetObjectActionName, ...)
// ...
} rather than func GetObject(...params) {
something.Permissions("get-object", ...)
// ...
} The minor advantage to the first is that it catches some errors at compile-time rather than at run-time. But:
So no strong feelings here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(empty comment to make it your turn to review)
thanks! I think another benefit of having constant variables is that it makes it easier to find where in the code the string is used (for this flow's purpose). I mean, the operationid might appear in multiple places in the code base, but the constant appears only in this flow. I think my next step will be doing a PR for migrating the Path-using endpoints to this method, and then we can talk testing with concrete examples. |
Related to #8201
Change Description
Automating the documentation for endpoint permissions requires to decouple the permission objects from the endpoint code. This PR extracts the permission objects to a separate file.
Since the permission code is sensitive, I opted for a straightforward refactor as a first step. This also makes it possible to test the next changes in the permission code separately from the endpoint code.
The functions are named after the logAction operation names when available (e.g copy_objects -> CopyObjectsPermissions).
Testing Details
When refactoring I used the IDE's "Extract Method" feature for consistency and minimize room for error. I also ran the tests locally (excluding Hadoop related tests, which I had an issue with running locally).
Let me know if you think additional coverage is required.