Add S3 guard to prevent publishing existing jars #9492
Conversation
If we run the AWS S3 CLI to perform the upload, we need to update the requirements in the env / README.
Also, sbt as a build tool should be able to run commands; we can use that instead of adding code to compile just to run a command.
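To illustrate the reviewer's point, here is a minimal sketch of an sbt task that shells out to an external command via `sys.process`, with no compiled helper code. The task name, bucket, and the use of sbt-assembly are illustrative assumptions, not the PR's actual values:

```scala
import sys.process._

// Hypothetical task: run an external CLI straight from sbt.
// Assumes the sbt-assembly plugin provides the `assembly` task.
lazy val awsUpload = taskKey[Unit]("Upload the assembly jar with the AWS CLI")

awsUpload := {
  val jar = assembly.value                    // jar produced by sbt-assembly
  val cmd = Seq("aws", "s3", "cp", jar.getAbsolutePath, "s3://example-bucket/")
  val exit = cmd.!                            // returns the process exit code
  if (exit != 0) sys.error(s"'${cmd.mkString(" ")}' failed with exit code $exit")
}
```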
```scala
val region = "us-east-1"

val cmd = Seq(
  "aws","s3api","put-object",
  // … rest of the command elided in the diff excerpt
```
Search for a non-`s3api` command — `aws s3` is preferred.
Using `aws s3 cp` instead of `s3api put-object` is possible with some "overhead tweak", but there are trade-offs: `aws s3 cp` can perform multipart uploads; however, it does not support the `If-None-Match: "*"` condition. That means you cannot guarantee an atomic "create-if-absent" operation with `aws s3 cp`.

If we switch to `aws s3 cp`, the only workaround would be something like:

- checking first: `aws s3api head-object --bucket "$bucket" --key "$key" ...`
- and then uploading: `aws s3 cp "$jar" "s3://$bucket/$key" ...`

But this is not atomic.

For our use case, since the assembly jar is always < 5GB, I think `s3api put-object` would fit nicely (it supports single-object uploads up to 5GB). If one day we publish artifacts larger than 5GB, we would need multipart upload (and lose the atomic "no-overwrite" guarantee).
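For contrast, a sketch of the non-atomic `aws s3 cp` workaround described above, with the race window called out. The helper name and bucket/key parameters are placeholders for illustration:

```scala
import sys.process._

// Non-atomic "check, then upload" with aws s3 cp.
// Between the head-object check and the cp, another writer can still
// create the object, so this cannot replace If-None-Match: "*".
def cpIfAbsent(jar: java.io.File, bucket: String, key: String): Unit = {
  val exists = Seq("aws", "s3api", "head-object",
    "--bucket", bucket, "--key", key).! == 0   // exit 0 => object already there
  if (exists) sys.error(s"s3://$bucket/$key already exists, refusing to overwrite")
  // <-- race window: a concurrent publish can create the key right here
  val exit = Seq("aws", "s3", "cp", jar.getAbsolutePath, s"s3://$bucket/$key").!
  if (exit != 0) sys.error(s"aws s3 cp failed with exit code $exit")
}
```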
…ad`, update related Makefile and build logic, and remove outdated README publishing details.
…s3Upload` error handling.
…ogic in Spark client build
…lish-scala` task via Makefile.
… bucket and key details.
… logging in Spark client
…redundant error handling.
…ove `publishBucket` handling in Spark client build
lgtm
Please release an experimental version from the branch after merge for a final sanity check?
Verify it's accessible from S3 with no auth or AWS CLI, just curl, and that the jar works!

Works fine!
Commits

* Update AWS S3 configurations and add debug step in publish workflow
* Revert project version to 0.16.0-demo-9 in Spark client build configuration
* Add S3 overwrite guard to Scala publish process in Makefile
* Remove AWS debug step from publish workflow
* Update S3 bucket name in Makefile
* Refine S3 overwrite guard logic and update handling of bucket and key mapping in Makefile
* Simplify S3 overwrite guard logic in Makefile and streamline error handling
* Refactor S3 overwrite guard logic in Makefile to improve readability and robustness
* Enhance S3 overwrite guard to use HTTP status check via curl for improved accuracy and error handling in Makefile
* Improve S3 overwrite guard in Makefile with enhanced bucket extraction, additional debug logs, and stricter error handling
* Remove redundant comment from Makefile's S3 bucket extraction logic
* Enhance S3 overwrite guard in Makefile with improved host and key derivation logic, fallback handling, and additional debug logs
* Simplify S3 overwrite guard logic in Makefile by streamlining host and key derivation processes and improving error handling.
* Streamline S3 overwrite guard logic in Makefile by removing redundant debug logs and simplifying host, bucket, and key derivation processes.
* Improve S3 overwrite guard in Makefile by refining HTTP status checks and enhancing error messaging
* Bump Spark client version to 0.16.0-demo-10 in build.sbt
* Simplify S3 overwrite guard logic in Makefile by refining host and bucket derivation steps.
* Enhance S3 overwrite guard in Makefile by refining key derivation logic with name, version, and jar details
* Refine S3 overwrite guard in Makefile by enhancing bucket and key extraction logic with stricter validations and improved error handling
* Refine S3 overwrite guard in Makefile by simplifying bucket and key derivation logic using mappings and removing redundant validations.
* Refine S3 overwrite guard in Makefile by adding sbt flags for no colors and disabling supershell.
* Simplify S3 overwrite guard in Makefile by removing redundant sbt flags for no colors and supershell.
* Simplify S3 overwrite guard in Makefile by refining host, bucket, and key derivation logic and consolidating HTTP status checks.
* Simplify S3 overwrite guard in Makefile by inlining bucket derivation into URL creation.
* Bump Spark client version to 0.16.0-demo-11 in build.sbt
* Revert Spark client version to 0.16.0-demo-10 in build.sbt and simplify S3 URL derivation in Makefile by removing redundant bucket resolution.
* Bump Spark client version to 0.16.0-demo-12 in build.sbt
* Update S3 bucket and host details; rename Spark client and bump version to 0.16.0-demo-1.
* Update S3 bucket and host references to use benel-public-test.
* Replace S3 upload logic with atomic safe upload using `If-None-Match`, bump Spark client version to 0.16.0-demo-14, and add build-time dependencies for AWS SDK.
* Add placeholder for safe S3 upload logic using `If-None-Match`.
* Remove commented placeholder for safe S3 upload logic in build.sbt.
* Refactor safe S3 upload logic to use `RequestOverrideConfiguration` for `If-None-Match`.
* Remove redundant `contentType` specification from S3 upload request in build.sbt.
* Simplify `safeS3Upload` task definition and fix incorrect import in build.sbt.
* Move AWS SDK dependencies to plugins.sbt for build-time use and add `aws-core` dependency.
* Replace `safeS3Upload` with `s3PutIfAbsent` using AWS CLI for atomic S3 uploads.
* No changes made; reformat file to ensure proper newline at EOF.
* Bump Spark client version to 0.16.0-demo-15.
* Refactor `s3PutIfAbsent` to improve error handling and include region specification in S3 upload logic.
* Remove redundant assembly trigger from `s3PutIfAbsent` definition in build.sbt.
* Refactor `s3PutIfAbsent` to simplify variable naming and improve error handling.
* Bump Spark client version to 0.16.0-demo-16.
* Enhance `s3PutIfAbsent` with dynamic S3 host, improved URL handling, and success logging.
* Refactor `s3PutIfAbsent` to extract region as a variable for better maintainability.
* Add `sys.process` import to `s3PutIfAbsent` task in build.sbt
* Simplify `s3PutIfAbsent` by removing unused `s3Upload` settings and refactoring bucket URL handling.
* Refactor `s3PutIfAbsent` error handling by removing redundant success logging.
* Update `s3PutIfAbsent` URL to use S3-standard format for bucket URLs.
* Bump Spark client version to 0.16.0-demo-17.
* Add success logging for `s3PutIfAbsent` uploads in build.sbt.
* Bump Spark client version to 0.16.0-demo-19.
* Bump Spark client version to 0.16.0-demo-20 and enable public-read ACL for S3 uploads.
* commit
* commit
* Improve `s3PutIfAbsent` error handling: surface AWS CLI errors, refine artifact existence checks, and update bucket target.
* Refine `s3PutIfAbsent`: remove unused variable, adjust comments, and standardize S3 URL formatting in logs.
* Update Spark client README: add publishing requirements and flow details for maintainers
* Refactor Spark client S3 upload: replace `s3PutIfAbsent` with `s3Upload`, update related Makefile and build logic, and remove outdated README publishing details.
* Update `s3Upload` task description for clarity in Spark client build logic
* Make `publishBucket` configurable in Spark client build and improve `s3Upload` error handling.
* Update `publish-scala` description in Makefile and refine S3 upload logic in Spark client build
* Remove unused `sbts3` plugin from Spark client build configuration.
* Make `S3_PUBLISH_BUCKET` configurable in workflow and pass it to `publish-scala` task via Makefile.
* Improve error handling in `s3Upload`: enhance S3 failure message with bucket and key details.
* Make `PUBLISH_BUCKET_FLAG` optional in Makefile and improve S3 upload logging in Spark client
* Refine `s3Upload` in Spark client build: simplify logging and remove redundant error handling.
* Remove conditional `PUBLISH_BUCKET_FLAG` logic from Makefile and improve `publishBucket` handling in Spark client build
Closes #9387
This PR strengthens the deployment workflow to prevent overwriting released Spark client jars on the `treeverse-clients-us-east` bucket.
Changes
* A new sbt task (`s3PutIfAbsent`) that uses `If-None-Match: *`, ensuring S3 rejects uploads if the object already exists.
* Updated `publish-scala` to run `s3PutIfAbsent`.
* Removed the previous `aws s3 cp --acl public-read ...` step.

Public readability right now is applied by adding the `"--acl","public-read"` flag to the `s3PutIfAbsent` command, but in the near future it will be enforced at the bucket level, not via ACLs.

Testing
As elaborated in the issue discussion.
Verified: public readability via the `"--acl","public-read"` flag addition in the `s3PutIfAbsent` command.

Follow ups:
* `treeverse-clients-us-east` bucket: switch Object Ownership to Bucket owner enforced (disable ACLs), to make objects public via bucket policy.
* Remove the `"--acl","public-read"` flag addition in the `s3PutIfAbsent` command (Remove public acl flag from s3PutIfAbsent task in spark/build.sbt #9512).
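For reference, a minimal sketch of what an atomic upload task along these lines could look like end to end, assuming an AWS CLI version that supports `--if-none-match` on `s3api put-object`. The bucket default, key layout, region, and logging are illustrative assumptions; the real task lives in spark/build.sbt:

```scala
import sys.process._

lazy val s3PutIfAbsent = taskKey[Unit]("Publish the assembly jar to S3 only if it is not already there")

s3PutIfAbsent := {
  val log    = streams.value.log
  val jar    = assembly.value                                             // from sbt-assembly
  val bucket = sys.env.getOrElse("S3_PUBLISH_BUCKET", "example-bucket")   // placeholder default
  val key    = s"${name.value}/${version.value}/${jar.getName}"           // illustrative key layout
  val region = "us-east-1"
  val cmd = Seq(
    "aws", "s3api", "put-object",
    "--bucket", bucket,
    "--key", key,
    "--body", jar.getAbsolutePath,
    "--if-none-match", "*",        // S3 rejects the PUT if the key already exists
    "--acl", "public-read",        // interim, until bucket-policy enforcement
    "--region", region)
  val exit = cmd.!
  if (exit != 0)
    sys.error(s"put-object failed for s3://$bucket/$key (exit $exit); the object may already exist")
  log.info(s"uploaded s3://$bucket/$key")
}
```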