
Conversation

Ben-El
Contributor

@Ben-El Ben-El commented Sep 4, 2025

Closes #9387

This PR strengthens the deployment workflow to prevent overwriting released Spark client jars
on the treeverse-clients-us-east bucket.

Changes

  • Added a safe publish step (`s3PutIfAbsent`) that uses `If-None-Match: *`, ensuring S3 rejects an upload if the object already exists.
  • Updated `publish-scala` to run `s3PutIfAbsent`.
  • Removed the slow recursive `aws s3 cp --acl public-read ...` step.
  • Updated the README (`clients/spark/README.md`).

Public readability is currently granted by passing the `"--acl", "public-read"` flag to the `s3PutIfAbsent` command; in the near future it will be enforced at the bucket level rather than via ACLs. A minimal sketch of the guarded publish appears below.
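For context, here is a minimal sketch of what such a guarded publish task could look like in `build.sbt`. It is an illustration under assumptions, not the PR's exact code: the key layout and task wiring are placeholders, and it assumes sbt-assembly plus a recent AWS CLI that supports conditional writes (`--if-none-match`).

```scala
import scala.sys.process._

lazy val s3PutIfAbsent = taskKey[Unit](
  "Upload the assembly jar to S3 only if the key does not already exist"
)

s3PutIfAbsent := {
  val jar    = assembly.value                                   // jar built by sbt-assembly
  val bucket = "treeverse-clients-us-east"
  val key    = s"${name.value}/${version.value}/${jar.getName}" // placeholder key layout
  val cmd = Seq(
    "aws", "s3api", "put-object",
    "--bucket", bucket,
    "--key", key,
    "--body", jar.getAbsolutePath,
    "--region", "us-east-1",
    "--acl", "public-read", // temporary, until readability is enforced at the bucket level
    "--if-none-match", "*"  // S3 rejects the PUT with 412 Precondition Failed if the key exists
  )
  if (cmd.! != 0)
    sys.error(s"upload rejected; s3://$bucket/$key may already exist")
}
```

Because `--if-none-match "*"` makes the existence check part of the PUT itself, there is no window in which a concurrent publish could overwrite an already-released jar.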

Testing

As elaborated in the issue discussion.

Verified:

  • New jars upload successfully.
  • Re-publishing an existing version failed as expected.
  • Jars remain publicly accessible (thanks to the `"--acl", "public-read"` flag in the `s3PutIfAbsent` command; see the verification sketch below).
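A quick sketch of that verification as a standalone script; the URL is a placeholder, not the real artifact path:

```scala
import scala.sys.process._

object VerifyPublish extends App {
  // Placeholder: substitute the real bucket/key of the published jar.
  val url = "https://treeverse-clients-us-east.s3.amazonaws.com/<name>/<version>/<jar>"

  // Unauthenticated HEAD request: a 200 status means the jar is publicly readable.
  val status = Seq("curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", "-I", url).!!.trim
  assert(status == "200", s"expected 200 for $url, got $status")
}
```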

Follow-ups:

@Ben-El Ben-El marked this pull request as draft September 4, 2025 19:28
@Ben-El Ben-El added the exclude-changelog and minor-change labels Sep 4, 2025
@Ben-El Ben-El temporarily deployed to Treeverse signing September 7, 2025 11:44 — with GitHub Actions
@Ben-El Ben-El changed the title Update AWS S3 configurations and add debug step in publish workflow Add S3 guard to prevent publishing existing JARs Sep 7, 2025
@Ben-El Ben-El temporarily deployed to Treeverse signing September 7, 2025 13:06 — with GitHub Actions
@Ben-El Ben-El temporarily deployed to Treeverse signing September 7, 2025 14:28 — with GitHub Actions
@Ben-El Ben-El temporarily deployed to Treeverse signing September 7, 2025 14:33 — with GitHub Actions
@Ben-El Ben-El temporarily deployed to Treeverse signing September 7, 2025 14:43 — with GitHub Actions
Contributor

@nopcoder nopcoder left a comment


If we run the AWS S3 CLI to perform the upload, we need to update the requirements in the env / README.
Also, sbt as a build tool should be able to run a command; we can use it without adding code to compile just to run a command.

@Ben-El Ben-El requested a review from nopcoder September 15, 2025 17:36
val region = "us-east-1"

val cmd = Seq(
  "aws", "s3api", "put-object",
Contributor


Search for a non-`s3api` command; `aws s3` is preferred.

Contributor Author

@Ben-El Ben-El Sep 16, 2025


@nopcoder

Using `aws s3 cp` instead of `s3api put-object` is possible with some overhead, but there are trade-offs:

`aws s3 cp` can perform multipart uploads; however, it does not support the `If-None-Match: "*"` condition. That means you cannot guarantee an atomic "create-if-absent" operation with `aws s3 cp`.

If we switched to `aws s3 cp`, the only workaround would be something like:

  • Checking first: `aws s3api head-object --bucket "$bucket" --key "$key" ...`
  • Then uploading: `aws s3 cp "$jar" "s3://$bucket/$key" ...`

But this is not atomic (see the sketch below).

For our use case, since the assembly jar is always < 5GB, I think `s3api put-object` fits nicely (it supports single-object uploads of up to 5GB).

If one day we publish artifacts larger than 5GB, we would need multipart upload (and would lose the atomic "no-overwrite" guarantee).
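To make the race concrete, here is an illustrative check-then-put in the same `sys.process` style (all names are placeholders, not the PR's code): between the `head-object` check and the `cp`, another publisher can create the key, and the copy then overwrites it.

```scala
import scala.sys.process._

object CheckThenPut extends App {
  val bucket = "treeverse-clients-us-east"             // placeholder
  val key    = "client/0.16.0/client-assembly.jar"     // placeholder
  val jar    = "target/scala-2.12/client-assembly.jar" // placeholder

  // head-object exits 0 when the key exists, non-zero otherwise.
  val exists = Seq("aws", "s3api", "head-object", "--bucket", bucket, "--key", key).! == 0
  if (!exists) {
    // RACE WINDOW: a concurrent publisher can create the key right here;
    // the cp below would then overwrite it. `--if-none-match "*"` closes this
    // gap by folding the existence check into the PUT itself.
    Seq("aws", "s3", "cp", jar, s"s3://$bucket/$key").!
  }
}
```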

@Ben-El Ben-El requested a review from nopcoder September 16, 2025 10:08
@Ben-El Ben-El requested a review from nopcoder September 16, 2025 11:43
@Ben-El Ben-El requested a review from nopcoder September 16, 2025 11:59
Contributor

@nopcoder nopcoder left a comment


lgtm

Contributor

@Isan-Rivkin Isan-Rivkin left a comment


Please release an experimental version from the branch after merge for a final sanity check?
Verify it's accessible from S3 with no auth or AWS CLI, just curl, and that the jar works!

@Ben-El Ben-El merged commit 7e32167 into master Sep 16, 2025
40 checks passed
@Ben-El Ben-El deleted the test-public-bucket branch September 16, 2025 13:54
@Ben-El
Contributor Author

Ben-El commented Sep 16, 2025

Please release an experimental version from the branch after merge for a final sanity check? Verify it's accessible from S3 with no auth or AWS CLI, just curl, and that the jar works!

Works fine!

nopcoder pushed a commit that referenced this pull request Sep 18, 2025
* Update AWS S3 configurations and add debug step in publish workflow

* Revert project version to 0.16.0-demo-9 in Spark client build configuration

* Add S3 overwrite guard to Scala publish process in Makefile

* Remove AWS debug step from publish workflow

* Update S3 bucket name in Makefile

* Refine S3 overwrite guard logic and update handling of bucket and key mapping in Makefile

* Simplify S3 overwrite guard logic in Makefile and streamline error handling

* Refactor S3 overwrite guard logic in Makefile to improve readability and robustness

* Enhance S3 overwrite guard to use HTTP status check via curl for improved accuracy and error handling in Makefile

* Improve S3 overwrite guard in Makefile with enhanced bucket extraction, additional debug logs, and stricter error handling

* Remove redundant comment from Makefile's S3 bucket extraction logic

* Enhance S3 overwrite guard in Makefile with improved host and key derivation logic, fallback handling, and additional debug logs

* Simplify S3 overwrite guard logic in Makefile by streamlining host and key derivation processes and improving error handling.

* Streamline S3 overwrite guard logic in Makefile by removing redundant debug logs and simplifying host, bucket, and key derivation processes.

* Improve S3 overwrite guard in Makefile by refining HTTP status checks and enhancing error messaging

* Bump Spark client version to 0.16.0-demo-10 in build.sbt

* Simplify S3 overwrite guard logic in Makefile by refining host and bucket derivation steps.

* Enhance S3 overwrite guard in Makefile by refining key derivation logic with name, version, and jar details

* Refine S3 overwrite guard in Makefile by enhancing bucket and key extraction logic with stricter validations and improved error handling

* Refine S3 overwrite guard in Makefile by simplifying bucket and key derivation logic using mappings and removing redundant validations.

* Refine S3 overwrite guard in Makefile by adding sbt flags for no colors and disabling supershell.

* Simplify S3 overwrite guard in Makefile by removing redundant sbt flags for no colors and supershell.

* Simplify S3 overwrite guard in Makefile by refining host, bucket, and key derivation logic and consolidating HTTP status checks.

* Simplify S3 overwrite guard in Makefile by inlining bucket derivation into URL creation.

* Bump Spark client version to 0.16.0-demo-11 in build.sbt

* Revert Spark client version to 0.16.0-demo-10 in build.sbt and simplify S3 URL derivation in Makefile by removing redundant bucket resolution.

* Bump Spark client version to 0.16.0-demo-12 in build.sbt

* Update S3 bucket and host details; rename Spark client and bump version to 0.16.0-demo-1.

* Update S3 bucket and host references to use benel-public-test.

* Replace S3 upload logic with atomic safe upload using `If-None-Match`, bump Spark client version to 0.16.0-demo-14, and add build-time dependencies for AWS SDK.

* Add placeholder for safe S3 upload logic using `If-None-Match`.

* Remove commented placeholder for safe S3 upload logic in build.sbt.

* Refactor safe S3 upload logic to use `RequestOverrideConfiguration` for `If-None-Match`.

* Remove redundant `contentType` specification from S3 upload request in build.sbt.

* Simplify `safeS3Upload` task definition and fix incorrect import in build.sbt.

* Move AWS SDK dependencies to plugins.sbt for build-time use and add `aws-core` dependency.

* Replace `safeS3Upload` with `s3PutIfAbsent` using AWS CLI for atomic S3 uploads.

* No changes made; reformat file to ensure proper newline at EOF.

* Bump Spark client version to 0.16.0-demo-15.

* Refactor `s3PutIfAbsent` to improve error handling and include region specification in S3 upload logic.

* Remove redundant assembly trigger from `s3PutIfAbsent` definition in build.sbt.

* Refactor `s3PutIfAbsent` to simplify variable naming and improve error handling.

* Bump Spark client version to 0.16.0-demo-16.

* Enhance `s3PutIfAbsent` with dynamic S3 host, improved URL handling, and success logging.

* Refactor `s3PutIfAbsent` to extract region as a variable for better maintainability.

* Add `sys.process` import to `s3PutIfAbsent` task in build.sbt

* Simplify `s3PutIfAbsent` by removing unused `s3Upload` settings and refactoring bucket URL handling.

* Refactor `s3PutIfAbsent` error handling by removing redundant success logging.

* Update `s3PutIfAbsent` URL to use S3-standard format for bucket URLs.

* Bump Spark client version to 0.16.0-demo-17.

* Add success logging for `s3PutIfAbsent` uploads in build.sbt.

* Bump Spark client version to 0.16.0-demo-19.

* Bump Spark client version to 0.16.0-demo-20 and enable public-read ACL for S3 uploads.

* commit

* commit

* Improve `s3PutIfAbsent` error handling: surface AWS CLI errors, refine artifact existence checks, and update bucket target.

* Refine `s3PutIfAbsent`: remove unused variable, adjust comments, and standardize S3 URL formatting in logs.

* Update Spark client README: add publishing requirements and flow details for maintainers

* Refactor Spark client S3 upload: replace `s3PutIfAbsent` with `s3Upload`, update related Makefile and build logic, and remove outdated README publishing details.

* Update `s3Upload` task description for clarity in Spark client build logic

* Make `publishBucket` configurable in Spark client build and improve `s3Upload` error handling.

* Update `publish-scala` description in Makefile and refine S3 upload logic in Spark client build

* Remove unused `sbts3` plugin from Spark client build configuration.

* Make `S3_PUBLISH_BUCKET` configurable in workflow and pass it to `publish-scala` task via Makefile.

* Improve error handling in `s3Upload`: enhance S3 failure message with bucket and key details.

* Make `PUBLISH_BUCKET_FLAG` optional in Makefile and improve S3 upload logging in Spark client

* Refine `s3Upload` in Spark client build: simplify logging and remove redundant error handling.

* Remove conditional `PUBLISH_BUCKET_FLAG` logic from Makefile and improve `publishBucket` handling in Spark client build

Successfully merging this pull request may close these issues.

Publish Spark metadata client: might fail and cause client to be unaccessible