Distributed s3 delete objects #1474
Merged
23 commits
af3ab20 malachi-constant: first draft of delete_objects (distributed)
da502a3 malachi-constant: removing concurrent function, potentially not needed..
ca197fb malachi-constant: flake8
717e321 malachi-constant: Merge branch 'release-3.0.0' of github.com:awslabs/aws-data-wrangler …
885a15c malachi-constant: Fixing fixed iterable arg
8ef6858 malachi-constant: restoring test script
fd9ad26 malachi-constant: Fixing typing
d71b671 malachi-constant: remove retry logic, redundant with botocore retry
87a92e2 malachi-constant: Module name
babf835 malachi-constant: Refactoring _delete_objects
2e39c0c malachi-constant: ray get added
40da5fb malachi-constant: updating load tests with configuration and s3 delete test
4fc3dd7 malachi-constant: reverting isort bad update
0ce008b malachi-constant: reverting isort bad update
abb22e6 malachi-constant: changing chunk size
13308c5 malachi-constant: typing
9fd3f4c malachi-constant: pylint and test count
9b85b44 malachi-constant: adding region to conftest
0affa4f malachi-constant: changing chunk size
9548826 malachi-constant: updating load test
b1995de malachi-constant: flake8
ccdaf0e malachi-constant: adding ExecutionTime context manager for benchmarking load tests
71a6534 malachi-constant: updating benchmark for s3 delete
@@ -0,0 +1,55 @@
import random
from datetime import datetime
from timeit import default_timer as timer
from typing import Iterator

import boto3
import botocore.exceptions

import awswrangler as wr
from awswrangler._utils import try_it

CFN_VALID_STATUS = ["CREATE_COMPLETE", "ROLLBACK_COMPLETE", "UPDATE_COMPLETE", "UPDATE_ROLLBACK_COMPLETE"]


class ExecutionTimer:
    def __init__(self, msg="elapsed time"):
        self.msg = msg

    def __enter__(self):
        self.before = timer()
        return self

    def __exit__(self, type, value, traceback):
        self.elapsed_time = round((timer() - self.before), 3)
        print(f"{self.msg}: {self.elapsed_time:.3f} sec")
        return None


def extract_cloudformation_outputs():
    outputs = {}
    client = boto3.client("cloudformation")
    response = try_it(client.describe_stacks, botocore.exceptions.ClientError, max_num_tries=5)
    for stack in response.get("Stacks"):
        if (
            stack["StackName"]
            in ["aws-data-wrangler-base", "aws-data-wrangler-databases", "aws-data-wrangler-opensearch"]
        ) and (stack["StackStatus"] in CFN_VALID_STATUS):
            for output in stack.get("Outputs"):
                outputs[output.get("OutputKey")] = output.get("OutputValue")
    return outputs


def get_time_str_with_random_suffix() -> str:
    time_str = datetime.utcnow().strftime("%Y%m%d%H%M%S%f")
    return f"{time_str}_{random.randrange(16**6):06x}"


def path_generator(bucket: str) -> Iterator[str]:
    s3_path = f"s3://{bucket}/{get_time_str_with_random_suffix()}/"
    print(f"S3 Path: {s3_path}")
    objs = wr.s3.list_objects(s3_path)
    wr.s3.delete_objects(path=objs)
    yield s3_path
    objs = wr.s3.list_objects(s3_path)
    wr.s3.delete_objects(path=objs)
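For readers who want to try the timing helper without AWS access, here is a minimal standalone sketch. It repeats the `ExecutionTimer` class from the diff (with `exc_type` in place of `type`, purely to avoid shadowing the builtin) and wraps it in a hypothetical `run_benchmark` helper that is not part of this PR. The sleep stands in for whatever call is being measured.

```python
import time
from timeit import default_timer as timer


class ExecutionTimer:
    """Same shape as the helper in the diff, repeated so this sketch runs standalone."""

    def __init__(self, msg="elapsed time"):
        self.msg = msg

    def __enter__(self):
        self.before = timer()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Record elapsed wall-clock time, rounded to milliseconds.
        self.elapsed_time = round(timer() - self.before, 3)
        print(f"{self.msg}: {self.elapsed_time:.3f} sec")
        return None  # a falsy return value does not suppress exceptions


def run_benchmark(work, budget_sec):
    """Hypothetical helper: time `work()` and report whether it met the budget."""
    with ExecutionTimer("elapsed time of work()") as t:
        work()
    # elapsed_time is set in __exit__, so it is readable after the with block.
    return t.elapsed_time, t.elapsed_time < budget_sec


# time.sleep stands in for the operation under test.
elapsed, within_budget = run_benchmark(lambda: time.sleep(0.05), budget_sec=1.0)
```

Because `elapsed_time` is only assigned in `__exit__`, any assertion on it has to sit outside the `with` block, which is how the load tests in this PR use it.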
@@ -0,0 +1,33 @@
import pytest  # type: ignore

from ._utils import extract_cloudformation_outputs, path_generator


@pytest.fixture(scope="session")
def cloudformation_outputs():
    return extract_cloudformation_outputs()


@pytest.fixture(scope="session")
def region(cloudformation_outputs):
    return cloudformation_outputs["Region"]


@pytest.fixture(scope="session")
def bucket(cloudformation_outputs):
    return cloudformation_outputs["BucketName"]


@pytest.fixture(scope="function")
def path(bucket):
    yield from path_generator(bucket)


@pytest.fixture(scope="function")
def path2(bucket):
    yield from path_generator(bucket)


@pytest.fixture(scope="function")
def path3(bucket):
    yield from path_generator(bucket)
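The `path` fixtures delegate to a generator that cleans the prefix, yields it, and cleans it again. The sketch below illustrates that setup/teardown flow without pytest or S3. The in-memory `store` list and the names `prefix_generator` and `fixture_like` are hypothetical stand-ins, and driving the generator by hand with `next()` only approximates what pytest does around a test.

```python
from typing import Iterator, List


def prefix_generator(store: List[str], prefix: str) -> Iterator[str]:
    """Hypothetical stand-in for path_generator: clean, yield, clean again."""
    # Setup: remove anything already under the prefix.
    store[:] = [k for k in store if not k.startswith(prefix)]
    yield prefix
    # Teardown: remove whatever the test body created under the prefix.
    store[:] = [k for k in store if not k.startswith(prefix)]


def fixture_like(store: List[str], prefix: str) -> Iterator[str]:
    # Mirrors `yield from path_generator(bucket)` in the fixtures.
    yield from prefix_generator(store, prefix)


store = ["run-1/stale.json", "other/keep.json"]
gen = fixture_like(store, "run-1/")

prefix = next(gen)                 # runs the setup half, up to the yield
after_setup = list(store)

store.append(f"{prefix}new.json")  # the "test body" writes an object

next(gen, None)                    # resuming the generator runs the teardown half
after_teardown = list(store)
```

Using `yield from` lets several fixtures (`path`, `path2`, `path3`) share one generator while each still gets its own prefix and its own teardown.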
@@ -0,0 +1,39 @@
import pandas as pd
import pytest

import awswrangler as wr

from ._utils import ExecutionTimer


@pytest.mark.repeat(1)
@pytest.mark.parametrize("benchmark_time", [150])
def test_s3_select(benchmark_time):

    path = "s3://ursa-labs-taxi-data/2018/1*.parquet"
    with ExecutionTimer("elapsed time of wr.s3.select_query()") as timer:
        wr.s3.select_query(
            sql="SELECT * FROM s3object",
            path=path,
            input_serialization="Parquet",
            input_serialization_params={},
            scan_range_chunk_size=16 * 1024 * 1024,
        )

    assert timer.elapsed_time < benchmark_time


@pytest.mark.parametrize("benchmark_time", [5])
def test_s3_delete_objects(path, path2, benchmark_time):
    df = pd.DataFrame({"id": [1, 2, 3]})
    objects_per_bucket = 505
    paths1 = [f"{path}delete-test{i}.json" for i in range(objects_per_bucket)]
    paths2 = [f"{path2}delete-test{i}.json" for i in range(objects_per_bucket)]
    paths = paths1 + paths2
    for object_path in paths:
        wr.s3.to_json(df, object_path)
    with ExecutionTimer("elapsed time of wr.s3.delete_objects()") as timer:
        wr.s3.delete_objects(path=paths)
    assert timer.elapsed_time < benchmark_time
    assert len(wr.s3.list_objects(f"{path}delete-test*")) == 0
    assert len(wr.s3.list_objects(f"{path2}delete-test*")) == 0
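The delete test writes 505 objects under each of two prefixes, 1,010 in total, which is more than the 1,000 keys a single S3 `DeleteObjects` request accepts. The following is a minimal sketch of how a path list could be grouped by bucket and split into request-sized batches. It is not the implementation in this PR: `split_path`, `delete_batches`, and the bucket name `my-bucket` are hypothetical, the chunk size the PR settled on is not shown on this page, and the sketch assumes every path starts with `s3://`.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects accepts at most 1,000 keys per call


def split_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Assumes the s3:// prefix is present."""
    bucket, _, key = path[len("s3://"):].partition("/")
    return bucket, key


def delete_batches(
    paths: List[str], size: int = MAX_KEYS_PER_REQUEST
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (bucket, keys) batches, each small enough for one delete request."""
    by_bucket: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        bucket, key = split_path(p)
        by_bucket[bucket].append(key)
    for bucket, keys in by_bucket.items():
        for start in range(0, len(keys), size):
            yield bucket, keys[start:start + size]


# Same shape as the test data: 505 objects under each of two prefixes in one bucket.
paths = [f"s3://my-bucket/run-a/delete-test{i}.json" for i in range(505)]
paths += [f"s3://my-bucket/run-b/delete-test{i}.json" for i in range(505)]

batches = list(delete_batches(paths))
batch_sizes = [len(keys) for _, keys in batches]
```

Each yielded batch is independent of the others, so batches like these could be handed to separate workers; how the PR actually distributes the work is not visible in the files shown here.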