read_parquet function takes up a lot of memory even before it returns the iterable object #3010

### Describe the bug

The documentation of the [read_parquet](https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.read_parquet.html#awswrangler.s3.read_parquet) function suggests that the `chunked` argument makes the function memory-friendly, since it returns an iterable of DataFrames instead of a single DataFrame. However, when tested with a 500 MB Parquet file and `chunked=1`, the function used more than 7 GB of memory before it even returned the iterable object. This suggests that the function does substantial work up front (possibly loading the whole file into memory) before it can hand back a streamable object.

If this is expected behavior, then the function unfortunately cannot be considered memory-friendly, and the documentation should say so explicitly so that users know what to expect. If it is not expected behavior, then it is possibly a bug.

Sharing our code below:

```python
import awswrangler as wr
import pyarrow


def get_table_chunks_from_s3_file(app_configs: dict, sqs_values_dict: dict):
    """Yield the Parquet object referenced by the SQS message as pyarrow tables."""
    bucket = sqs_values_dict["BucketName"]
    key = sqs_values_dict["ObjectKey"]
    boto3_session = app_configs["boto3_session"]

    file_path = f"s3://{bucket}/{key}"

    # The call below uses a lot of memory before it returns the dataframes object.
    dataframes = wr.s3.read_parquet(
        path=file_path, chunked=1, boto3_session=boto3_session
    )

    # Convert each pandas DataFrame chunk to a pyarrow Table as it is consumed.
    for dataframe in dataframes:
        yield pyarrow.table(dataframe)
```

The comment in the code above marks the call that takes up a lot of memory.


### How to Reproduce

Call `wr.s3.read_parquet` with `chunked=1` on a relatively large Parquet file in S3 (ours was about 500 MB) and use a memory profiler to check how much memory the process consumes before the function gives back the iterable object.
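
For anyone who wants to check this without a full profiler, here is a minimal sketch of one way to measure it. It is not what we used for the numbers above; the helper names `peak_rss_mib` and `measure_peak_growth` are made up for illustration, and it assumes Linux, where `ru_maxrss` is reported in KiB.

```python
import resource
from typing import Any, Callable, Tuple


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (assumes Linux, KiB units)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def measure_peak_growth(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn() and return (result, growth of peak RSS in MiB during the call)."""
    before = peak_rss_mib()
    result = fn()
    after = peak_rss_mib()
    return result, after - before


# Hypothetical usage with the call from this report (needs awswrangler and S3 access):
# dataframes, growth = measure_peak_growth(
#     lambda: wr.s3.read_parquet(path=file_path, chunked=1, boto3_session=boto3_session)
# )
# print(f"peak RSS grew by {growth:.0f} MiB before the first chunk was requested")
```

Peak RSS never decreases, so the difference only shows how far the high-water mark moved during the call. Wrapping a single `next(dataframes)` in the same helper would show separately how much the first chunk costs.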


### Expected behavior

The function (as the documentation suggests) should not take up this much memory just to return an iterable of DataFrames.

### Your project

_No response_

### Screenshots

_No response_

### OS

Ubuntu 22.04

### Python version

3.10.12

### AWS SDK for pandas version

3.9.1

### Additional context

Support Case ID: 172918156100319
