
Commit e98402e

(docs): Improve documentation on running at scale
1 parent a91ded1 commit e98402e


3 files changed, +493 -52 lines changed


README.md

Lines changed: 37 additions & 0 deletions
@@ -34,6 +34,7 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http
## Table of contents

- [Quick Start](#quick-start)
- [At Scale](#at-scale)
- [Read The Docs](#read-the-docs)
- [Getting Help](#getting-help)
- [Community Resources](#community-resources)
@@ -96,6 +97,42 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3

```

## At Scale
AWS SDK for pandas can also run your workflows at scale by leveraging [modin](https://modin.readthedocs.io/en/stable/) and [ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers; modin focuses on pandas workloads in particular.

### Installation

```
pip install "awswrangler[modin,ray]==3.0.0b3"
```
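A quick way to confirm that the optional `modin` and `ray` extras ended up in the current environment is a standard-library check. This is a minimal sketch: `distributed_extras_available` is a hypothetical helper, not part of AWS SDK for pandas, and it only verifies that the packages can be found. It does not check that a Ray cluster is running or that a given API is actually distributed.

```python
import importlib.util


def distributed_extras_available(packages=("modin", "ray")):
    """Hypothetical helper: report which optional packages are importable.

    importlib.util.find_spec returns None when a top-level package cannot
    be found, so this checks availability without importing the package.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = distributed_extras_available()
    for name, found in status.items():
        print(f"{name}: {'installed' if found else 'missing'}")
    if not all(status.values()):
        # Suggest the install command from the section above.
        print('Run: pip install "awswrangler[modin,ray]==3.0.0b3"')
```

Using `find_spec` rather than a bare `import` keeps the check cheap and avoids triggering any initialization side effects the packages may have on import.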

With these extras installed, existing scripts can run on larger datasets without code changes. Supported APIs are parallelized across cores on a single machine, or across multiple nodes of a cluster in the cloud.

### Supported APIs

<p align="center">

| Service | API | Implementation |
|-------------------|:------------------------------------------------------------------------------------:|:---------------:|
| `S3` | `read_parquet` | 🅿️ |
| | `read_csv` | |
| | `read_json` | |
| | `read_fwf` | |
| | `to_parquet` | 🅿️ |
| | `to_csv` | |
| | `to_json` | |
| | `select_query` | |
| | `delete` | |
| | `copy` | |
| | `wait` | |
| `Redshift` | `read_sql_query` | |
| | `to_sql` | |
| `Athena` | `read_sql_query` | |
| | `unload` | |
| `LakeFormation` | `read_sql_query` | |
</p>

🅿️: partial support (some input arguments might not be supported)

## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.0.0b3/what.html)
