Commit 4ca3676

[doc] Add workflow of the AutoPytorch
1 parent 1e06cce commit 4ca3676

File tree

2 files changed (+29, -2 lines)


README.md

Lines changed: 29 additions & 2 deletions
@@ -9,6 +9,33 @@ We plan to enable image data and time-series data.
Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)

## Workflow

The following figure shows a rough overview of the Auto-PyTorch workflow.

<img src="figs/apt_workflow.png" width="500">
In the figure, **Data** is provided by the user, and **Portfolio** is a set of configurations of neural networks that work well on a diverse set of datasets.
The current version supports only the *greedy portfolio*, as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
This portfolio is used to warm-start the SMAC optimization; in other words, we evaluate the portfolio configurations on the provided data as the initial configurations.
The API then runs the following procedure:
1. **Validate input data**: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that the API can handle, with a choice of cross-validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
   a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
   b. Sample a pipeline hyperparameter configuration *2 by SMAC\
   c. Update the observations with the obtained results\
   d. Repeat a. -- c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations, using [ensemble selection](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
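As a rough illustration of step 4a, the following self-contained sketch enumerates Hyperband-style brackets, which trade off the number of sampled configurations against the budget each one receives. The values `max_budget=81` and `eta=3` are the illustrative defaults from the Hyperband paper, not Auto-PyTorch's actual budgets (which are, e.g., epochs or dataset fractions, and are managed through SMAC).

```python
import math

def hyperband_schedule(max_budget=81, eta=3):
    """Enumerate the successive-halving brackets used by Hyperband.

    Returns one list of (n_configs, budget_per_config) rounds per
    bracket: aggressive brackets start many configurations on a small
    budget and repeatedly keep only the top 1/eta of them.
    """
    # Largest s with eta**s <= max_budget.
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # initial configs
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i                 # configs surviving round i
            r_i = max_budget // eta ** (s - i)  # budget per surviving config
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets

# The most aggressive bracket: 81 configs on budget 1, cut down to 1 config on budget 81.
print(hyperband_schedule(81, 3)[0])  # [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
```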
*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either a regression or a classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm and the shape of the neural network, as well as their corresponding hyperparameters.
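The model selection in step 5 is a greedy, with-replacement procedure (Caruana et al., linked above). As a rough illustration only, on made-up regression predictions and not Auto-PyTorch's actual implementation, a minimal sketch:

```python
def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def ensemble_selection(model_preds, y_true, n_rounds=5):
    """Greedy ensemble selection: starting from an empty ensemble,
    repeatedly add (with replacement) the model whose inclusion most
    reduces validation error of the averaged prediction."""
    chosen = []                    # selected model indices, repeats allowed
    current = [0.0] * len(y_true)  # running sum of selected predictions
    for _ in range(n_rounds):
        best_idx, best_err = None, float("inf")
        for idx, preds in enumerate(model_preds):
            k = len(chosen) + 1
            candidate = [(c + p) / k for c, p in zip(current, preds)]
            err = mse(candidate, y_true)
            if err < best_err:
                best_idx, best_err = idx, err
        chosen.append(best_idx)
        current = [c + p for c, p in zip(current, model_preds[best_idx])]
    # Each model's ensemble weight is how often it was selected.
    return [chosen.count(i) / n_rounds for i in range(len(model_preds))]

# Two toy models: one biased low, one biased high; the greedy search mixes them.
print(ensemble_selection([[0.0, 0.0], [2.0, 2.0]], [1.0, 1.0], n_rounds=2))  # [0.5, 0.5]
```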
## Installation
@@ -25,8 +52,8 @@ We recommend using Anaconda for developing as follows:
git submodule update --init --recursive

# Create the environment
-conda create -n autopytorch python=3.8
-conda activate autopytorch
+conda create -n auto-pytorch python=3.8
+conda activate auto-pytorch
conda install swig
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

figs/apt_workflow.png

120 KB