We plan to enable image data and time-series data.
Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)
## Workflow
A rough description of the Auto-PyTorch workflow is shown in the following figure.
<img src="figs/apt_workflow.png" width="500">
In the figure, **Data** is provided by the user, and **Portfolio** is a set of neural network configurations collected from diverse datasets.
The current version only supports the *greedy portfolio*, as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
This portfolio is used to warm-start the optimization by SMAC; in other words, the portfolio configurations are evaluated on the provided data as the initial configurations.
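As a rough sketch of this warm-starting idea (the configurations and the toy `evaluate` function below are hypothetical, not Auto-PyTorch's actual portfolio):

```python
# Hypothetical sketch of warm-starting an optimizer with a portfolio:
# the portfolio configurations are evaluated first, before any
# model-based sampling, so the search starts from known-good points.

def evaluate(config):
    # Stand-in for training a pipeline and returning a validation loss.
    return (config["lr"] - 0.01) ** 2 + 0.1 * config["num_layers"]

portfolio = [  # a few fixed configurations found on diverse datasets
    {"lr": 0.1, "num_layers": 2},
    {"lr": 0.01, "num_layers": 4},
    {"lr": 0.001, "num_layers": 8},
]

# These observations seed the optimizer before it samples anything itself.
observations = [(cfg, evaluate(cfg)) for cfg in portfolio]
best_config, best_loss = min(observations, key=lambda t: t[1])
```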
Then the API starts the following procedures:
1. **Validate input data**: Process each data type, e.g. by encoding categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled by this API, with a choice of cross-validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
   a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
   b. Sample a pipeline hyperparameter configuration *2 by SMAC\
   c. Update the observations with the obtained results\
   d. Repeat a.--c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations and [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
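The loop in step 4 can be sketched as follows. This is only an illustration of the a.--d. structure: SMAC's model-based sampler is replaced by plain random search, and Hyperband's bracket schedule by a fixed geometric budget sequence, with a toy `train_and_eval` standing in for pipeline training.

```python
import random

random.seed(0)

def train_and_eval(config, budget):
    # Stand-in for training a pipeline for `budget` epochs and
    # returning its validation loss (lower is better).
    return abs(config["lr"] - 0.01) / budget

budgets = [1, 3, 9]                      # a. budget / cut-off schedule
observations = []
while len(observations) < 12:            # d. repeat until budget runs out
    config = {"lr": random.uniform(1e-4, 1e-1)}  # b. sample a configuration
    for budget in budgets:
        loss = train_and_eval(config, budget)
        observations.append((config, budget, loss))  # c. update observations

best = min(observations, key=lambda obs: obs[2])
```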
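The ensemble construction in step 5 can be sketched as greedy selection with replacement in the style of Caruana et al.: repeatedly add the model whose inclusion most improves the averaged validation predictions. The models below are hypothetical and represented only by their prediction vectors.

```python
# Minimal sketch of greedy ensemble selection on a toy validation set.
y_true = [1.0, 0.0, 1.0, 1.0]
model_preds = {
    "m1": [0.9, 0.4, 0.6, 0.8],
    "m2": [0.6, 0.1, 0.9, 0.4],
    "m3": [0.2, 0.8, 0.3, 0.9],
}

def loss(pred):
    # Mean squared error against the validation targets.
    return sum((p - t) ** 2 for p, t in zip(pred, y_true)) / len(y_true)

ensemble = []  # chosen model names, with replacement
for _ in range(5):
    def trial_loss(name):
        # Loss of the ensemble average if `name` were added.
        k = len(ensemble) + 1
        avg = [
            (sum(model_preds[m][i] for m in ensemble) + model_preds[name][i]) / k
            for i in range(len(y_true))
        ]
        return loss(avg)
    ensemble.append(min(model_preds, key=trial_loss))
```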
*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either a regression or a classification task on the provided dataset.
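The dummy model mentioned above comes from `sklearn.dummy`; for classification it can, for instance, always predict the majority class, giving a performance floor that any real baseline should beat. A minimal illustration on toy data:

```python
from sklearn.dummy import DummyClassifier

# Toy data: three samples of class 0, one of class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1]

# Always predicts the majority class (here: 0).
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
floor_accuracy = dummy.score(X, y)  # 3 of 4 samples correct
```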
*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, and their corresponding hyperparameters.
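Concretely, such a configuration can be thought of as a flat mapping with one component choice per pipeline step plus the hyperparameters of each chosen component. The keys below are illustrative only, not Auto-PyTorch's actual configuration space names:

```python
# Hypothetical pipeline hyperparameter configuration: per-step component
# choices ("__choice__") plus the chosen components' hyperparameters.
configuration = {
    "encoder:__choice__": "OneHotEncoder",
    "network_backbone:__choice__": "MLPBackbone",
    "network_backbone:MLPBackbone:num_layers": 3,
    "network_backbone:MLPBackbone:num_units": 128,
    "optimizer:__choice__": "AdamOptimizer",
    "optimizer:AdamOptimizer:lr": 1e-3,
}
```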
## Installation
We recommend using Anaconda for development as follows: