Commit ae354c8

Merge branch 'development' into feat-epoch-wise-LR-scheduler
2 parents 017595d + 999f3c3 commit ae354c8

135 files changed, +2174 -1387 lines changed

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+name: Tests
+
+on:
+  schedule:
+    # Every Tuesday at 7AM UTC
+    # TODO temporarily set to every day just for the PR
+    #- cron: '0 07 * * 2'
+    - cron: '0 07 * * *'
+
+
+jobs:
+  ubuntu:
+
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.8]
+      fail-fast: false
+
+    steps:
+    - uses: actions/checkout@v2
+      with:
+        ref: development
+    - name: Setup Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install test dependencies
+      run: |
+        git submodule update --init --recursive
+        python -m pip install --upgrade pip
+        pip install -e .[test]
+    - name: Run tests
+      run: |
+        python -m pytest --durations=200 cicd/test_preselected_configs.py -vs
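
Note on the schedule in this workflow: a cron expression has five fields (minute, hour, day of month, month, day of week), so `'0 07 * * 2'` fires on Tuesdays at 07:00 UTC and `'0 07 * * *'` fires every day at 07:00 UTC. The sketch below is a simplified illustration only. The helper name `cron_matches` is hypothetical, it handles just `*` and plain integers (no ranges, lists or steps), and it is not how GitHub Actions evaluates schedules.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` satisfies a simplified 5-field cron expression.

    Only '*' and plain integers are supported (an assumption made to keep
    the sketch short); ranges, lists and steps are not handled.
    """
    fields = expr.split()
    # Cron counts weekdays from Sunday=0, so Tuesday is 2.
    actual = (
        when.minute,
        when.hour,
        when.day,
        when.month,
        when.isoweekday() % 7,
    )
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


tuesday = datetime(2021, 6, 1, 7, 0)    # 1 June 2021 was a Tuesday
wednesday = datetime(2021, 6, 2, 7, 0)  # the following Wednesday

print(cron_matches("0 07 * * 2", tuesday))    # True
print(cron_matches("0 07 * * 2", wednesday))  # False
print(cron_matches("0 07 * * *", wednesday))  # True
```

The commented-out weekly expression and the active daily one differ only in the last field, which is why the TODO can be resolved by swapping the two lines.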

.github/workflows/pytest.yml

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8]
+        python-version: [3.7, 3.8]
         include:
           - python-version: 3.8
             code-cov: true
@@ -52,4 +52,4 @@ jobs:
       uses: codecov/codecov-action@v1
       with:
         fail_ci_if_error: true
-        verbose: true
+        verbose: true

.pre-commit-config.yaml

Lines changed: 10 additions & 6 deletions
@@ -3,21 +3,25 @@ repos:
     rev: v0.761
     hooks:
       - id: mypy
-        args: [--show-error-codes]
-        name: mypy AutoPyTorch
+        args: [--show-error-codes,
+               --warn-redundant-casts,
+               --warn-return-any,
+               --warn-unreachable,
+        ]
         files: autoPyTorch/.*
+        exclude: autoPyTorch/ensemble/
   - repo: https://gitlab.com/pycqa/flake8
     rev: 3.8.3
     hooks:
       - id: flake8
-        name: flake8 AutoPyTorch
-        files: autoPyTorch/.*
         additional_dependencies:
           - flake8-print==3.1.4
           - flake8-import-order
+        name: flake8 autoPyTorch
+        files: autoPyTorch/.*
       - id: flake8
-        name: flake8 tests
-        files: test/.*
         additional_dependencies:
           - flake8-print==3.1.4
           - flake8-import-order
+        name: flake8 test
+        files: test/.*

MANIFEST.in

Lines changed: 8 additions & 7 deletions
@@ -1,10 +1,11 @@
 include requirements.txt
 include autoPyTorch/utils/logging.yaml
 include autoPyTorch/configs/default_pipeline_options.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/catboost.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/rotation_forest.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/random_forest.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/knn.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/svm.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/extra_trees.json
-include autoPyTorch/pipeline/components/setup/traditional_ml/classifier_configs/lgb.json
+include autoPyTorch/configs/greedy_portfolio.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/catboost.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/rotation_forest.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/random_forest.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/knn.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/svm.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/extra_trees.json
+include autoPyTorch/pipeline/components/setup/traditional_ml/estimator_configs/lgb.json

README.md

Lines changed: 1 addition & 5 deletions
@@ -27,11 +27,7 @@ git submodule update --init --recursive
 # Create the environment
 conda create -n autopytorch python=3.8
 conda activate autopytorch
-For Linux:
-conda install gxx_linux-64 gcc_linux-64 swig
-For mac:
-conda install -c conda-forge clang_osx-64 clangxx_osx-64
-conda install -c anaconda swig
+conda install swig
 cat requirements.txt | xargs -n 1 -L 1 pip install
 python setup.py install

autoPyTorch/api/base_task.py

Lines changed: 27 additions & 25 deletions
@@ -12,11 +12,12 @@
 import unittest.mock
 import warnings
 from abc import abstractmethod
-from typing import Any, Callable, Dict, List, Optional, Tuple, Union, cast
+from typing import Any, Callable, Dict, List, Optional, Tuple, Union

 from ConfigSpace.configuration_space import Configuration, ConfigurationSpace

 import dask
+import dask.distributed

 import joblib

@@ -38,13 +39,12 @@
 from autoPyTorch.datasets.base_dataset import BaseDataset
 from autoPyTorch.datasets.resampling_strategy import CrossValTypes, HoldoutValTypes
 from autoPyTorch.ensemble.ensemble_builder import EnsembleBuilderManager
-from autoPyTorch.ensemble.ensemble_selection import EnsembleSelection
 from autoPyTorch.ensemble.singlebest_ensemble import SingleBest
 from autoPyTorch.evaluation.abstract_evaluator import fit_and_suppress_warnings
 from autoPyTorch.evaluation.tae import ExecuteTaFuncWithQueue, get_cost_of_crash
 from autoPyTorch.optimizer.smbo import AutoMLSMBO
 from autoPyTorch.pipeline.base_pipeline import BasePipeline
-from autoPyTorch.pipeline.components.setup.traditional_ml.classifier_models import get_available_classifiers
+from autoPyTorch.pipeline.components.setup.traditional_ml.traditional_learner import get_available_traditional_learners
 from autoPyTorch.pipeline.components.training.metrics.base import autoPyTorchMetric
 from autoPyTorch.pipeline.components.training.metrics.utils import calculate_score, get_metrics
 from autoPyTorch.utils.common import FitRequirement, replace_string_bool_to_bool
@@ -198,7 +198,7 @@ def __init__(
         # examples. Nevertheless, multi-process runs
         # have spawn as requirement to reduce the
         # possibility of a deadlock
-        self._dask_client = None
+        self._dask_client: Optional[dask.distributed.Client] = None
         self._multiprocessing_context = 'forkserver'
         if self.n_jobs == 1:
             self._multiprocessing_context = 'fork'
@@ -590,7 +590,7 @@ def _do_traditional_prediction(self, time_left: int, func_eval_time_limit_secs:
         memory_limit = self._memory_limit
         if memory_limit is not None:
             memory_limit = int(math.ceil(memory_limit))
-        available_classifiers = get_available_classifiers()
+        available_classifiers = get_available_traditional_learners()
         dask_futures = []

         total_number_classifiers = len(available_classifiers)
@@ -711,7 +711,8 @@ def _search(
         precision: int = 32,
         disable_file_output: List = [],
         load_models: bool = True,
-        portfolio_selection: Optional[str] = None
+        portfolio_selection: Optional[str] = None,
+        dask_client: Optional[dask.distributed.Client] = None
     ) -> 'BaseTask':
         """
         Search for the best pipeline configuration for the given dataset.
@@ -838,6 +839,8 @@ def _search(
         self._metric = get_metrics(
             names=[optimize_metric], dataset_properties=dataset_properties)[0]

+        self.pipeline_options['optimize_metric'] = optimize_metric
+
         self.search_space = self.get_search_space(dataset)

         budget_config: Dict[str, Union[float, str]] = {}
@@ -855,10 +858,11 @@ def _search(
         # If no dask client was provided, we create one, so that we can
         # start an ensemble process in parallel to smbo optimize
         if (
-            self._dask_client is None and (self.ensemble_size > 0 or self.n_jobs is not None and self.n_jobs > 1)
+            dask_client is None and (self.ensemble_size > 0 or self.n_jobs > 1)
         ):
             self._create_dask_client()
         else:
+            self._dask_client = dask_client
             self._is_dask_client_internally_created = False

         # Handle time resource allocation
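
Note on the hunk above: the condition now tests the `dask_client` argument instead of `self._dask_client`, so a client passed by the caller is stored and reused rather than replaced. The decision can be sketched as a pure function. The name `needs_internal_client` is hypothetical and `object()` stands in for a real `dask.distributed.Client`; this only mirrors the boolean logic shown in the diff.

```python
from typing import Optional


def needs_internal_client(
    dask_client: Optional[object],
    ensemble_size: int,
    n_jobs: int,
) -> bool:
    """Mirror the condition in the hunk: create a client only when the
    caller supplied none and there is parallel work to schedule."""
    return dask_client is None and (ensemble_size > 0 or n_jobs > 1)


user_client = object()  # stand-in for a dask.distributed.Client (assumption)

print(needs_internal_client(None, ensemble_size=50, n_jobs=1))         # True
print(needs_internal_client(None, ensemble_size=0, n_jobs=1))          # False
print(needs_internal_client(user_client, ensemble_size=50, n_jobs=4))  # False
```

In the second case the `else` branch runs with `dask_client` equal to `None`, so `self._dask_client` stays `None` and no client is marked as internally created.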
@@ -892,21 +896,18 @@ def _search(
         # ============> Run traditional ml

         if enable_traditional_pipeline:
-            if STRING_TO_TASK_TYPES[self.task_type] in REGRESSION_TASKS:
-                self._logger.warning("Traditional Pipeline is not enabled for regression. Skipping...")
-            else:
-                traditional_task_name = 'runTraditional'
-                self._stopwatch.start_task(traditional_task_name)
-                elapsed_time = self._stopwatch.wall_elapsed(self.dataset_name)
-                # We want time for at least 1 Neural network in SMAC
-                time_for_traditional = int(
-                    self._time_for_task - elapsed_time - func_eval_time_limit_secs
-                )
-                self._do_traditional_prediction(
-                    func_eval_time_limit_secs=func_eval_time_limit_secs,
-                    time_left=time_for_traditional,
-                )
-                self._stopwatch.stop_task(traditional_task_name)
+            traditional_task_name = 'runTraditional'
+            self._stopwatch.start_task(traditional_task_name)
+            elapsed_time = self._stopwatch.wall_elapsed(self.dataset_name)
+            # We want time for at least 1 Neural network in SMAC
+            time_for_traditional = int(
+                self._time_for_task - elapsed_time - func_eval_time_limit_secs
+            )
+            self._do_traditional_prediction(
+                func_eval_time_limit_secs=func_eval_time_limit_secs,
+                time_left=time_for_traditional,
+            )
+            self._stopwatch.stop_task(traditional_task_name)

         # ============> Starting ensemble
         elapsed_time = self._stopwatch.wall_elapsed(self.dataset_name)
@@ -1207,7 +1208,6 @@ def predict(

         # Mypy assert
         assert self.ensemble_ is not None, "Load models should error out if no ensemble"
-        self.ensemble_ = cast(Union[SingleBest, EnsembleSelection], self.ensemble_)

         if isinstance(self.resampling_strategy, HoldoutValTypes):
             models = self.models_
@@ -1316,15 +1316,17 @@ def get_models_with_weights(self) -> List:
             self._load_models()

         assert self.ensemble_ is not None
-        return self.ensemble_.get_models_with_weights(self.models_)
+        models_with_weights: List[Tuple[float, BasePipeline]] = self.ensemble_.get_models_with_weights(self.models_)
+        return models_with_weights

     def show_models(self) -> str:
         df = []
         for weight, model in self.get_models_with_weights():
             representation = model.get_pipeline_representation()
             representation.update({'Weight': weight})
             df.append(representation)
-        return pd.DataFrame(df).to_markdown()
+        models_markdown: str = pd.DataFrame(df).to_markdown()
+        return models_markdown

     def _print_debug_info_to_log(self) -> None:
         """

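Note on the last two changes in `base_task.py`: they appear to follow from the `--warn-return-any` flag added to the mypy hook in this commit. With that flag, returning a value that mypy sees as `Any` from a function with a concrete declared return type is reported, and assigning the value to an annotated local first is one common way to satisfy the checker. A minimal sketch of the pattern, in which `untyped_lookup` and `describe` are hypothetical names standing in for an untyped third-party call and its caller:

```python
from typing import Any, Dict


def untyped_lookup(table: Dict[str, Any], key: str) -> Any:
    """Stand-in for a call whose result mypy can only type as Any."""
    return table[key]


def describe(table: Dict[str, Any], key: str) -> str:
    # Writing `return untyped_lookup(table, key)` here would be flagged
    # under --warn-return-any, because an Any value would be returned
    # from a function declared to return str.
    description: str = untyped_lookup(table, key)
    return description


print(describe({"model": "random_forest"}, "model"))  # random_forest
```

The annotation is a statement to the type checker only. It performs no runtime validation, so a non-string value would still pass through unchanged.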
autoPyTorch/api/tabular_regression.py

Lines changed: 7 additions & 3 deletions
@@ -106,7 +106,7 @@ def search(
         budget: Optional[float] = None,
         total_walltime_limit: int = 100,
         func_eval_time_limit_secs: Optional[int] = None,
-        enable_traditional_pipeline: bool = False,
+        enable_traditional_pipeline: bool = True,
         memory_limit: Optional[int] = 4096,
         smac_scenario_args: Optional[Dict[str, Any]] = None,
         get_smac_object_callback: Optional[Callable] = None,
@@ -151,7 +151,7 @@ def search(
                 total_walltime_limit // 2 to allow enough time to fit
                 at least 2 individual machine learning algorithms.
                 Set to np.inf in case no time limit is desired.
-            enable_traditional_pipeline (bool), (default=False):
+            enable_traditional_pipeline (bool), (default=True):
                 Not enabled for regression. This flag is here to comply
                 with the API.
             memory_limit (Optional[int]), (default=4096): Memory
@@ -187,7 +187,11 @@ def search(
                 configurations, similar to (...herepathtogreedy...).
                 Additionally, the keyword 'greedy' is supported,
                 which would use the default portfolio from
-                `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`
+                `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`.
+                Although portfolio selection is supported for tabular
+                regression, the portfolio has been built using
+                classification datasets. We will update a portfolio
+                to cover tabular regression datasets.

         Returns:
             self

autoPyTorch/data/base_target_validator.py

Lines changed: 0 additions & 1 deletion
@@ -95,7 +95,6 @@ def fit(
                         np.shape(y_test)
                     ))
             if isinstance(y_train, pd.DataFrame):
-                y_train = typing.cast(pd.DataFrame, y_train)
                 y_test = typing.cast(pd.DataFrame, y_test)
                 if y_train.columns.tolist() != y_test.columns.tolist():
                     raise ValueError(

autoPyTorch/data/tabular_feature_validator.py

Lines changed: 0 additions & 1 deletion
@@ -145,7 +145,6 @@ def transform(
             X = self.numpy_array_to_pandas(X)

         if hasattr(X, "iloc") and not scipy.sparse.issparse(X):
-            X = typing.cast(pd.DataFrame, X)
             if np.any(pd.isnull(X)):
                 for column in X.columns:
                     if X[column].isna().all():

autoPyTorch/data/tabular_target_validator.py

Lines changed: 3 additions & 2 deletions
@@ -194,8 +194,9 @@ def _check_data(
             A set of features whose dimensionality and data type is going to be checked
         """

-        if not isinstance(
-                y, (np.ndarray, pd.DataFrame, list, pd.Series)) and not scipy.sparse.issparse(y):
+        if not isinstance(y, (np.ndarray, pd.DataFrame,
+                              typing.List, pd.Series)) \
+                and not scipy.sparse.issparse(y):  # type: ignore[misc]
             raise ValueError("AutoPyTorch only supports Numpy arrays, Pandas DataFrames,"
                              " pd.Series, sparse data and Python Lists as targets, yet, "
                              "the provided input is of type {}".format(

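Note on the `isinstance` change in `tabular_target_validator.py`: the check now passes `typing.List` where it previously passed `list`. At runtime the unsubscripted alias is accepted by `isinstance` and behaves like the built-in `list`, whereas a parameterized alias such as `typing.List[int]` raises `TypeError`. A standard-library-only sketch of that behaviour, in which the helper names `is_list_target` and `parameterized_check_fails` are hypothetical:

```python
import typing


def is_list_target(y: object) -> bool:
    """Check `y` against the bare typing.List alias, inside a tuple of
    types as the validator does."""
    return isinstance(y, (typing.List, dict))


def parameterized_check_fails() -> bool:
    """Return True if isinstance rejects a subscripted alias."""
    try:
        isinstance([0, 1], typing.List[int])
    except TypeError:
        return True
    return False


print(is_list_target([0, 1, 1]))    # True
print(is_list_target((0, 1, 1)))    # False: a tuple is not a list
print(parameterized_check_fails())  # True
```

Runtime behaviour is therefore the same as with `list`. The `# type: ignore[misc]` comment in the diff indicates that mypy still reports something on that statement, though the diff does not say what.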