
Commit 13baad5

nicolai86, tchaton, Sherin Thomas, ziyadsheeba and otaj authored
Add support for custom cloud compute configurations for Flows (#14831)
* use more recent lightning cloud launcher
* allow LightningApp to use custom cloud compute for flows
* feedback from adrian
* adjust other cloud tests
* update
* update
* update commens
* Update src/lightning_app/core/app.py
  Co-authored-by: Sherin Thomas <[email protected]>
* Close profiler when `StopIteration` is raised (#14945)
* Find last checkpoints on restart (#14907)
  Co-authored-by: Carlos Mocholí <[email protected]>
* Remove unused gcsfs dependency (#14962)
* Update hpu mixed precision link (#14974)
  Signed-off-by: Jerome <[email protected]>
* Bump version of fsspec (#14975)
  fsspec verbump
* Fix TPU test CI (#14926)
* Fix TPU test CI
* +x first
* Lite first to uncovert errors faster
* Fixes
* One more
* Simplify XLALauncher wrapping to avoid pickle error
* debug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Debug commit successful. Trying local definitions
* Require tpu for mock test
* ValueError: The number of devices must be either 1 or 8, got 4 instead
* Fix mock test
* Simplify call, rely on defaults
* Skip OSError for now. Maybe upgrading will help
* Simplify launch tests, move some to lite
* Stricter typing
* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.
* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."
  This reverts commit f65107e.
* Alternative boring solution to the reverted commit
* Fix failing test on CUDA machine
* Workarounds
* Try latest mkl
* Revert "Try latest mkl"
  This reverts commit d06813a.
* Wrong exception
* xfail
* Mypy
* Comment change
* Spawn launch refactor
* Accept that we cannot lazy init now
* Fix mypy and launch test failures
* The base dockerfile already includes mkl-2022.1.0 - what if we use it?
* try a different mkl version
* Revert mkl version changes
  Co-authored-by: awaelchli <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Akihiro Nitta <[email protected]>
* Trainer: fix support for non-distributed PyTorch (#14971)
* Trainer: fix non-distributed use
* Update CHANGELOG
* fixes typing errors in rich_progress.py (#14963)
* revert default cloud compute rename
* allow LightningApp to use custom cloud compute for flows
* feedback from adrian
* update
* resolve merge with master conflict
* remove preemptible
* update CHANGELOG
* add basic flow cloud compute documentation
* fix docs build
* add missing symlink
* try to fix sphinx
* another attempt for docs
* fix new test

Signed-off-by: Jerome <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: Ziyad Sheebaelhamd <[email protected]>
Co-authored-by: otaj <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Jerome Anand <[email protected]>
Co-authored-by: awaelchli <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: DP <[email protected]>
1 parent 53d2c06 commit 13baad5


7 files changed with 113 additions and 2 deletions

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+:orphan:
+
+***************************
+Customize my Flow resources
+***************************
+
+In the cloud, you can configure which machine your flows run on by passing
+a :class:`~lightning_app.utilities.packaging.cloud_compute.CloudCompute` to the ``flow_cloud_compute`` argument of :class:`~lightning_app.core.app.LightningApp`:
+
+.. code-block:: python
+
+    import lightning as L
+
+    # Run on a small, shared CPU machine. This is the default for every LightningFlow.
+    app = L.LightningApp(L.Flow(), flow_cloud_compute=L.CloudCompute())
+
+
+Here is the full list of supported machine names:
+
+.. list-table:: Hardware by Accelerator Type
+   :widths: 25 25 25
+   :header-rows: 1
+
+   * - Name
+     - # of CPUs
+     - Memory
+   * - flow-lite
+     - 0.3
+     - 4 GB
+
+The up-to-date prices for these instances can be found `here <https://lightning.ai/pages/pricing>`_.
+
+----
+
+************
+CloudCompute
+************
+
+.. autoclass:: lightning_app.utilities.packaging.cloud_compute.CloudCompute
+    :noindex:

docs/source-app/core_api/lightning_app/index.rst

Lines changed: 8 additions & 0 deletions
@@ -39,6 +39,14 @@ Peek under the hood
    :height: 180
    :tag: Intermediate
 
+.. displayitem::
+   :header: Customize Flow compute resources
+   :description: Learn more about Flow customizations.
+   :col_css: col-md-4
+   :button_link: compute_content.html
+   :height: 180
+   :tag: Intermediate
+
 .. displayitem::
    :header: Dynamically create, execute and stop Work
    :description: Learn more about components creation.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../../source-app/core_api/lightning_app/compute_content.rst

src/lightning_app/CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -13,10 +13,11 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added a `--secret` option to CLI to allow binding secrets to app environment variables when running in the cloud ([#14612](https://github.com/Lightning-AI/lightning/pull/14612))
 - Added support for running the works without cloud compute in the default container ([#14819](https://github.com/Lightning-AI/lightning/pull/14819))
 - Added an HTTPQueue as an optional replacement for the default redis queue ([#14978](https://github.com/Lightning-AI/lightning/pull/14978)
-- Added authentication to HTTP queue ([#15202](https://github.com/Lightning-AI/lightning/pull/15202))
+- Added support for configuring flow cloud compute ([#14831](https://github.com/Lightning-AI/lightning/pull/14831))
 - Added support for adding descriptions to commands either through a docstring or the `DESCRIPTION` attribute ([#15193](https://github.com/Lightning-AI/lightning/pull/15193)
 - Added a try / catch mechanism around request processing to avoid killing the flow ([#15187](https://github.com/Lightning-AI/lightning/pull/15187)
-- Added a Database Component ([#14995](https://github.com/Lightning-AI/lightning/pull/14995)
+- Added an Database Component ([#14995](https://github.com/Lightning-AI/lightning/pull/14995)
+- Added authentication to HTTP queue ([#15202](https://github.com/Lightning-AI/lightning/pull/15202))
 - Added support to pass a `LightningWork` to the `LightningApp` ([#15215](https://github.com/Lightning-AI/lightning/pull/15215)
 - Added support getting CLI help for connected apps even if the app isn't running ([#15196](https://github.com/Lightning-AI/lightning/pull/15196)
 - Added support for adding requirements to commands and installing them when missing when running an app command ([#15198](https://github.com/Lightning-AI/lightning/pull/15198)

src/lightning_app/core/app.py

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,7 @@
 from deepdiff import DeepDiff, Delta
 from lightning_utilities.core.apply_func import apply_to_collection
 
+import lightning_app
 from lightning_app import _console
 from lightning_app.api.request_types import APIRequest, CommandRequest, DeltaRequest
 from lightning_app.core.constants import (
@@ -50,6 +51,7 @@ class LightningApp:
     def __init__(
         self,
         root: Union["LightningFlow", "LightningWork"],
+        flow_cloud_compute: Optional["lightning_app.CloudCompute"] = None,
         debug: bool = False,
         info: frontend.AppInfo = None,
         root_path: str = "",
@@ -67,6 +69,7 @@ def __init__(
         Arguments:
             root: The root ``LightningFlow`` or ``LightningWork`` component, that defines all the app's nested
                 components, running infinitely. It must define a `run()` method that the app can call.
+            flow_cloud_compute: The default Cloud Compute used for flow, Rest API and frontend's.
             debug: Whether to activate the Lightning Logger debug mode.
                 This can be helpful when reporting bugs on Lightning repo.
             info: Provide additional info about the app which will be used to update html title,
@@ -100,6 +103,7 @@ def __init__(
 
         _validate_root_flow(root)
         self._root = root
+        self.flow_cloud_compute = flow_cloud_compute or lightning_app.CloudCompute()
 
         # queues definition.
         self.delta_queue: Optional[BaseQueue] = None
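A minimal, self-contained sketch of the fallback added to `LightningApp.__init__` above. It does not import `lightning_app`: `CloudCompute` and `App` here are simplified, hypothetical stand-ins, and their defaults (`name="default"`, `shm_size=0`) are assumptions chosen to match the values the new cloud test expects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudCompute:
    # Simplified stand-in for lightning_app.CloudCompute (defaults are assumptions).
    name: str = "default"
    shm_size: int = 0


class App:
    # Hypothetical stand-in for LightningApp; only the new argument is modeled.
    def __init__(self, root: object, flow_cloud_compute: Optional[CloudCompute] = None) -> None:
        self._root = root
        # Same pattern as the diff: fall back to a default CloudCompute when none is passed.
        self.flow_cloud_compute = flow_cloud_compute or CloudCompute()


default_app = App(root=object())
custom_app = App(root=object(), flow_cloud_compute=CloudCompute(name="t2.medium"))
print(default_app.flow_cloud_compute)  # CloudCompute(name='default', shm_size=0)
print(custom_app.flow_cloud_compute.name)  # t2.medium
```

Because the stand-in dataclass defines neither `__bool__` nor `__len__`, its instances are always truthy, so `or` only replaces `None`. A compute object that defined its own falsiness would be silently swapped for the default, which is why `is None` checks are sometimes preferred for this pattern.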

src/lightning_app/runners/cloud.py

Lines changed: 7 additions & 0 deletions
@@ -36,6 +36,7 @@
     V1QueueServerType,
     V1SourceType,
     V1UserRequestedComputeConfig,
+    V1UserRequestedFlowComputeConfig,
     V1Work,
 )
 from lightning_cloud.openapi.rest import ApiException
@@ -206,6 +207,11 @@ def dispatch(
             flow_servers=frontend_specs,
             desired_state=V1LightningappInstanceState.RUNNING,
             env=v1_env_vars,
+            user_requested_flow_compute_config=V1UserRequestedFlowComputeConfig(
+                name=self.app.flow_cloud_compute.name,
+                shm_size=self.app.flow_cloud_compute.shm_size,
+                preemptible=False,
+            ),
         )
 
         # if requirements file at the root of the repository is present,
@@ -242,6 +248,7 @@ def dispatch(
             works=[V1Work(name=work_req.name, spec=work_req.spec) for work_req in work_reqs],
             local_source=True,
             dependency_cache_key=app_spec.dependency_cache_key,
+            user_requested_flow_compute_config=app_spec.user_requested_flow_compute_config,
         )
 
         if ENABLE_MULTIPLE_WORKS_IN_DEFAULT_CONTAINER:
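In `dispatch`, the flow compute request copies `name` and `shm_size` from the app's `flow_cloud_compute` and pins `preemptible` to `False`. The sketch below reproduces that mapping without `lightning_cloud`; `FlowComputeConfig` and `to_flow_compute_config` are hypothetical stand-ins for `V1UserRequestedFlowComputeConfig` and the inline construction in the diff, and the `CloudCompute` defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CloudCompute:
    # Simplified stand-in for lightning_app.CloudCompute (defaults are assumptions).
    name: str = "default"
    shm_size: int = 0


@dataclass
class FlowComputeConfig:
    # Hypothetical stand-in for lightning_cloud's V1UserRequestedFlowComputeConfig.
    name: str
    shm_size: int
    preemptible: bool


def to_flow_compute_config(compute: CloudCompute) -> FlowComputeConfig:
    # Copy name and shm_size; the flow machine is always requested as non-preemptible.
    return FlowComputeConfig(name=compute.name, shm_size=compute.shm_size, preemptible=False)


config = to_flow_compute_config(CloudCompute(name="t2.medium"))
print(config)  # FlowComputeConfig(name='t2.medium', shm_size=0, preemptible=False)
```

This is the same triple (`name="t2.medium"`, `shm_size=0`, `preemptible=False`) that `test_run_with_custom_flow_compute_config` asserts on the release body.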

tests/tests_app/runners/test_cloud.py

Lines changed: 50 additions & 0 deletions
@@ -29,6 +29,7 @@
     V1QueueServerType,
     V1SourceType,
     V1UserRequestedComputeConfig,
+    V1UserRequestedFlowComputeConfig,
     V1Work,
 )
 
@@ -37,6 +38,7 @@
 from lightning_app.storage import Drive, Mount
 from lightning_app.utilities.cloud import _get_project
 from lightning_app.utilities.dependency_caching import get_hash
+from lightning_app.utilities.packaging.cloud_compute import CloudCompute
 
 
 class MyWork(LightningWork):
@@ -66,6 +68,47 @@ def run(self):
 class TestAppCreationClient:
     """Testing the calls made using GridRestClient to create the app."""
 
+    @mock.patch("lightning_app.runners.backends.cloud.LightningClient", mock.MagicMock())
+    def test_run_with_custom_flow_compute_config(self, monkeypatch):
+        mock_client = mock.MagicMock()
+        mock_client.projects_service_list_memberships.return_value = V1ListMembershipsResponse(
+            memberships=[V1Membership(name="test-project", project_id="test-project-id")]
+        )
+        mock_client.lightningapp_instance_service_list_lightningapp_instances.return_value = (
+            V1ListLightningappInstancesResponse(lightningapps=[])
+        )
+        cloud_backend = mock.MagicMock()
+        cloud_backend.client = mock_client
+        monkeypatch.setattr(backends, "CloudBackend", mock.MagicMock(return_value=cloud_backend))
+        monkeypatch.setattr(cloud, "LocalSourceCodeDir", mock.MagicMock())
+        app = mock.MagicMock()
+        app.flows = []
+        app.frontend = {}
+        app.flow_cloud_compute = CloudCompute(name="t2.medium")
+        cloud_runtime = cloud.CloudRuntime(app=app, entrypoint_file="entrypoint.py")
+        cloud_runtime._check_uploaded_folder = mock.MagicMock()
+
+        monkeypatch.setattr(Path, "is_file", lambda *args, **kwargs: False)
+        monkeypatch.setattr(cloud, "Path", Path)
+        cloud_runtime.dispatch()
+        body = Body8(
+            app_entrypoint_file=mock.ANY,
+            enable_app_server=True,
+            flow_servers=[],
+            image_spec=None,
+            works=[],
+            local_source=True,
+            dependency_cache_key=mock.ANY,
+            user_requested_flow_compute_config=V1UserRequestedFlowComputeConfig(
+                name="t2.medium",
+                preemptible=False,
+                shm_size=0,
+            ),
+        )
+        cloud_runtime.backend.client.lightningapp_v2_service_create_lightningapp_release.assert_called_once_with(
+            project_id="test-project-id", app_id=mock.ANY, body=body
+        )
+
     @mock.patch("lightning_app.runners.backends.cloud.LightningClient", mock.MagicMock())
     def test_run_on_byoc_cluster(self, monkeypatch):
         mock_client = mock.MagicMock()
@@ -100,6 +143,7 @@ def test_run_on_byoc_cluster(self, monkeypatch):
             works=[],
             local_source=True,
             dependency_cache_key=mock.ANY,
+            user_requested_flow_compute_config=mock.ANY,
         )
         cloud_runtime.backend.client.lightningapp_v2_service_create_lightningapp_release.assert_called_once_with(
             project_id="default-project-id", app_id=mock.ANY, body=body
@@ -142,6 +186,7 @@ def test_requirements_file(self, monkeypatch):
             works=[],
             local_source=True,
             dependency_cache_key=mock.ANY,
+            user_requested_flow_compute_config=mock.ANY,
         )
         cloud_runtime.backend.client.lightningapp_v2_service_create_lightningapp_release.assert_called_once_with(
             project_id="test-project-id", app_id=mock.ANY, body=body
@@ -264,6 +309,7 @@ def test_call_with_work_app(self, lightningapps, monkeypatch, tmpdir):
             enable_app_server=True,
             flow_servers=[],
             dependency_cache_key=get_hash(requirements_file),
+            user_requested_flow_compute_config=mock.ANY,
             image_spec=Gridv1ImageSpec(
                 dependency_file_info=V1DependencyFileInfo(
                     package_manager=V1PackageManager.PIP, path="requirements.txt"
@@ -431,6 +477,7 @@ def test_call_with_work_app_and_attached_drives(self, lightningapps, monkeypatch
             enable_app_server=True,
             flow_servers=[],
             dependency_cache_key=get_hash(requirements_file),
+            user_requested_flow_compute_config=mock.ANY,
             image_spec=Gridv1ImageSpec(
                 dependency_file_info=V1DependencyFileInfo(
                     package_manager=V1PackageManager.PIP, path="requirements.txt"
@@ -590,6 +637,7 @@ def test_call_with_work_app_and_multiple_attached_drives(self, lightningapps, mo
             enable_app_server=True,
             flow_servers=[],
             dependency_cache_key=get_hash(requirements_file),
+            user_requested_flow_compute_config=mock.ANY,
             image_spec=Gridv1ImageSpec(
                 dependency_file_info=V1DependencyFileInfo(
                     package_manager=V1PackageManager.PIP, path="requirements.txt"
@@ -623,6 +671,7 @@ def test_call_with_work_app_and_multiple_attached_drives(self, lightningapps, mo
             enable_app_server=True,
             flow_servers=[],
             dependency_cache_key=get_hash(requirements_file),
+            user_requested_flow_compute_config=mock.ANY,
             image_spec=Gridv1ImageSpec(
                 dependency_file_info=V1DependencyFileInfo(
                     package_manager=V1PackageManager.PIP, path="requirements.txt"
@@ -756,6 +805,7 @@ def test_call_with_work_app_and_attached_mount_and_drive(self, lightningapps, mo
                     package_manager=V1PackageManager.PIP, path="requirements.txt"
                 )
             ),
+            user_requested_flow_compute_config=mock.ANY,
             works=[
                 V1Work(
                     name="test-work",
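The existing tests only gain `user_requested_flow_compute_config=mock.ANY`, while the new test pins the exact value. Both assertion styles can be sketched standalone with `unittest.mock`; `dispatch` and `create_release` are hypothetical names, and the plain-dict body stands in for `Body8`.

```python
from unittest import mock


def dispatch(client, flow_compute_name: str) -> None:
    # Hypothetical dispatcher: sends a single release-creation request.
    client.create_release(
        project_id="test-project-id",
        app_id="generated-app-id",
        body={
            "local_source": True,
            "user_requested_flow_compute_config": {
                "name": flow_compute_name,
                "shm_size": 0,
                "preemptible": False,
            },
        },
    )


client = mock.MagicMock()
dispatch(client, "t2.medium")

# Strict check, like the new test: the flow compute config must match exactly.
client.create_release.assert_called_once_with(
    project_id="test-project-id",
    app_id=mock.ANY,
    body={
        "local_source": True,
        "user_requested_flow_compute_config": {"name": "t2.medium", "shm_size": 0, "preemptible": False},
    },
)

# Loose check, like the updated tests: mock.ANY accepts any value for the new field.
client.create_release.assert_called_once_with(
    project_id="test-project-id",
    app_id=mock.ANY,
    body={"local_source": True, "user_requested_flow_compute_config": mock.ANY},
)
```

`mock.ANY` compares equal to any object, including when nested inside a dict, so older tests keep passing after the new field is added without having to restate its value.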
