[App] Expose Run Work Executor #15561
Merged
Commits (82 total; this view shows the changes from 69 of them)
3e187d1 update (tchaton)
092b36a update (tchaton)
fcb2ea2 update (tchaton)
6481338 Merge branch 'master' into add_multi_node_examples (tchaton)
e1271ce update (tchaton)
478d0f0 Merge branch 'add_multi_node_examples' of https://github.com/Lightnin… (tchaton)
0c5a079 update (tchaton)
804c5cb update (tchaton)
baf1cae update (tchaton)
a393f58 update (tchaton)
38f1c72 update (tchaton)
4ddb3ae update (tchaton)
ed93320 update (tchaton)
402b6fd update (tchaton)
dece823 update (tchaton)
4b7e8af update (tchaton)
db336d3 update (tchaton)
2cd0d54 update (tchaton)
7da57cd update (tchaton)
651590e update (tchaton)
17ac6db update (tchaton)
589ff92 update (tchaton)
d221d35 update (tchaton)
53597c7 Merge branch 'master' into add_multi_node_examples (tchaton)
fa6def5 update (tchaton)
0adcdf3 Merge branch 'add_multi_node_examples' of https://github.com/Lightnin… (tchaton)
f2fa720 update (tchaton)
7778c58 update (tchaton)
c005373 update (tchaton)
f8fda2e update (tchaton)
00119f6 update (tchaton)
0f4e5e5 update (tchaton)
9e437df update (tchaton)
dec2def update (tchaton)
2c4b26c update (tchaton)
b596dbe update (tchaton)
45fba58 update (tchaton)
8484601 update (tchaton)
45dde23 update (tchaton)
09b1b5d update (tchaton)
0732886 update (tchaton)
6259800 update (tchaton)
f48004d update (tchaton)
56b4bc9 update (tchaton)
e45ea15 update (tchaton)
089b677 update (tchaton)
5b14153 update (tchaton)
6433868 update (tchaton)
7ed3313 update (tchaton)
3e78c0c update (tchaton)
7c8e82f update (tchaton)
af7eb60 update (tchaton)
b89c47d update (tchaton)
bef788a update (tchaton)
61b23d9 update (tchaton)
381e013 update (tchaton)
44a1cc2 Merge branch 'master' into expose_work_runner (tchaton)
ea01249 update (tchaton)
e39d1da Merge branch 'expose_work_runner' of https://github.com/Lightning-AI/… (tchaton)
e079633 update (tchaton)
3ea7b54 update (tchaton)
7fde3c5 update (tchaton)
a60a480 update (tchaton)
f0a402e update (tchaton)
9921cf1 update (tchaton)
6067c9a update (tchaton)
34d618e update (tchaton)
2af5549 Merge branch 'master' into expose_work_runner (tchaton)
2227f35 Merge branch 'master' into expose_work_runner (tchaton)
a909169 Apply suggestions from code review (Borda)
7975fa8 update (tchaton)
512bd73 update (tchaton)
9ac00b1 Merge branch 'master' into expose_work_runner (tchaton)
86e27c9 update (tchaton)
6c03a31 update (tchaton)
ffea172 update (tchaton)
e5a4880 add note (tchaton)
8c59907 Update examples/app_multi_node/README.md (tchaton)
74dfa0f Merge branch 'master' into expose_work_runner (tchaton)
b157a21 update (tchaton)
e1bce24 Merge branch 'expose_work_runner' of https://github.com/Lightning-AI/… (tchaton)
41b6b81 Merge branch 'master' into expose_work_runner (tchaton)
@@ -0,0 +1,37 @@
import torch

import lightning as L
from lightning.app.components import LiteMultiNode
from lightning.lite import LightningLite


class LitePyTorchDistributed(L.LightningWork):
    @staticmethod
    def run():
        # 1. Create LightningLite.
        lite = LightningLite(strategy="ddp", precision="bf16")

        # 2. Prepare distributed model and optimizer.
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        model, optimizer = lite.setup(model, optimizer)
        criterion = torch.nn.MSELoss()

        # 3. Train the model for 50 steps.
        for step in range(50):
            model.zero_grad()
            x = torch.randn(64, 32).to(lite.device)
            output = model(x)
            loss = criterion(output, torch.ones_like(output))
            print(f"global_rank: {lite.global_rank} step: {step} loss: {loss}")
            lite.backward(loss)
            optimizer.step()


app = L.LightningApp(
    LiteMultiNode(
        LitePyTorchDistributed,
        cloud_compute=L.CloudCompute("gpu-fast-multi"),  # 4 x V100
        num_nodes=2,
    )
)
@@ -0,0 +1,24 @@
import lightning as L
from lightning.app.components import PyTorchLightningMultiNode
from lightning.pytorch.demos.boring_classes import BoringModel


class PyTorchLightningDistributed(L.LightningWork):
    @staticmethod
    def run():
        model = BoringModel()
        trainer = L.Trainer(
            max_epochs=10,
            strategy="ddp",
        )
        trainer.fit(model)


compute = L.CloudCompute("gpu-fast-multi")  # 4 x V100
app = L.LightningApp(
    PyTorchLightningMultiNode(
        PyTorchLightningDistributed,
        num_nodes=2,
        cloud_compute=compute,
    )
)
@@ -0,0 +1,46 @@
import torch
from torch.nn.parallel.distributed import DistributedDataParallel

import lightning as L
from lightning.app.components import PyTorchSpawnMultiNode


class PyTorchDistributed(L.LightningWork):

    # Note: Only static methods are supported for now with `PyTorchSpawnMultiNode`.
    @staticmethod
    def run(
        world_size: int,
        node_rank: int,
        global_rank: int,
        local_rank: int,
    ):
        # 1. Prepare distributed model (move it to its device before wrapping it in DDP).
        model = torch.nn.Linear(32, 2)
        device = torch.device(f"cuda:{local_rank}") if torch.cuda.is_available() else torch.device("cpu")
        device_ids = [local_rank] if torch.cuda.is_available() else None
        model = DistributedDataParallel(model.to(device), device_ids=device_ids)

        # 2. Prepare loss and optimizer
        criterion = torch.nn.MSELoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # 3. Train the model for 50 steps.
        for step in range(50):
            model.zero_grad()
            x = torch.randn(64, 32).to(device)
            output = model(x)
            loss = criterion(output, torch.ones_like(output))
            print(f"global_rank: {global_rank} step: {step} loss: {loss}")
            loss.backward()
            optimizer.step()


compute = L.CloudCompute("gpu-fast-multi")  # 4 x V100
app = L.LightningApp(
    PyTorchSpawnMultiNode(
        PyTorchDistributed,
        num_nodes=2,
        cloud_compute=compute,
    )
)
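The `run` signature in the `PyTorchSpawnMultiNode` example takes `world_size`, `node_rank`, `global_rank` and `local_rank`. As a rough illustration of how those four values usually relate, here is a minimal sketch that follows the common `torch.distributed` launcher convention (`global_rank = node_rank * procs_per_node + local_rank`). Whether `PyTorchSpawnMultiNode` computes them exactly this way is an assumption, and `RankInfo`, `ranks_for_node` and `procs_per_node` are hypothetical names that are not part of this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RankInfo:
    """Hypothetical container mirroring the arguments of `run` in the example."""

    world_size: int
    node_rank: int
    global_rank: int
    local_rank: int


def ranks_for_node(num_nodes: int, procs_per_node: int, node_rank: int) -> List[RankInfo]:
    """List the ranks of every process on one node.

    Assumes the conventional layout: processes are numbered node by node,
    so global_rank = node_rank * procs_per_node + local_rank.
    """
    if num_nodes < 1 or procs_per_node < 1:
        raise ValueError("num_nodes and procs_per_node must be positive")
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"node_rank must be in [0, {num_nodes}), got {node_rank}")

    world_size = num_nodes * procs_per_node
    return [
        RankInfo(
            world_size=world_size,
            node_rank=node_rank,
            global_rank=node_rank * procs_per_node + local_rank,
            local_rank=local_rank,
        )
        for local_rank in range(procs_per_node)
    ]


if __name__ == "__main__":
    # Two nodes with four processes each, matching num_nodes=2 and the
    # "4 x V100" comment in the example (one process per GPU is assumed).
    for node in range(2):
        for info in ranks_for_node(num_nodes=2, procs_per_node=4, node_rank=node):
            print(info)
```

Under this convention `local_rank` selects the GPU on a machine (as in `cuda:{local_rank}`), while `global_rank` identifies the process across the whole job.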
This file was deleted.
This file was deleted.
@@ -1 +1,2 @@
pytorch-lightning>=1.8.0
lightning_lite
@@ -0,0 +1,6 @@
from lightning_app.components.multi_node.base import MultiNode
from lightning_app.components.multi_node.lite import LiteMultiNode
from lightning_app.components.multi_node.pl import PyTorchLightningMultiNode
from lightning_app.components.multi_node.pytorch_spawn import PyTorchSpawnMultiNode

__all__ = ["LiteMultiNode", "MultiNode", "PyTorchSpawnMultiNode", "PyTorchLightningMultiNode"]