
checkpoint migration #9396

@awaelchli

Description


🚀 Feature

Add upgrade functions to the utilities for internal use. Checkpoints get upgraded automatically (when possible) when a user loads a checkpoint via Trainer(resume_from_checkpoint=...) or Model.load_from_checkpoint.

Motivation

Lightning changes over time with removals and additions, including the contents and structure of checkpoints. When such changes happen, we bake the upgrade logic into the code base at the appropriate place, but the danger is that the information about why and when these changes were made gets lost over time.

Pitch

For each backward-compatibility (BC) change we create an upgrade function that gets applied at the appropriate place.

def upgrade_xyz_v1_2_0(checkpoint):
    # upgrades the checkpoint from a previous version to 1.2.0
    return checkpoint

def upgrade_abc_v1_3_8(checkpoint):
    # upgrades the checkpoint from previous version to 1.3.8
    return checkpoint


def upgrade(checkpoint):
    checkpoint = upgrade_xyz_v1_2_0(checkpoint)
    checkpoint = upgrade_abc_v1_3_8(checkpoint)
    return checkpoint

# in Lightning:
ckpt = upgrade(pl_load(path))
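The chain above can be made idempotent by gating each step on the version recorded in the checkpoint (Lightning checkpoints store the producing version under the "pytorch-lightning_version" key). The registry name _migrations and the migration itself below are hypothetical, a sketch of the dispatch rather than the actual implementation:

```python
# Sketch: version-gated dispatch of upgrade functions.
# The "pytorch-lightning_version" key exists in Lightning checkpoints;
# everything else here (key names, registry) is hypothetical.

def _parse(version):
    # "1.2.0" -> (1, 2, 0); enough for this sketch
    return tuple(int(part) for part in version.split("."))

def upgrade_xyz_v1_2_0(checkpoint):
    # hypothetical migration introduced in 1.2.0: rename a key
    if "old_key" in checkpoint:
        checkpoint["new_key"] = checkpoint.pop("old_key")
    return checkpoint

# ordered oldest-first so migrations compose
_migrations = [("1.2.0", upgrade_xyz_v1_2_0)]

def upgrade(checkpoint):
    written_with = _parse(checkpoint.get("pytorch-lightning_version", "0.0.0"))
    for target, fn in _migrations:
        if written_with < _parse(target):
            checkpoint = fn(checkpoint)
    return checkpoint

# a checkpoint written with 1.1.0 gets the 1.2.0 migration applied
new = upgrade({"pytorch-lightning_version": "1.1.0", "old_key": 1})
```

With this gating, running upgrade twice, or on a checkpoint that is already current, is a no-op.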

Benefits of this approach:

  • each upgrade is documented individually
  • central location for all upgrades; the order in which they are applied is fully transparent
  • can unit test each upgrade individually!
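The last point follows directly from the design: each migration is a pure dict-in, dict-out function, so it can be tested in isolation without a Trainer or a real checkpoint file. The function and key names below are hypothetical:

```python
# Each migration is a plain dict -> dict function, so it can be unit
# tested in isolation. Function and key names here are hypothetical.

def upgrade_xyz_v1_2_0(checkpoint):
    # hypothetical migration introduced in 1.2.0: rename a key
    if "old_key" in checkpoint:
        checkpoint["new_key"] = checkpoint.pop("old_key")
    return checkpoint

def test_renames_old_key():
    assert upgrade_xyz_v1_2_0({"old_key": 42}) == {"new_key": 42}

def test_noop_when_already_upgraded():
    assert upgrade_xyz_v1_2_0({"new_key": 42}) == {"new_key": 42}

test_renames_old_key()
test_noop_when_already_upgraded()
```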

Alternatives

Keep as is.

Additional context

PRs that started this work:

PRs that added checkpoint back-compatibility logic that can be avoided by this proposal:


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @Borda @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj

Metadata

Labels

  • checkpointing: Related to checkpointing
  • feature: Is an improvement or enhancement
  • help wanted: Open to be worked on
  • let's do it!: approved to implement
  • pl: Generic label for PyTorch Lightning package
