
[BUG]: RuntimeError about "RANK" when running the ResNet example's train.py on a single GPU #1074

@songyuc

Description

🐛 Describe the bug

I ran into the following error today when running python train.py:

/home/user/software/python/anaconda/anaconda3/envs/conda-general/bin/python /home/user/***/***/ColossalAI-Examples/image/resnet/train.py
Traceback (most recent call last):
  File "/home/user/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/colossalai/initialize.py", line 210, in launch_from_torch
    rank = int(os.environ['RANK'])
  File "/home/user/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

During handling of the above exception, another exception occurred:

...

RuntimeError: Could not find 'RANK' in the torch environment, visit https://www.colossalai.org/ for more information on launching with torch
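The traceback shows launch_from_torch doing int(os.environ['RANK']) and converting the resulting KeyError into a RuntimeError. The sketch below mimics that pattern so the failure mode is easy to see. It is not ColossalAI's code: the helper name read_dist_env is hypothetical, and only RANK appears in the traceback. The other four names are the standard variables torchrun exports, and whether ColossalAI reads exactly this set is an assumption.

```python
import os
from typing import Mapping


def read_dist_env(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper mirroring the lookup seen in the traceback.

    Reads the variables a torch launcher (e.g. torchrun) normally exports
    and turns a missing key into a RuntimeError, as the traceback does.
    """
    try:
        return {
            "rank": int(env["RANK"]),
            "local_rank": int(env["LOCAL_RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "host": env["MASTER_ADDR"],
            "port": int(env["MASTER_PORT"]),
        }
    except KeyError as e:
        # str(KeyError('RANK')) is "'RANK'", so the message names the missing key.
        raise RuntimeError(f"Could not find {e} in the torch environment") from e
```

Calling read_dist_env({}) (an environment with nothing set, as with a plain "python train.py") raises a RuntimeError naming 'RANK', because RANK is the first key looked up.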

Is this error caused by the RANK environment variable not being set on my Ubuntu machine?
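RANK is not an Ubuntu system variable. It is normally exported per process by the PyTorch launcher, for example "torchrun --standalone --nproc_per_node=1 train.py". A plain "python train.py" therefore leaves it unset. Launching through torchrun is the usual fix. For quick single-process debugging, one possible workaround is to fill in single-process defaults before the launch call. This is a minimal sketch under that assumption: the helper name ensure_single_process_env is hypothetical, and 127.0.0.1:29500 is simply torchrun's customary default address and port.

```python
import os
from typing import MutableMapping

# Single-process defaults (assumption: one process on the local machine).
SINGLE_PROCESS_DEFAULTS = {
    "RANK": "0",
    "LOCAL_RANK": "0",
    "WORLD_SIZE": "1",
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "29500",
}


def ensure_single_process_env(env: MutableMapping[str, str] = os.environ) -> None:
    """Hypothetical helper: set launcher variables only if they are absent,
    so values exported by torchrun are never overwritten."""
    for key, value in SINGLE_PROCESS_DEFAULTS.items():
        env.setdefault(key, value)


if __name__ == "__main__":
    ensure_single_process_env()
    # Hypothetical next step in train.py: call colossalai.launch_from_torch(...) here.
    print({key: os.environ[key] for key in SINGLE_PROCESS_DEFAULTS})
```

Using setdefault rather than plain assignment means the same script still behaves correctly when it is later started through torchrun with more than one process.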

Environment

Python: 3.10

Labels: bug (Something isn't working)