
[BUG] Multi-GPU validate or inference reports CUDA out of memory #2513

@AiteYx


Describe the bug
When running validate.py or inference.py on a non-default GPU, the run fails with "CUDA out of memory". I have worked around the error, but I am not sure whether my fix is the right approach.

To Reproduce
Steps to reproduce the behavior:

  1. Set device=cuda:4 and no-prefetcher=False (i.e. keep the prefetcher enabled).
  2. Make sure cuda:0 has no free memory left.
  3. Run validate.py or inference.py.

Additional context
CAUSE
This line https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/loader.py#L126 calls torch.cuda.Stream(). By default the stream is created on the current CUDA device (GPU 0), but my GPU 0 has no free memory, so the call raises "CUDA out of memory".

SOLUTION
Wrap the stream creation in "with torch.cuda.device(self.device):" so the stream is created on the device that is actually being used.
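
A minimal sketch of the proposed change, assuming the loader keeps its target device in self.device (here shown as a plain variable); this is not the exact timm code:

```python
import torch

device = torch.device('cuda:4')  # the device the loader is meant to use, e.g. self.device

if device.type == 'cuda':
    # Enter the device context so the stream is created on cuda:4
    # instead of the current default device (cuda:0).
    with torch.cuda.device(device):
        stream = torch.cuda.Stream()
else:
    stream = None
```

An alternative with the same effect would be passing the device directly, i.e. torch.cuda.Stream(device=device), since torch.cuda.Stream accepts a device argument.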

OTHER
The same problem exists here: https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/loader.py#L151

I don't know if there is a problem with my solution. Thank you for your answer.
