PyTorch: selecting a specific GPU raises RuntimeError: CUDA error: invalid device ordinal

PyTorch raised the following error when I tried to run on a specific GPU:

Traceback (most recent call last):
  File "test_bed/process_deepglint.py", line 102, in <module>
    pred_dataset(outputFile)
  File "test_bed/process_deepglint.py", line 36, in pred_dataset
    pred_loader_deepg, model, criterion, attrWeights, useArcface = main()
  File "/home/user1/main_cs_0708.py", line 114, in main
    model = models.__dict__[arch]()
  File "/home/user1/models/arc_face.py", line 35, in arcface
    learner = arc_face.face_learner(conf, inference=True)
  File "/home/user1/arc_face/Learner.py", line 24, in __init__
    self.model = Backbone(conf.net_depth, conf.drop_ratio, conf.net_mode).to(conf.device)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 386, in to
    return self._apply(convert)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 199, in _apply
    param.data = fn(param.data)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 384, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal

Possible causes:

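  1. The GPU selection is set in several places in the code and the settings conflict. This includes, but is not limited to, os.environ, torch.device, torch.cuda.set_device, and args.gpu_id; which ones apply depends on the code in question. Each setting has a different scope, so a setting you add later may have no effect while an earlier one is still the one in force.
  2. os.environ and torch.device are not coordinated. Once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered starting from 0, so the index passed to torch.device must refer to that renumbered list, not to the physical GPU number. See matt-gardner's comment at https://github.com/allenai/allennlp/issues/1090
  3. Official reference for the torch.device API: https://pytorch.org/docs/stable/tensor_attributes.html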
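
The renumbering described in cause 2 can be illustrated with a small pure-Python sketch. It is only a simplified model of how CUDA_VISIBLE_DEVICES is interpreted (it handles comma-separated integer IDs and ignores empty entries; the real CUDA runtime also accepts UUIDs and has its own rules for invalid entries). The helper names visible_devices and physical_id are made up for this illustration and are not part of PyTorch or CUDA.

```python
def visible_devices(env_value):
    """Return the physical GPU IDs in the order the process would see them.

    Simplified model: split on commas and drop empty entries.
    """
    return [part.strip() for part in env_value.split(",") if part.strip()]


def physical_id(env_value, ordinal):
    """Map a logical ordinal (the N in "cuda:N") to a physical GPU ID."""
    devices = visible_devices(env_value)
    if not 0 <= ordinal < len(devices):
        # This is the situation that surfaces as "invalid device ordinal".
        raise ValueError(
            f"invalid device ordinal {ordinal}: "
            f"only {len(devices)} device(s) visible"
        )
    return devices[ordinal]


# With CUDA_VISIBLE_DEVICES='1,', logical device 0 is physical GPU 1.
print(physical_id("1,", 0))
```

Under this model, asking for ordinal 1 with CUDA_VISIBLE_DEVICES='1,' fails, because only one device is visible and it is numbered 0.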

The settings I ended up with in my code are:

os.environ['CUDA_VISIBLE_DEVICES'] = '1,'
conf.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

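This uses the second GPU (physical index 1). Because it is the only GPU visible to the process, it is addressed as cuda:0.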
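
To catch a bad index before the model is moved to the device, the requested ordinal can be checked against the number of visible GPUs. The following is a minimal sketch, not code from my project; select_device is a hypothetical helper, and the device count is passed in as a plain integer so the function can be tried without a GPU.

```python
def select_device(ordinal, device_count):
    """Return a device string for torch.device, validating the ordinal first.

    ordinal:      logical index after CUDA_VISIBLE_DEVICES renumbering
    device_count: number of GPUs visible to the process
    """
    if device_count == 0:
        # No GPU visible: fall back to the CPU.
        return "cpu"
    if not 0 <= ordinal < device_count:
        raise ValueError(
            f"requested cuda:{ordinal}, but valid ordinals are "
            f"0..{device_count - 1}"
        )
    return f"cuda:{ordinal}"


print(select_device(0, 1))  # one visible GPU, so cuda:0 is valid
```

In real code the count would come from torch.cuda.device_count(), for example conf.device = torch.device(select_device(0, torch.cuda.device_count())).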

Reposted from blog.csdn.net/qxqxqzzz/article/details/107720675