PyTorch training was interrupted by the following error:

    torch.cuda.empty_cache()
  File "/home/qiang/anaconda3/envs/pointsr/lib/python3.7/site-packages/torch/cuda/__init__.py", line 426, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered

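Following the hint printed later in the error output, set the environment variable CUDA_LAUNCH_BLOCKING=1. CUDA kernels are launched asynchronously, so an error can surface at a later, unrelated call (here `torch.cuda.empty_cache()`). With this variable set, launches become synchronous and the traceback points at the call that actually failed: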

CUDA_LAUNCH_BLOCKING=1 python train.py
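The same thing can be done from Python, for example in a small launcher script. This is a minimal sketch: `run_with_launch_blocking` is a hypothetical helper, not part of PyTorch. It copies the current environment, adds the variable, and starts the script in a fresh interpreter, so the variable is already set before the child process imports torch and initializes CUDA. Setting it inside a process that has already initialized CUDA is not expected to help.

```python
import os
import subprocess
import sys


def run_with_launch_blocking(script, *args, **run_kwargs):
    """Run `script` in a new Python process with CUDA_LAUNCH_BLOCKING=1.

    Hypothetical helper: copies the current environment, forces the
    variable to "1", and forwards any extra keyword arguments to
    subprocess.run (e.g. capture_output=True, text=True).
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run([sys.executable, script, *args], env=env, **run_kwargs)


if __name__ == "__main__":
    # Equivalent to: CUDA_LAUNCH_BLOCKING=1 python train.py
    run_with_launch_blocking("train.py")
```

Because the variable is passed through `env`, the parent process's own environment is left unchanged.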

Running it again this way shows that the actual cause is out of memory:

  File "/media/private/dou/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 850, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: out of memory
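Here the error is raised while the model is being moved to the GPU (`t.to(device, ...)` in the traceback), before any training step runs. One possible cause, assumed here and not confirmed by the output above, is that the chosen GPU is already mostly occupied, for example by another process. A minimal sketch for checking free memory per GPU, assuming `nvidia-smi` is on the PATH. The helpers `query_gpu_free_memory`, `parse_gpu_memory`, and `freest_gpu` are hypothetical names.

```python
import subprocess


def query_gpu_free_memory():
    """Return nvidia-smi's per-GPU free memory as CSV text (values in MiB)."""
    return subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_gpu_memory(csv_text):
    """Parse lines like '0, 10850' into a list of (gpu_index, free_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, free = (int(field) for field in line.split(","))
        gpus.append((index, free))
    return gpus


def freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory."""
    return max(parse_gpu_memory(csv_text), key=lambda gpu: gpu[1])[0]
```

One way to use it is to set `os.environ["CUDA_VISIBLE_DEVICES"] = str(freest_gpu(query_gpu_free_memory()))` before torch initializes CUDA. Note that `nvidia-smi` numbers GPUs in PCI bus order while CUDA by default does not necessarily, so setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` as well keeps the two indices consistent.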
