This warning came up while Xiaohu was training with DDP (DistributedDataParallel); the fix is to append .contiguous() to the tensors produced by certain operations.

Environment

Python 3.10 + PyTorch 2.0

The original warning

/home/xiaohu/anaconda3/envs/smor/lib/python3.10/site-packages/torch/autograd/__init__.py:251: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [3, 24, 1, 1], strides() = [24, 1, 24, 24]
bucket_view.sizes() = [3, 24, 1, 1], strides() = [24, 1, 1, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:320.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Solution

Add .contiguous() after rearrange, transpose, repeat, and einsum. For example:

x = rearrange(x, 'b h w (p1 p2 c) -> b (h p1) (w p2) c', p1=self.dim_scale, p2=self.dim_scale, c=C//self.dim_scale).contiguous()

x_hwwh = torch.stack([x.view(B, -1, L), torch.transpose(x, dim0=2, dim1=3).contiguous().view(B, -1, L)], dim=1).view(B, 2, -1, L)

x_dbl = torch.einsum("b k d l, k c d -> b k c l", xs.view(B, K, -1, L), self.x_proj_weight).contiguous()
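Inside a model, the .contiguous() call belongs in forward(), right at the operation that breaks contiguity. A minimal sketch, assuming a hypothetical module where torch.transpose stands in for rearrange/einsum (the class and layer names are illustrative, not from the original code):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Illustrative module: a transpose inside forward breaks contiguity."""
    def __init__(self, dim=24):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, C, L)
        # Without .contiguous(), the gradient flowing back through the
        # transpose can carry non-default strides, which triggers the
        # DDP bucket-view warning during gradient all-reduce.
        x = x.transpose(1, 2).contiguous()   # (B, L, C), forced row-major
        return self.proj(x)

model = Block()
out = model(torch.randn(2, 24, 7))
out.sum().backward()                         # gradients flow normally
```

In a real run the module would be wrapped in torch.nn.parallel.DistributedDataParallel; the forward-pass fix is the same either way.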

References

Grad strides do not match bucket view strides
