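This warning appears because the deep learning program (service) cannot connect to the local host. The fix is to make sure that ranks start at 0.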

Original error message

[W socket.cpp:663] [c10d] The client socket has failed to connect to [csdn-xiaohu]:12345 (errno: 22 - Invalid argument).

Solution

Ranks should start from 0. Change

opt.rank = kwargs.get("start_rank", 0) + opt.gpu_id

to

opt.rank = kwargs.get("start_rank", 0) + i
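
The effect of this change can be illustrated with a minimal sketch. It assumes a hypothetical setup in which the job runs on physical GPUs 2 and 3; the names assign_ranks and ranks_are_valid are made up for illustration and are not part of the original code or of PyTorch. With rank derived from gpu_id, no process gets rank 0; with rank derived from the loop index i, ranks cover 0..world_size-1, which is the range torch.distributed expects.

```python
def assign_ranks(gpu_ids, start_rank=0):
    """Return (gpu_id, rank) pairs, numbering ranks by loop index."""
    return [(gpu_id, start_rank + i) for i, gpu_id in enumerate(gpu_ids)]


def ranks_are_valid(ranks, world_size):
    """True when the ranks are exactly 0..world_size-1."""
    return sorted(ranks) == list(range(world_size))


# Hypothetical example: the job uses physical GPUs 2 and 3.
gpu_ids = [2, 3]
world_size = len(gpu_ids)

# Old behaviour: rank = start_rank + gpu_id, so no process has rank 0.
buggy = [0 + gpu_id for gpu_id in gpu_ids]

# New behaviour: rank = start_rank + i, where i is the loop index.
fixed = [rank for _, rank in assign_ranks(gpu_ids)]

print("buggy ranks:", buggy, "valid:", ranks_are_valid(buggy, world_size))
print("fixed ranks:", fixed, "valid:", ranks_are_valid(fixed, world_size))
```

The GPU id still selects the device each process runs on; only the rank passed to the process group needs to be contiguous from 0.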

Original notes

A socket error like this can occur in the following cases:
The socket is not valid.
The call is blocked and cannot reach the client that opened the socket.
The client has closed, or is closing, the socket at the time of the call.
