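After a server update, running nvidia-smi failed with the error "Failed to initialize NVML: Driver/library version mismatch".

Reference for the fix:

"Solved: [nvidia-smi] Failed to initialize NVML: Driver/library version mismatch", Tencent Cloud Developer Community (tencent.com)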

Check the version of the installed NVIDIA driver packages:

dpkg -l | grep nvidia

Then check the version of the loaded NVIDIA kernel module:

cat /proc/driver/nvidia/version 

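The installed NVIDIA driver packages are at version 470.256.02, but the kernel module reports version 535.183.01. The driver packages installed on the system and the kernel module in use are therefore out of sync, which is what causes the GPU driver error.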
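The comparison above can be scripted. The following is a minimal sketch, not part of the original fix: the helper names loaded_module_version, package_upstream_version and compare_versions are made up for illustration, and the parsing assumes the usual layout of /proc/driver/nvidia/version (an "NVRM version:" line that contains the version number) and Debian-style package versions such as 535.183.01-0ubuntu0.20.04.1.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any NVIDIA tool).

# Read /proc/driver/nvidia/version-style text on stdin and print the version
# of the loaded kernel module (first dotted number on the NVRM line).
loaded_module_version() {
  awk '/^NVRM version/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)*$/) { print $i; exit }
       }'
}

# Strip the Debian revision from a package version:
# 535.183.01-0ubuntu0.20.04.1 -> 535.183.01
package_upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Print "match" or a mismatch message for a package version ($1)
# and a kernel module version ($2).
compare_versions() {
  if [ "$(package_upstream_version "$1")" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: packages $1, kernel module $2"
  fi
}

# Example on a live system (assumes the package is named nvidia-driver-535):
#   pkg=$(dpkg-query -W -f='${Version}' nvidia-driver-535)
#   mod=$(loaded_module_version < /proc/driver/nvidia/version)
#   compare_versions "$pkg" "$mod"
```

The helpers only parse text, so they can be tried on saved command output before being pointed at a live system.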

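After updating the driver, some packages still failed to install.

The packages below are not fully installed (status iU in dpkg -l: selected for installation, but only unpacked and not yet configured). They are core components of the NVIDIA driver and must be fully installed for it to work: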

  • nvidia-dkms-535
  • nvidia-driver-535
  • nvidia-kernel-common-535
  • xserver-xorg-video-nvidia-535
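A quick way to list such packages is to filter the output of dpkg -l on its status column. This is a minimal sketch, assuming the default dpkg -l layout in which the first column holds the two-letter status; the function name half_installed_packages is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages whose status column is exactly "iU"
# (desired state: install, current state: unpacked).
half_installed_packages() {
  awk '$1 == "iU" { print $2 }'
}

# Example on a live system:
#   dpkg -l | half_installed_packages
# `dpkg --audit` also reports packages that are only partially installed.
```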

Try to repair the packages that are not fully installed:

sudo apt --fix-broken install

This fails with a long list of dependency errors:

You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 nvidia-dkms-535 : Depends: nvidia-firmware-535-535.183.01 but it is not going to be installed
 nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-extra-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: nvidia-compute-utils-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-decode-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-encode-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: nvidia-utils-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-cfg1-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Recommends: libnvidia-compute-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-decode-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-encode-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-fbc1-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-gl-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
 nvidia-kernel-common-535 : Depends: nvidia-firmware-535-535.183.01 but it is not going to be installed
 xserver-xorg-video-nvidia-535 : Depends: libnvidia-cfg1-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
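To see at a glance which dependencies apt refuses to install, the "Depends:" entries can be pulled out of the error text. This is a minimal sketch that only parses text in the format shown above and does not try to resolve anything; the function name unmet_depends and the file name apt-error.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read apt's "unmet dependencies" text on stdin and print
# each package named after a "Depends:" marker once, sorted.
# "Recommends:" entries are ignored on purpose.
unmet_depends() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "Depends:") print $(i + 1) }' | sort -u
}

# Example, assuming the error text was saved to a file named apt-error.txt:
#   unmet_depends < apt-error.txt
```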

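Removing the problematic packages produced the same error. This is a loop: repairing a package requires its dependencies to be satisfied, and so does removing one, so every operation fails. With the dependency state this tangled, the only way out was to reinstall the driver.

Use dpkg to force-remove the NVIDIA driver package, ignoring dependencies: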

sudo dpkg -r --force-depends nvidia-driver-535

Then clean up the system and refresh the package lists:

sudo apt-get autoremove 
sudo apt-get clean 
sudo apt-get update

Then reinstall the driver:

sudo apt-get install nvidia-driver-535

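(These steps are blunt, because --force-depends bypasses dpkg's dependency checks. Proceed carefully to avoid breaking the system!)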
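Given that warning, one cautious option is to wrap the steps above so that they are printed first and only executed on request. This is a minimal sketch under stated assumptions: DRY_RUN, run and reinstall_driver are names made up for illustration, and the commands themselves are exactly the ones listed above.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: 1 (default) prints each command,
# 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN is 1, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Force-remove the driver package, clean up, then reinstall it.
# The chain stops at the first command that fails.
reinstall_driver() {
  pkg="${1:-nvidia-driver-535}"
  run sudo dpkg -r --force-depends "$pkg" &&
  run sudo apt-get autoremove &&
  run sudo apt-get clean &&
  run sudo apt-get update &&
  run sudo apt-get install "$pkg"
}

# Usage:
#   reinstall_driver                # print the plan only
#   DRY_RUN=0 reinstall_driver      # actually run the commands
```

Printing the plan first makes it easier to check the package name before anything is removed.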
