安装NV驱动
网址:https://www.nvidia.com/DOWNLOAD/Find.aspx?lang=en-us
-
wget 驱动
-
然后 chmod +x <下载好的.run文件>下载好的.run文件>
-
./<下载好的.run文件>下载好的.run文件>
-
如果出现了”Unable to find the kernel source tree for the currently running kernel.“报错则运行
sudo yum install kernel-devel
-
执行该命令打开persistant mode(每次服务器重启都要运行该命令)
nvidia-persistenced --persistence-mode
可能遇到的问题
close X Server
sudo service lightdm stop
sudo service gdm stop
sudo service kdm stop
How to disable Nouveau kernel driver
sudo vim /etc/modprobe.d/blacklist-nouveau.conf
输入内容:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
sudo reboot
gcc
sudo apt install build-essential
sudo apt-get install manpages-dev
gcc --version
安装docker
Centos安装docker
-
依次执行命令(https://docs.docker.com/engine/install/centos/)
sudo yum install -y yum-utils sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
-
docker移动安装目录到挂载的大硬盘下(必须要做)
systemctl stop docker # 停掉docker服务 mv /var/lib/docker /data/docker # 转移docker安装目录 ln -s /data/docker /var/lib/docker # 软链
ubuntu 安装docker
https://docs.docker.com/engine/install/ubuntu/
- Set up the repository
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Install Docker Engine
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo docker run hello-world
异常处理
root@midu-System-Product-Name:/home/midu/Downloads# apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
正在读取软件包列表... 完成
正在分析软件包的依赖关系树... 完成
正在读取状态信息... 完成
没有可用的软件包 docker-ce,但是它被其它的软件包引用了。
这可能意味着这个缺失的软件包可能已被废弃,
或者只能在其他发布源中找到
E: 软件包 docker-ce 没有可安装候选
E: 无法定位软件包 docker-ce-cli
E: 无法定位软件包 containerd.io
E: 无法按照 glob ‘containerd.io’ 找到任何软件包
E: 无法按照正则表达式 containerd.io 找到任何软件包
E: 无法定位软件包 docker-buildx-plugin
E: 无法定位软件包 docker-compose-plugin
需要更新apt-get:
sudo apt-get update
安装nvidia-docker2
Centos安装nvidia-docker2
-
依次执行下面的命令
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \ && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo sudo yum clean expire-cache sudo yum install -y nvidia-docker2
-
更新/etc/docker/daemon.json⽂件
{ "registry-mirrors": [ "https://registry.docker-cn.com", "https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com" ], "insecure-registries": [ "harbor.miduchina.com", "harbor-nh.miduchina.com" ], "bip":"192.51.0.1/16", "log-driver":"json-file", "log-opts": {"max-size":"500m", "max-file":"2"}, "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }
-
启动docker服务
systemctl start docker
ubuntu安装nvidia-docker2
- 安装
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
sudo apt-get remove nvidia -384 ; sudo apt-get install nvidia-384
-
更新/etc/docker/daemon.json⽂件
{ "registry-mirrors": [ "https://registry.docker-cn.com", "https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com" ], "insecure-registries": [ "harbor.miduchina.com", "harbor-nh.miduchina.com" ], "bip":"192.51.0.1/16", "log-driver":"json-file", "log-opts": {"max-size":"500m", "max-file":"2"}, "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }
-
启动docker服务
systemctl start docker
测试
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Til next time,
gqjia
at 00:00