- NVIDIA Container Runtime official website
- GitHub repository: Docker is the container technology most widely used by developers. With the NVIDIA Container Runtime, developers can expose NVIDIA GPUs to applications in a container simply by registering a new runtime when the container is created. NVIDIA Container Runtime for Docker is an open source project hosted on GitHub.
Introduction
NVIDIA Container Runtime is a GPU-aware container runtime, compatible with the Open Containers Initiative (OCI) specification used by Docker, CRI-O, and other popular container technologies. It simplifies the process of building containerized GPU-accelerated applications and deploying them to the desktop, cloud, or data center.
With container technologies that support the NVIDIA Container Runtime, such as Docker, developers can package a GPU-accelerated application together with its dependencies into a single unit that is guaranteed to deliver optimal performance on NVIDIA GPUs, regardless of the deployment environment.
Install
This article follows the official NVIDIA Container Toolkit installation documentation, installing the toolkit on Ubuntu 22.04.
Environment requirements
- NVIDIA Linux driver is installed and version >= 418.81.07
- Kernel version > 3.10 GNU/Linux x86_64
- Docker >= 19.03
- NVIDIA GPU with architecture >= Kepler (or Compute Capability 3.0)
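These requirements can be checked up front. A quick sketch of the verification commands (output formats vary by system):
# NVIDIA driver version (needs to be >= 418.81.07)
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# kernel version (needs to be > 3.10)
uname -r
# Docker version (needs to be >= 19.03)
docker --version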
Start the installation
- Set up package repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
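To confirm the repository was registered, inspect the resulting list file (purely a sanity check):
cat /etc/apt/sources.list.d/nvidia-container-toolkit.list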
- Update the package list, then install nvidia-docker2
sudo apt-get update
The update may report an error:
sudo apt-get update
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not be read.
For the fix, see the official troubleshooting entry "Conflicting values set for option Signed-By error when running apt update".
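In short, the error means another apt source file already references the same repository with a different (or missing) Signed-By key. A sketch of the cleanup, assuming the stale entry lives in a file such as nvidia-docker.list (the exact file name on your system may differ):
# find every apt source file that references the NVIDIA repository
grep -l "nvidia.github.io" /etc/apt/sources.list.d/*
# then remove any file other than nvidia-container-toolkit.list, e.g.:
# sudo rm /etc/apt/sources.list.d/nvidia-docker.list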
sudo apt-get install -y nvidia-docker2
sudo nvidia-ctk runtime configure --runtime=docker
- Restart the Docker daemon and test
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
If the container prints the familiar nvidia-smi table listing your GPUs, the installation was successful.
Usage example
This section follows the official User Guide.
Add NVIDIA Runtime
Because nvidia-docker2 was installed above, which registers the NVIDIA runtime with Docker automatically, there is no need to add the runtime manually.
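For reference, a manual registration amounts to adding an entry like the following to /etc/docker/daemon.json; nvidia-docker2 (or the nvidia-ctk command above) writes this for you:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}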
Set environment variables
Users can control the behavior of the NVIDIA container runtime through environment variables, specifically which GPUs are enumerated and which driver capabilities are enabled.
These environment variables have already been set in the basic CUDA image provided by NVIDIA.
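This can be confirmed by inspecting one of the base CUDA images (a quick check; the tag below is just an example):
docker inspect nvidia/cuda:11.0.3-base-ubuntu20.04 \
    --format '{{.Config.Env}}'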
GPU enumeration
Use the --gpus flag, or the environment variable NVIDIA_VISIBLE_DEVICES, to control which GPUs a container can access.
The possible values of NVIDIA_VISIBLE_DEVICES are as follows:

| Possible values | Description |
|---|---|
| `0,1,2`, `GPU-fef8089b`, ... | a comma-separated list of GPU UUID(s) or index(es). |
| `all` | all GPUs will be accessible; this is the default value in base CUDA container images. |
| `none` | no GPU will be accessible, but driver capabilities will be enabled. |
| `void` or empty or unset | the runtime behaves the same as runc, exposing neither GPUs nor driver capabilities. |
When specifying GPUs with --gpus, the device parameter should also be used, as in the following example:
docker run --gpus '"device=1,2"' \
nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
Enable all GPUs
docker run --rm --gpus all nvidia/cuda nvidia-smi
Use NVIDIA_VISIBLE_DEVICES to enable all GPUs
docker run --rm --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda nvidia-smi
Use NVIDIA_VISIBLE_DEVICES to enable specified GPUs
docker run --rm --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1,2 \
nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
Start a GPU-enabled container on two GPUs
docker run --rm --gpus 2 nvidia/cuda nvidia-smi
Use nvidia-smi to query the GPU UUID and assign it to the container
nvidia-smi -i 3 --query-gpu=uuid --format=csv
uuid
GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24
docker run --gpus device=GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24 \
nvidia/cuda nvidia-smi
Driver capabilities
NVIDIA_DRIVER_CAPABILITIES controls which driver libraries/binaries are mounted into the container.
The possible values of NVIDIA_DRIVER_CAPABILITIES are as follows:

| Possible values | Description |
|---|---|
| `compute,video`, `graphics,utility`, ... | a comma-separated list of driver features the container needs. |
| `all` | enable all available driver capabilities. |
| empty or unset | use the default driver capabilities: `utility,compute`. |

The supported driver capabilities are:

| Driver Capability | Description |
|---|---|
| `compute` | required for CUDA and OpenCL applications. |
| `compat32` | required for running 32-bit applications. |
| `graphics` | required for running OpenGL and Vulkan applications. |
| `utility` | required for using `nvidia-smi` and NVML. |
| `video` | required for using the Video Codec SDK. |
| `display` | required for leveraging X11 display. |
For example, to request only the compute and utility capabilities, there are two equivalent ways of writing it:
docker run --rm --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=2,3 \
-e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
nvidia/cuda nvidia-smi
docker run --rm --gpus 'all,"capabilities=compute,utility"' \
nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
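The effect of NVIDIA_DRIVER_CAPABILITIES can be made visible with a negative test (an illustrative sketch, not from the original guide): without the utility capability, nvidia-smi is not mounted into the container and the command fails.
# Only the compute capability is requested, so the nvidia-smi binary
# (provided by the utility capability) is not mounted and this errors out:
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute \
    nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi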
Constraints
The NVIDIA runtime also gives containers the ability to define constraints on the configurations they support.
NVIDIA_REQUIRE_* variables hold logical expressions that constrain the software versions or GPU architectures required by the container. The supported constraints are listed below.
| Constraint | Description |
|---|---|
| `cuda` | constraint on the CUDA driver version. |
| `driver` | constraint on the driver version. |
| `arch` | constraint on the compute architectures of the selected GPUs. |
| `brand` | constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID). |
For example, to require support for CUDA 11.0 or newer and a driver version of at least 450:
NVIDIA_REQUIRE_CUDA "cuda>=11.0 driver>=450"
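A minimal sketch of passing this constraint at run time (the image tag is an example; if the host driver does not satisfy the expression, the container should fail to start with a requirement error):
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_REQUIRE_CUDA="cuda>=11.0 driver>=450" \
    nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi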
See the official documentation for more information.
Dockerfile
These variables can be set directly in a Dockerfile, for example:
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
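A minimal Dockerfile sketch built on a plain (non-CUDA) base image; the base image and capability choices here are illustrative assumptions:
# Plain Ubuntu image that requests GPU access via the NVIDIA runtime variables.
FROM ubuntu:20.04

# Expose all host GPUs to the container.
ENV NVIDIA_VISIBLE_DEVICES all
# Mount the CUDA/OpenCL libraries plus nvidia-smi and NVML.
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# nvidia-smi is mounted in by the runtime, so this works at run time.
CMD ["nvidia-smi"]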
Docker Compose
Refer to the tutorial in the official Docker documentation.
Compose file format v2.3 syntax:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
However, this syntax does not allow control over specific GPU properties.
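A common workaround, as a sketch (combining the legacy runtime key with the environment variables described above; not from the original tutorial), is to control GPU selection through the NVIDIA_* variables:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
    environment:
      # select GPUs and capabilities via the runtime's environment variables
      - NVIDIA_VISIBLE_DEVICES=0
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility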
More granular control
- capabilities: specified as a list of strings (e.g. `capabilities: [gpu]`). You must set this field in the Compose file; otherwise an error is returned when the service is deployed.
- count: specified as an integer or the value `all`, representing the number of GPU devices that should be reserved (providing the host holds that number of GPUs).
- device_ids: specified as a list of strings representing GPU device IDs from the host. The device IDs can be found in the output of `nvidia-smi` on the host machine.
- driver: specified as a string value (e.g. `driver: 'nvidia'`).
- options: key-value pairs representing driver-specific options.
count and device_ids are mutually exclusive. You can only define one field at a time.
For more information about these properties, see the deploy section of the Compose Specification.
For example, the following uses all GPUs on the host with specific driver capabilities. Note that although NVIDIA_DRIVER_CAPABILITIES itself accepts the value all, writing all for capabilities here raises an error; each capability must be listed explicitly:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [compute,graphics,video,utility,display]
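Similarly, a sketch (assuming GPU indices 0 and 3 exist on the host) that pins the service to specific GPUs by device ID instead of a count:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # device_ids and count are mutually exclusive
              device_ids: ['0', '3']
              capabilities: [gpu]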
For more configuration examples, see the official documentation.