Using containers for GPU workloads requires installing the Nvidia container toolkit and running Docker with additional flags. This tutorial explains how to set up the Nvidia container toolkit, run Docker for GPU workloads, and install Miniconda to manage Python environments. This guide focuses on PyTorch usage with GPU Droplets on DigitalOcean.
To follow this tutorial, you will need:
DigitalOcean’s GPU Droplets are NVIDIA H100s that you can spin up on demand; try one out today.
Create a GPU Droplet: Log into your DigitalOcean account, create a new GPU Droplet with the OS image set to “AI/ML Ready v1.0”, and choose a GPU plan. Once the GPU Droplet is created, log into its console.
Add a New User (Recommended): Instead of using the root user for everything, it’s better to create a new user for security reasons:
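A minimal sketch of that setup, assuming the `do-shark` username this tutorial uses later (any username works):

```bash
# Create the new user and add it to the sudo group
adduser do-shark
usermod -aG sudo do-shark

# Switch to the new user
su - do-shark
```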
Using containers for GPU workloads requires installing the Nvidia container toolkit and running Docker with additional flags.
The Nvidia container toolkit replaced the previous wrapper named `nvidia-docker`. You can install the toolkit and Docker with the following command:
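A sketch of the installation, assuming Ubuntu with Nvidia’s apt repository already configured (as on the AI/ML Ready image); otherwise, add the repository first per Nvidia’s install guide:

```bash
# Install Docker and the Nvidia container toolkit
sudo apt-get update
sudo apt-get install -y docker.io nvidia-container-toolkit
```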
Run the following command to enable the Nvidia container runtime:
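The toolkit’s `nvidia-ctk` helper handles this:

```bash
# Register the Nvidia runtime in Docker's configuration
sudo nvidia-ctk runtime configure --runtime=docker
```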
After enabling the runtime, restart Docker to apply the changes:
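For example, with systemd:

```bash
sudo systemctl restart docker
```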
When running PyTorch in a container, Nvidia recommends using specific Docker flags to ensure sufficient memory allocation.
These flags are responsible for:
- `--gpus all`: Enables GPU access for the container.
- `--ipc=host`: Allows the container to use the host’s IPC namespace.
- `--ulimit memlock=-1`: Removes the limit on locked-in-memory address space.
- `--ulimit stack=67108864`: Sets the maximum stack size to 64 MB.
To confirm that PyTorch is working correctly in a containerized environment, run the following command:
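A sketch of that check, using Nvidia’s NGC PyTorch image (the 24.08-py3 tag is just an example; substitute a current tag from the NGC catalog):

```bash
# --rm removes the container once the check finishes
docker run --rm --gpus all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/pytorch:24.08-py3 \
  python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```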
The above Docker invocation confirms that PyTorch is working correctly in a containerized environment. The final line of output should read “CUDA available: True”.
Use the same base arguments for multi-node configurations as for the single-node setup, but include additional bind mounts to discover the GPU fabric network devices and NCCL topology.
These flags are responsible for:
- `--gpus all`: Enables access to all available GPUs in the container.
- `--ipc=host`: Uses the host’s IPC namespace, allowing better inter-process communication.
- `--ulimit memlock=-1`: Removes the limit on locked-in-memory address space.
- `--ulimit stack=67108864`: Sets the maximum stack size to 64 MB.
- `--network=host`: Uses the host’s network stack inside the container.
- `--volume /dev/infiniband:/dev/infiniband`: Mounts the InfiniBand devices into the container.
- `--volume /sys/class/infiniband/:/sys/class/infiniband/`: Mounts InfiniBand system information.
- `--device /dev/infiniband/:/dev/infiniband/`: Allows the container to access InfiniBand devices.
- `-v /etc/nccl.conf:/etc/nccl.conf`: Mounts the NCCL (NVIDIA Collective Communications Library) configuration file.
- `-v /etc/nccl:/etc/nccl`: Mounts the NCCL directory for additional configurations.
To confirm that PyTorch is functioning in a containerized multi-node environment, execute the following command:
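A sketch of the multi-node check, combining the flags listed above with the same example NGC image:

```bash
docker run --rm --gpus all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --network=host \
  --volume /dev/infiniband:/dev/infiniband \
  --volume /sys/class/infiniband/:/sys/class/infiniband/ \
  --device /dev/infiniband/:/dev/infiniband/ \
  -v /etc/nccl.conf:/etc/nccl.conf \
  -v /etc/nccl:/etc/nccl \
  nvcr.io/nvidia/pytorch:24.08-py3 \
  python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```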
The above Docker invocation confirms that PyTorch is working correctly in a containerized multi-node environment. The final line of output should read “CUDA available: True”.
Miniconda is a lightweight version of Anaconda, providing an efficient way to manage Python environments. To install Miniconda, follow these steps:
Use the following commands to download and install Miniconda.
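A sketch assuming the x86_64 installer and an install prefix of `$HOME/miniconda3`:

```bash
# Download and run the latest Miniconda installer in batch mode (-b)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"

# Initialize conda for bash
"$HOME/miniconda3/bin/conda" init bash
```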
Exit and log back in to apply the changes.
Now log back in as the `do-shark` user.
Verify the `conda` version:
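```bash
conda --version
```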
With Miniconda installed, you can set up a Python environment for PyTorch:
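For example, a dedicated environment (the `torch-env` name and Python version are arbitrary choices):

```bash
# Create and activate an isolated environment for PyTorch
conda create -n torch-env python=3.11 -y
conda activate torch-env
```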
To install PyTorch with CUDA support, use the following command. CUDA, which stands for Compute Unified Device Architecture, is a parallel computing platform and programming model for general computing on graphical processing units (GPUs).
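One common form, assuming CUDA 12.1 wheels; check pytorch.org for the variant matching your CUDA version:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```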
You have successfully set up the Nvidia Container Toolkit and Miniconda on your DigitalOcean GPU Droplet and are now ready to run containerized PyTorch workloads with GPU support. For further information, explore the official documentation for Nvidia’s Deep Learning Containers and PyTorch.