YOLOv8, developed by Ultralytics in 2023, is one of the most notable object detection algorithms in the YOLO series and brings significant architectural and performance enhancements over its predecessors, such as YOLOv5. These improvements include a CSPNet-based backbone for better feature extraction, an FPN+PAN neck for improved multi-scale object detection, and a shift to an anchor-free approach. Together, these changes significantly improve the model's accuracy, efficiency, and usability for real-time object detection.
Using a GPU with YOLOv8 can significantly boost performance for object detection tasks, providing faster training and inference. This guide will walk you through setting up YOLOv8 for GPU usage, including configuration, troubleshooting, and optimization tips.
YOLOv8 builds upon its predecessors with advanced neural network design and training techniques to enhance performance in object detection. It unifies object localization and classification in a single, efficient framework, balancing speed and accuracy. The architecture comprises three key components: a backbone that extracts features from the input image, a neck that fuses features across scales, and a head that predicts bounding boxes and class labels.
These innovations make YOLOv8 faster, more accurate, and more versatile for modern object detection tasks. Furthermore, YOLOv8 introduces an anchor-free approach to bounding box prediction, moving away from the anchor-based methods of earlier versions.
YOLOv8 (You Only Look Once, Version 8) is a powerful object detection framework. While it runs on CPUs, utilizing a GPU offers a few key benefits, such as:

- Much faster training and inference
- Support for larger batch sizes
- Real-time detection performance
- Massively parallel processing across thousands of cores
GPUs are the clear choice for achieving faster results and handling more complex tasks with YOLOv8.
While working with YOLOv8, or any object detection model, the choice between CPU and GPU can significantly impact performance for both training and inference. CPUs are great for general-purpose work and handle smaller tasks efficiently, but they struggle when the workload becomes computationally expensive. Tasks like object detection demand speed and parallel computation, and GPUs are built for high-performance parallel processing, which makes them ideal for running deep learning models like YOLO. For instance, training and inference on a GPU can be 10–50 times faster than on a CPU, depending on the hardware and model size.
| Aspect | CPU | GPU |
|---|---|---|
| Inference time (per image) | ~500 ms | ~15 ms |
| Training speed | ~2 epochs/hour | ~30 epochs/hour |
| Batch size capability | Small (2–4 images) | Large (16–32 images) |
| Real-time performance | No | Yes |
| Parallel processing | Limited | Excellent (thousands of cores) |
| Energy efficiency | Lower for large tasks | Higher for parallel workloads |
| Cost efficiency | Suitable for small tasks | Ideal for deep learning tasks |
The difference becomes even more pronounced during training, where GPUs dramatically shorten epoch times compared to CPUs. This speed boost allows GPUs to process larger datasets and perform real-time object detection more efficiently.
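To see the gap on your own hardware, you can time the same inference on both devices. The snippet below is a minimal benchmark sketch, assuming a local test image named `bus.jpg`; it uses the standard Ultralytics and PyTorch APIs.

```python
import time

import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Time a single inference pass on each available device
for device in ["cpu"] + (["cuda"] if torch.cuda.is_available() else []):
    model.predict("bus.jpg", device=device)  # warm-up run (moves weights onto the device)
    start = time.perf_counter()
    model.predict("bus.jpg", device=device)
    print(f"{device}: {(time.perf_counter() - start) * 1000:.1f} ms")
```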
Before configuring YOLOv8 for GPU, ensure you meet the following requirements:

- An NVIDIA GPU with a recent driver installed. You can verify the driver by running `nvidia-smi` after driver installation.

To install NVIDIA drivers, download the driver that matches your GPU from the NVIDIA website, install it, and reboot if prompted. Then confirm the installation:

nvidia-smi

If the command prints a table showing your GPU name, driver version, and supported CUDA version, the driver is working.
To use YOLOv8 with GPU acceleration, we need to select the appropriate PyTorch version, which in turn depends on the installed CUDA version.
After installing the CUDA Toolkit, add its directories to your environment variables (`PATH`, `LD_LIBRARY_PATH`). Verify the installation with `nvcc --version` and check that the toolkit directories (`bin`, `include`, `lib`) exist under the CUDA installation path.

To install PyTorch with GPU support, visit the PyTorch Get Started page and select the appropriate installation command. For example:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
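After installation, it is worth confirming that the wheel you installed was actually built with CUDA support. A quick check (the exact CUDA version printed will depend on the wheel you chose):

```python
import torch

# Prints the PyTorch version and the CUDA version it was built against
# (torch.version.cuda is None for CPU-only builds)
print(torch.__version__)
print(torch.version.cuda)
```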
Install YOLOv8 by following these steps:
pip install ultralytics
from ultralytics import YOLO

# Load a COCO-pretrained YOLOv8n model
model = YOLO("yolov8n.pt")

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs on the GPU
results = model.train(data="coco8.yaml", epochs=100, imgsz=640, device="cuda")

# Run inference with the YOLOv8n model on an image
results = model("path/to/image.jpg")
Alternatively, you can use the Ultralytics CLI to run the model directly from the terminal, without writing any Python.
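A sketch of the CLI equivalents of the Python example above (the paths and argument values mirror that example; adjust them to your own dataset and images):

```bash
# Train YOLOv8n on the COCO8 example dataset for 100 epochs on GPU 0
yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640 device=0

# Run inference on an image
yolo detect predict model=yolov8n.pt source=path/to/image.jpg device=0
```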
Use the following Python code to check whether your GPU is detected and CUDA is enabled:
import torch
# Check if GPU is available
print("CUDA Available:", torch.cuda.is_available())
# Get GPU details
if torch.cuda.is_available():
print("GPU Name:", torch.cuda.get_device_name(0))
Specify the device as `cuda` in your training or inference commands:
yolo task=detect mode=train data=coco.yaml model=yolov8n.pt device=0 epochs=128 plots=True
yolo task=detect mode=val model={HOME}/runs/detect/train/weights/best.pt data={dataset.location}/data.yaml
from ultralytics import YOLO
# Load the YOLOv8 model
model = YOLO('yolov8n.pt')
# Train the model on GPU
model.train(data='coco.yaml', epochs=50, device='cuda')
# Perform inference on GPU
results = model.predict(source='input.jpg', device='cuda')
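Once inference finishes, the returned `results` list holds one result object per image, and its `boxes` attribute exposes the detections. A minimal sketch of reading them (class names come from the model's `names` mapping):

```python
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)                   # predicted class index
        conf = float(box.conf)                  # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # corner coordinates in pixels
        print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```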
DigitalOcean GPU Droplets are designed to handle high-performance AI and machine-learning tasks. NVIDIA H100s power these GPU Droplets to deliver exceptional speed and parallel processing capabilities, making them ideal for training and running YOLOv8 models efficiently. What's more, these Droplets come pre-installed with the latest version of CUDA, ensuring you can start leveraging GPU acceleration without spending time on manual configuration. This streamlined environment lets you focus entirely on optimizing your YOLOv8 models and scaling your projects.
To ensure YOLOv8 actually uses the GPU:

- Check that `torch.cuda.is_available()` returns `True`.
- Pass `device=0` or `device='cuda'` in commands or scripts.
- Verify that your environment variables (`PATH` and `LD_LIBRARY_PATH`) include the CUDA directories.

To speed up training further, you can enable automatic mixed precision (AMP):

model.train(data='coco.yaml', epochs=50, device='cuda', amp=True)
from ultralytics import YOLO
# Load the models
vehicle_model = YOLO('yolov8l.pt')
license_model = YOLO('Registration.pt')
# Process each stream, example for one stream
results = vehicle_model(source='stream1.mp4', batch=4) # Modify as needed for parallel processing
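To actually run several streams in parallel, one common pattern is to give each stream its own thread and its own model instance. A minimal sketch, assuming two hypothetical video files `stream1.mp4` and `stream2.mp4`:

```python
import threading

from ultralytics import YOLO

def run_stream(weights: str, source: str) -> None:
    # Each thread gets its own model instance to avoid sharing state across threads
    model = YOLO(weights)
    # stream=True returns a generator, so frames are processed one at a time
    for result in model.predict(source=source, device='cuda', stream=True):
        print(f"{source}: {len(result.boxes)} detections")

threads = [
    threading.Thread(target=run_stream, args=('yolov8l.pt', 'stream1.mp4')),
    threading.Thread(target=run_stream, args=('yolov8l.pt', 'stream2.mp4')),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```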
Specify `device='cuda'` or `device=0` (if using the first GPU) in your commands or scripts when loading the model. This will enable YOLOv8 to utilize the GPU for faster computation during inference and training. Ensure that your GPU is properly set up and detected.
model = YOLO("yolov8n.pt")
model.to('cuda')
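To confirm where the weights ended up, you can inspect the underlying PyTorch module, which Ultralytics exposes as `model.model`:

```python
# Should print "cuda:0" once the model has been moved to the GPU
print(next(model.model.parameters()).device)
```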
YOLOv8 might not be using the GPU if there are issues with the hardware, drivers, or setup. To start, verify your CUDA installation and its compatibility with PyTorch, and update drivers if necessary. Ensure that your CUDA and cuDNN versions are compatible with your PyTorch installation. Install a matching torchvision build and check which configuration is actually installed and used.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
import torch
print(torch.cuda.get_device_name())
Additionally, PyTorch may not be installed with GPU support (e.g., a CPU-only version), or the `device` parameter in your YOLOv8 commands may not be explicitly set to `cuda`. Running YOLOv8 on a system without a CUDA-compatible GPU, or with insufficient VRAM, can also cause it to fall back to the CPU.

To resolve this, ensure your GPU is CUDA-compatible, verify the installation of all required dependencies, check that `torch.cuda.is_available()` returns `True`, and explicitly set the `device='cuda'` parameter in your YOLOv8 scripts or commands.
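A simple defensive pattern is to select the device at runtime, so the same script works on both GPU and CPU machines:

```python
import torch
from ultralytics import YOLO

# Fall back to CPU automatically when no CUDA device is detected
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = YOLO('yolov8n.pt')
results = model.predict('input.jpg', device=device)
```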
To effectively install and run YOLOv8 on a GPU, Python 3.7 or higher is recommended, and a CUDA-compatible GPU is required for GPU acceleration.
For optimal performance, use Python 3.8 or newer, PyTorch 1.10 or higher, and an NVIDIA GPU compatible with CUDA 11.2+. The GPU should ideally have at least 8GB of VRAM to handle moderate datasets efficiently; more VRAM is beneficial for larger datasets and complex models. Additionally, your system should have at least 8GB of RAM and 50GB of free disk space to store datasets and facilitate model training. Meeting these hardware and software requirements will help you achieve faster training and inference with YOLOv8, especially for computationally intensive tasks.
Please note: AMD GPUs do not support CUDA, so choosing an NVIDIA GPU is essential for YOLOv8 GPU compatibility.
To train YOLOv8 using multiple GPUs, you can use PyTorch's DataParallel or specify multiple devices directly (e.g., `cuda:0,1`). For distributed training, YOLOv8 employs PyTorch's DistributedDataParallel (DDP) by default. Ensure that your system has multiple GPUs available and specify the GPUs you want to use in the training script or command line. For instance, set `--device 0,1,2,3` in the CLI or `device=[0,1,2,3]` in Python to utilize GPUs 0, 1, 2, and 3. YOLOv8 automatically handles parallel training across the specified GPUs without requiring an explicit `data_parallel` argument. While all GPUs are utilized during training, the validation phase typically runs on a single GPU by default, as it is less resource-intensive than training.
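A short sketch of the Python form (the device indices are examples; adjust them to the GPUs your system actually has):

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Ultralytics launches DDP training across the listed GPU indices
model.train(data='coco.yaml', epochs=50, device=[0, 1, 2, 3])
```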
Enable mixed precision and adjust batch sizes to balance memory and speed. Depending on your dataset, training YOLOv8 requires quite a bit of computation power to run efficiently. Use a smaller or quantized model variant (e.g., YOLOv8n or INT8-quantized versions) to reduce memory usage and inference time. In your inference script, explicitly set the `device` parameter to `cuda` for GPU execution. Use techniques like batch inference to process multiple images simultaneously and maximize GPU utilization. If applicable, use TensorRT to optimize the model further for faster GPU inference. Regularly monitor GPU memory and performance to ensure efficient resource usage.
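Ultralytics can export a model to a TensorRT engine directly. A minimal sketch (this requires TensorRT to be installed on the machine; `half=True` requests FP16 precision):

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Export to a TensorRT engine (writes yolov8n.engine next to the weights)
model.export(format='engine', half=True)

# Load the exported engine and run inference as usual
trt_model = YOLO('yolov8n.engine')
results = trt_model.predict('input.jpg', device='cuda')
```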
The code snippet below processes a batch of images in parallel, up to the defined batch size.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# pass the argument 'images', the list of preprocessed images; as a tensor it
# should have the shape (N, 3, H, W)
results = model.predict(images, device='cuda', batch=4)  # specify the batch size as needed
If using the CLI, specify the batch size with the `batch` argument (e.g., `batch=4`). With Python, ensure the `batch` argument is correctly set when calling the prediction method.
To resolve CUDA out-of-memory errors, reduce the validation batch size in your YOLOv8 configuration file, as smaller batches require less GPU memory. Additionally, if you have access to multiple GPUs, consider distributing the validation workload across them using PyTorch's `DistributedDataParallel` or similar functionality, though this requires advanced knowledge of PyTorch. You can also try clearing cached memory using `torch.cuda.empty_cache()` in your script, and ensure that no unnecessary processes are running on your GPU. Upgrading to a GPU with more VRAM or optimizing your model and dataset for memory efficiency are further steps to mitigate such issues.
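As a small illustration of the cache-clearing step, the retry loop below shrinks the validation batch until it fits in VRAM. The retry logic and batch sizes are an assumption for illustration, not part of the Ultralytics API, and `torch.cuda.OutOfMemoryError` requires a recent PyTorch (1.13+):

```python
import torch
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Try progressively smaller batch sizes until validation fits in VRAM
for batch in (32, 16, 8, 4):
    try:
        metrics = model.val(data='coco.yaml', batch=batch, device='cuda')
        break
    except torch.cuda.OutOfMemoryError:
        # Release cached allocations before retrying with a smaller batch
        torch.cuda.empty_cache()
        print(f"OOM at batch={batch}, retrying with a smaller batch")
```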
Configuring YOLOv8 to utilize a GPU is a straightforward process that can significantly enhance performance. By following this detailed guide, you can accelerate training and inference for your object detection tasks. Optimize your setup, troubleshoot common issues, and unlock the full potential of YOLOv8 with GPU acceleration.