Ayoosh Kathuria and Shaoni Mukherjee
Oct. 8, 2024 update: This tutorial contains some deprecated code for sourcing the dataset. Please see our updated tutorial on YOLOv7 for additional instructions on getting the dataset into a Jupyter Notebook for this demo.
YOLO, or You Only Look Once, is one of the most widely used deep learning-based object detection algorithms. In this tutorial, we will go over how to train one of its latest variants, YOLOv5, on a custom dataset. More precisely, we will train the YOLO v5 detector on a road sign dataset. By the end of this post, you will have an object detector that can localize and classify road signs. Before we begin, let me acknowledge that YOLOv5 attracted a lot of controversy when it was released over whether it's right to call it v5. I've addressed this a bit at the end of this article. For now, I'd simply say that I'm referring to the algorithm as YOLOv5 since that is the name of the code repository.
We begin by cloning the YOLO v5 repository and setting up the dependencies required to run YOLO v5. You might need sudo rights to install some of the packages.
Info: Experience the power of AI and machine learning with DigitalOcean GPU Droplets. Leverage NVIDIA H100 GPUs to accelerate your AI/ML workloads, deep learning projects, and high-performance computing tasks with simple, flexible, and cost-effective cloud solutions.
Sign up today to access GPU Droplets and scale your AI projects on demand without breaking the bank.
In a terminal, type:
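```bash
# clone the official Ultralytics YOLOv5 repository
git clone https://github.com/ultralytics/yolov5
```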
I recommend you create a new conda or virtualenv environment to run your YOLO v5 experiments, so as not to disturb the dependencies of any existing project. Once you have activated the new environment, install the dependencies using pip. Make sure that the pip you are using is that of the new environment. You can check this by typing the following in a terminal.
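```bash
# confirm which pip binary is active
which pip
```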
For me, it shows something like this (the exact path will differ on your machine).
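```
/home/user/miniconda3/envs/yolov5/bin/pip
```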
It tells me that the pip I'm using belongs to the new environment called yolov5 that I just created. If you use a pip belonging to a different environment, your packages will be installed into that other environment rather than the one you created. With that sorted, let us go ahead with the installation.
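```bash
# install the dependencies listed by the YOLOv5 repo
cd yolov5
pip install -r requirements.txt
```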
With the dependencies installed, let us now import the required modules to conclude setting up the code.
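Here is a representative set of imports for the rest of this tutorial (most come from the standard library or are pulled in by the requirements file; scikit-learn, used later for splitting the data, may need a separate pip install):

```python
import os
import random
import shutil
import xml.etree.ElementTree as ET  # for parsing the PASCAL VOC XML annotations

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
from sklearn.model_selection import train_test_split

random.seed(0)  # make the random sampling below reproducible
```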
For this tutorial, we will use an object detection dataset of road signs from MakeML. It is a dataset that contains road signs belonging to 4 classes:
- Traffic Light
- Stop
- Speed Limit
- Crosswalk
The dataset is small, containing only 877 images in total. While you may want to train with a larger dataset (like the LISA Dataset) to fully realize YOLO’s capabilities, we use a small dataset in this tutorial to facilitate quick prototyping. Typical training takes less than half an hour, which would allow you to iterate quickly with experiments involving different hyperparameters.
We create a directory called Road_Sign_Dataset to keep our dataset. This directory needs to be in the same folder as the yolov5 repository folder we just cloned.
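```bash
mkdir Road_Sign_Dataset
cd Road_Sign_Dataset
# (download and unzip the dataset into this folder; see the update note
#  at the top of this article for current download instructions)
```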
Delete the files which are not needed.
In this part, we convert annotations into the format expected by YOLO v5. There are a variety of formats when it comes to annotations for object detection datasets. Annotations for the dataset we downloaded follow the PASCAL VOC XML format, which is a very popular format. Since this is a popular format, you can find online conversion tools. Nevertheless, we are going to write the code for it to give you some idea of how to convert less popular formats as well (for which you may not find such tools). The PASCAL VOC format stores its annotations in XML files, where various attributes are described by tags. Let us look at one such annotation file.
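```bash
# inspect one annotation (run from inside the Road_Sign_Dataset directory)
cat annotations/road4.xml
```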
The output looks like the following (the box coordinates shown here are illustrative).
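```xml
<annotation>
    <folder>images</folder>
    <filename>road4.jpg</filename>
    <size>
        <width>267</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <object>
        <name>trafficlight</name>
        <bndbox>
            <xmin>20</xmin>
            <ymin>109</ymin>
            <xmax>81</xmax>
            <ymax>237</ymax>
        </bndbox>
    </object>
    <object>
        ...
    </object>
    <object>
        ...
    </object>
</annotation>
```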
The above annotation file describes a file named road4.jpg with dimensions 267 x 400 x 3. It has 3 object tags, which represent 3 bounding boxes. The class is specified by the name tag, whereas the details of the bounding box are represented by the bndbox tag. A bounding box is described by the coordinates of its top-left (xmin, ymin) corner and its bottom-right (xmax, ymax) corner.
YOLO v5 expects annotations for each image in the form of a .txt file, where each line describes a bounding box. Consider the following image.
The annotation file for the image above looks like the following (the numbers shown are illustrative):
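```
0 0.480 0.631 0.691 0.713
0 0.741 0.522 0.314 0.933
27 0.364 0.796 0.078 0.415
```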
There are 3 objects in total (2 persons and one tie). Each line represents one of these objects. The specifications for each line are as follows:
- One row per object.
- Each row is in class x_center y_center width height format.
- Box coordinates are normalized between 0 and 1 (divide x_center and width by the image width, and y_center and height by the image height).
- Class numbers are zero-indexed (they start from 0).
We now write a function that will take the annotations in VOC format and convert them to a format in which information about the bounding boxes is stored in a dictionary.
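Below is a sketch of such a function, assuming the VOC structure shown above:

```python
def extract_info_from_xml(xml_file):
    """Parse a PASCAL VOC XML file into a dictionary containing the
    image filename, the image size, and a list of bounding boxes."""
    root = ET.parse(xml_file).getroot()

    # initialise the info dict
    info_dict = {}
    info_dict['bboxes'] = []

    # parse the XML tree
    for elem in root:
        # get the file name
        if elem.tag == "filename":
            info_dict['filename'] = elem.text

        # get the image size (width, height, depth)
        elif elem.tag == "size":
            image_size = []
            for subelem in elem:
                image_size.append(int(subelem.text))
            info_dict['image_size'] = tuple(image_size)

        # get the bounding box details for each object
        elif elem.tag == "object":
            bbox = {}
            for subelem in elem:
                if subelem.tag == "name":
                    bbox["class"] = subelem.text
                elif subelem.tag == "bndbox":
                    for subsubelem in subelem:
                        bbox[subsubelem.tag] = int(subsubelem.text)
            info_dict['bboxes'].append(bbox)

    return info_dict
```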
Let us try this function on an annotation file.
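```python
print(extract_info_from_xml('annotations/road4.xml'))
```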
This outputs a dictionary like the following (the values match the illustrative XML shown earlier):
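```python
{'bboxes': [{'class': 'trafficlight', 'xmin': 20, 'ymin': 109, 'xmax': 81, 'ymax': 237},
            {'class': 'trafficlight', ...},
            {'class': 'trafficlight', ...}],
 'filename': 'road4.jpg',
 'image_size': (267, 400, 3)}
```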
We now write a function to convert information contained in info_dict to YOLO v5 style annotations and write them to a txt file. In case your annotations are different than PASCAL VOC ones, you can write a function to convert them to the info_dict format and use the function below to convert them to YOLO v5 style annotations.
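Here is a sketch of the conversion; the class-name-to-ID mapping is an assumption about how this dataset spells its class names, so adjust it to match your XML files:

```python
# mapping from the class names used in the XMLs to integer IDs
class_name_to_id_mapping = {"trafficlight": 0,
                            "stop": 1,
                            "speedlimit": 2,
                            "crosswalk": 3}

def convert_to_yolov5(info_dict):
    """Convert an info_dict to YOLO v5 annotations and save them to disk."""
    print_buffer = []

    for b in info_dict["bboxes"]:
        try:
            class_id = class_name_to_id_mapping[b["class"]]
        except KeyError:
            print("Invalid class. Must be one from", class_name_to_id_mapping.keys())
            continue

        # transform the corner coordinates into centre x/y, width, height
        b_center_x = (b["xmin"] + b["xmax"]) / 2
        b_center_y = (b["ymin"] + b["ymax"]) / 2
        b_width = (b["xmax"] - b["xmin"])
        b_height = (b["ymax"] - b["ymin"])

        # normalise the coordinates by the dimensions of the image
        image_w, image_h, image_c = info_dict["image_size"]
        b_center_x /= image_w
        b_center_y /= image_h
        b_width /= image_w
        b_height /= image_h

        # write the bounding box details to the buffer
        print_buffer.append("{} {:.3f} {:.3f} {:.3f} {:.3f}".format(
            class_id, b_center_x, b_center_y, b_width, b_height))

    # save the annotation next to the XML, with a .txt extension
    save_file_name = os.path.join("annotations", info_dict["filename"].replace("jpg", "txt"))
    print("\n".join(print_buffer), file=open(save_file_name, "w"))
```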
Now we convert all the xml annotations into YOLO style txt ones.
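```python
# gather all the XML annotation files
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x.endswith(".xml")]
annotations.sort()

# convert each annotation and dump a YOLO-style .txt file next to it
for ann in annotations:
    info_dict = extract_info_from_xml(ann)
    convert_to_yolov5(info_dict)

# from here on, work with the converted .txt annotations
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x.endswith(".txt")]
```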
Just for a sanity check, let us test some of these transformed annotations. We randomly load one of the annotations, plot boxes using the transformed annotations, and visually inspect it to see whether our code has worked as intended. Run the next cell multiple times. Every time, a random annotation is sampled.
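A sketch of the check (it assumes each annotation has a matching .jpg under the images folder):

```python
def plot_bounding_box(image, annotation_list):
    """Draw YOLO-format boxes (de-normalised) and class names on an image."""
    annotations = np.array(annotation_list)
    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    # de-normalise centre x/y, width, height
    transformed_annotations = np.copy(annotations)
    transformed_annotations[:, [1, 3]] = annotations[:, [1, 3]] * w
    transformed_annotations[:, [2, 4]] = annotations[:, [2, 4]] * h

    # convert centre/size back to corner coordinates
    transformed_annotations[:, 1] = transformed_annotations[:, 1] - transformed_annotations[:, 3] / 2
    transformed_annotations[:, 2] = transformed_annotations[:, 2] - transformed_annotations[:, 4] / 2
    transformed_annotations[:, 3] = transformed_annotations[:, 1] + transformed_annotations[:, 3]
    transformed_annotations[:, 4] = transformed_annotations[:, 2] + transformed_annotations[:, 4]

    for ann in transformed_annotations:
        obj_cls, x0, y0, x1, y1 = ann
        plotted_image.rectangle(((x0, y0), (x1, y1)))
        plotted_image.text((x0, y0 - 10), class_id_to_name_mapping[int(obj_cls)])

    plt.imshow(np.array(image))
    plt.show()

# invert the earlier mapping so we can go from class ID back to name
class_id_to_name_mapping = dict(zip(class_name_to_id_mapping.values(),
                                    class_name_to_id_mapping.keys()))

# sample a random annotation and parse its lines into floats
annotation_file = random.choice(annotations)
with open(annotation_file, "r") as file:
    annotation_list = [x.split(" ") for x in file.read().split("\n")[:-1]]
    annotation_list = [[float(y) for y in x] for x in annotation_list]

# load the corresponding image and plot the boxes over it
image_file = annotation_file.replace("annotations", "images").replace("txt", "jpg")
assert os.path.exists(image_file)
image = Image.open(image_file)
plot_bounding_box(image, annotation_list)
```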
Great, we are able to recover the correct annotation from the YOLO v5 format. This means we have implemented the conversion function properly.
Next we partition the dataset into train, validation, and test sets containing 80%, 10%, and 10% of the data, respectively. You can change the split values according to your convenience.
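A sketch using scikit-learn's train_test_split (the two-step split first carves off 20%, then halves it into validation and test):

```python
# read image and annotation paths
images = [os.path.join('images', x) for x in os.listdir('images')]
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x.endswith(".txt")]
images.sort()
annotations.sort()

# split into 80% train, 10% validation, 10% test
train_images, val_images, train_annotations, val_annotations = train_test_split(
    images, annotations, test_size=0.2, random_state=1)
val_images, test_images, val_annotations, test_annotations = train_test_split(
    val_images, val_annotations, test_size=0.5, random_state=1)
```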
Create the folders to keep the splits.
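```bash
mkdir -p images/train images/val images/test annotations/train annotations/val annotations/test
```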
Move the files to their respective folders.
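```python
# utility function to move a list of files into a destination folder
def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        try:
            shutil.move(f, destination_folder)
        except Exception:
            print(f)
            assert False

move_files_to_folder(train_images, 'images/train/')
move_files_to_folder(val_images, 'images/val/')
move_files_to_folder(test_images, 'images/test/')
move_files_to_folder(train_annotations, 'annotations/train/')
move_files_to_folder(val_annotations, 'annotations/val/')
move_files_to_folder(test_annotations, 'annotations/test/')
```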
Rename the annotations folder to labels, as this is where YOLO v5 expects the annotations to be located:
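```bash
mv annotations labels
cd ../yolov5  # move back into the yolov5 folder for training
```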
Now, we train the network. We use various flags to set options regarding training:

- img: size of the input image. The original image is resized (keeping its aspect ratio) and letter-boxed to this square size before being fed to the network.

(Figure: an example of a letter-boxed image.)

- batch: the batch size.
- epochs: the number of epochs to train for.
- data: the data config YAML file, containing details about the dataset (paths of images, number of classes, and so on).
- cfg: the model architecture. There are four pre-defined choices: yolov5s.yaml, yolov5m.yaml, yolov5l.yaml, and yolov5x.yaml. The size and complexity of these models increases in the ascending order, and you can choose a model which suits the complexity of your object detection task. In case you want to work with a custom architecture, you will have to define a YAML file in the models folder specifying the network architecture.
- weights: pretrained weights to start training from. If you want to train from scratch, use --weights ' '.
- name: various things about the training run, such as train logs. The training weights are stored in runs/train/name.
- hyp: a YAML file that describes the hyperparameter choices. If unspecified, the file data/hyp.scratch.yaml is used.

Details for the dataset you want to train your model on are defined by the data config YAML file. The following parameters have to be defined in a data config file:

- train, test, and val: locations of the train, test, and validation images.
- nc: the number of classes in the dataset.
- names: the names of the classes in the dataset. The index of a class in this list is used as its identifier in the code.

Create a new file called road_sign_data.yaml and place it in the yolov5/data folder. Then populate it with the following.
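A sketch of the data config; the class order here matches the ID mapping we defined earlier:

```yaml
train: ../Road_Sign_Dataset/images/train/
val: ../Road_Sign_Dataset/images/val/
test: ../Road_Sign_Dataset/images/test/

# number of classes
nc: 4

# class names, in the order of their IDs
names: ["trafficlight", "stop", "speedlimit", "crosswalk"]
```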
YOLO v5 expects to find the training labels for the images in a folder whose name can be derived by replacing images with labels in the path to the dataset images. In the example above, for instance, YOLO v5 will look for the train labels in ../Road_Sign_Dataset/labels/train/.
Or you can simply download the file.
The hyperparameter config file helps us define the hyperparameters for our neural network. We are going to use the default one, data/hyp.scratch.yaml. This is what it looks like (an abridged excerpt; the copy in your clone of the repository is the authoritative version).
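```yaml
# Hyperparameters for COCO training from scratch (excerpt)
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay
warmup_epochs: 3.0  # warmup epochs
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
obj: 1.0  # obj loss gain
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)
translate: 0.1  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 1.0  # image mosaic (probability)
# ... remaining keys omitted
```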
You can edit this file, save a new file, and specify it as an argument to the train script.
YOLO v5 also allows you to define your own custom architecture and anchors if one of the pre-defined networks doesn't fit the bill for you. For this you will have to define a custom weights config file. For this example, we use yolov5s.yaml. This is what it looks like (an abridged excerpt).
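```yaml
# parameters
nc: 80  # number of classes (overridden by nc from the data config at train time)
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# ... backbone and head layer definitions omitted
```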
To use a custom network, create a new file and specify it at run time using the cfg flag.
We define the locations of train, val, and test, the number of classes (nc), and the names of the classes. Since the dataset is small and we don't have many objects per image, we start with the smallest of the pretrained models, yolov5s, to keep things simple and avoid overfitting. We keep a batch size of 32, an image size of 640, and train for 100 epochs. If you have issues fitting the model into memory:
- Use a smaller batch size.
- Use a smaller network (yolov5s is already the smallest variant).
- Use a smaller input image size.
Of course, all of the above might impact the performance. The compromise is a design decision you have to make. You might want to go for a bigger GPU instance as well, depending on the situation.
We use the name yolo_road_det for our training run. The tensorboard training logs can be found at runs/train/yolo_road_det. If you can't access the tensorboard logs, you can set up a wandb account so that the logs are plotted on your wandb account.
Finally, run the training:
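A command along these lines kicks off training, run from inside the yolov5 folder (the flags mirror the choices described above):

```bash
python train.py --img 640 --cfg models/yolov5s.yaml --hyp data/hyp.scratch.yaml \
                --batch 32 --epochs 100 --data road_sign_data.yaml \
                --weights yolov5s.pt --name yolo_road_det
```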
Depending on your hardware, this might take up to 30 minutes to train.
There are many ways to run inference using the detect.py file.
The source flag defines the source of our detector, which can be:
- a single image,
- a folder of images,
- a video, or
- a webcam

…and various other formats. We want to run it over our test images, so we set the source flag to ../Road_Sign_Dataset/images/test/.
- The weights flag defines the path of the model we want to run our detector with.
- The conf flag is the objectness confidence threshold.
- The name flag defines where the detections are stored. We set this flag to yolo_road_det; therefore, the detections are stored in runs/detect/yolo_road_det/.

With all options decided, let us run inference over our test dataset.
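Something along these lines (the confidence threshold here is an illustrative choice):

```bash
python detect.py --source ../Road_Sign_Dataset/images/test/ \
                 --weights runs/train/yolo_road_det/weights/best.pt \
                 --conf 0.25 --name yolo_road_det
```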
best.pt contains the best-performing weights saved during training.
We can now randomly plot one of the detections.
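A minimal sketch using the imports from earlier:

```python
# pick a random image from the detections folder and display it
detections_dir = "runs/detect/yolo_road_det/"
detection_images = [os.path.join(detections_dir, x) for x in os.listdir(detections_dir)]

random_detection_image = Image.open(random.choice(detection_images))
plt.imshow(np.array(random_detection_image))
plt.show()
```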
We can also use other sources for our detector, apart from a folder of images. The command syntax for doing so is described below.
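These examples mirror the usage documented in the YOLOv5 repository:

```bash
python detect.py --source 0                               # webcam
                          file.jpg                        # image
                          file.mp4                        # video
                          path/                           # directory
                          path/*.jpg                      # glob
                          'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
```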
We can use the test.py file to compute the mAP on our test set. To perform the evaluation on our test set, we set the task flag to test. We set the name to yolo_det. Plots of various curves (F1, AP, precision curves, etc.) can be found in the folder runs/test/yolo_det. The script calculates the Average Precision for each class for us, as well as the mean Average Precision.
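A command along these lines runs the evaluation (note that newer versions of the repository have renamed test.py to val.py):

```bash
python test.py --weights runs/train/yolo_road_det/weights/best.pt \
               --data road_sign_data.yaml --task test --name yolo_det
```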
The output lists precision, recall, and mAP for each class, as well as for all classes combined.
That’s pretty much it for this tutorial. In it, we trained YOLO v5 on a custom dataset of road signs. If you want to play around with the hyperparameters or train on a different dataset, you can grab the notebook for this tutorial as a starting point.
1. What is YOLO and why is it popular for object detection?
YOLO (You Only Look Once) is a real-time object detection model known for its speed and accuracy. It processes entire images in a single pass, making it efficient for real-time applications like surveillance, autonomous driving, and robotics.
2. What are the key differences between YOLOv5 and previous versions like YOLOv4?
Unlike YOLO v1 to v4, which were implemented in C, YOLOv5 is implemented in PyTorch and Python. Rather than a new architecture, it focuses on usability: modular configuration files, mixed precision training, fast inference, and better data augmentation (see the discussion at the end of this article).
3. Why choose YOLOv5 over other object detection models?
YOLOv5 is easy to integrate into existing workflows: it is written in PyTorch and Python, trains quickly, and ships with ready-made tooling for training, inference, and evaluation.
4. How do I set up YOLOv5 for training on a custom dataset?
Clone the YOLOv5 repository, install the dependencies (pip install -r requirements.txt), prepare a data.yaml file with dataset paths, and launch training with train.py.
5. What is the YOLOv5 annotation format?
YOLOv5 uses text files: each image has a corresponding .txt file, and each line represents an object:
<class_id> <x_center> <y_center> <width> <height>
All values are normalized between 0 and 1.
6. How can I convert my dataset annotations to the YOLOv5 format?
You can use tools like LabelImg, Roboflow, or custom scripts to convert COCO, Pascal VOC, or other formats to YOLO format.
7. How do I partition my dataset for training YOLOv5?
Split your dataset into:
- a training set (e.g., 80% of the images),
- a validation set (e.g., 10%), and
- a test set (e.g., 10%),
as we did in this tutorial.
8. What training options are available for YOLOv5?
Key flags to train.py include the image size (img), batch size (batch), number of epochs (epochs), data config file (data), model architecture (cfg), pretrained weights (weights), run name (name), and hyperparameter file (hyp), as covered in the training section above.
9. How do I configure the data and hyperparameters for YOLOv5 training?
Edit data.yaml to specify dataset paths and classes, and hyp.scratch-low.yaml for learning rate, momentum, and augmentation settings. You can also pass --batch-size, --epochs, and --img-size to train.py for further control.
10. Can I customize the YOLOv5 network architecture?
Yes, you can modify the model’s backbone, detection layers, and anchors by editing models/yolov5s.yaml or other architecture files.
In conclusion, I would like to share my thoughts on the naming controversy caused by YOLO v5. YOLO's original developer, Joseph Redmon, abandoned the project due to concerns about his research being used for military purposes. Since then, multiple people have improved YOLO, and in April 2020, Alexey Bochkovskiy and others released YOLO v4. Alexey was perhaps the most suitable person to do a sequel to YOLO, since he had been the long-time maintainer of the second most popular YOLO repo, which, unlike the original version, also worked on Windows.
YOLO v4 brought many improvements, which helped it greatly outperform YOLO v3. But then Glenn Jocher, maintainer of the Ultralytics YOLO v3 repo (the most popular Python port of YOLO), released YOLO v5, the naming of which drew reservations from many members of the computer vision community. The controversy surrounding YOLO v5 is that, in a traditional sense, it does not introduce any novel architectures, loss functions, or techniques. Additionally, no research paper has yet been released for YOLO v5.
However, YOLO v5 significantly improves the ease with which users can integrate it into their existing workflows. One of the primary advantages of YOLO v5 is that it is implemented in PyTorch and Python, unlike its predecessors (YOLO v1 to v4), which were coded in C. This transition makes YOLO v5 much more accessible to individuals and companies working within the deep learning field.
Moreover, YOLO v5 introduces a streamlined method for defining experiments using modular configuration files, mixed precision training, fast inference, better data augmentation techniques, and more. From this perspective, it might be appropriate to label it as v5 if we view YOLO v5 as software rather than as a new algorithm. Perhaps this is what Glenn Jocher had in mind when naming it v5.
Nonetheless, many members of the community, including Alexey, have strongly disagreed, arguing that it is misleading to refer to it as YOLO v5 since its performance is still inferior to that of YOLO v4.
For a more in-depth look at this debate, check out the post titled “YOLOv5 Controversy — Is YOLOv5 Real?” It is remarkable how rapidly we are advancing in research and technology. It is noteworthy that the next generation of this popular object detection framework was released so soon after its predecessor.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.