Over the past decade, the application of deep learning techniques in the field of computer vision has increased year by year. Among them, pedestrian detection and vehicle detection are the most popular. One of the reasons is the reusability of the pre-trained model .
Due to the excellent results of deep learning technology in these application scenarios, enterprises have now begun to use deep learning to solve their own problems.
But what do you do if the available pre-trained models are not suitable for your application?
A pre-trained model is able to detect apples, but it certainly cannot distinguish between "good apples" and "rotten apples" because it was never "taught" to do so.
So what do you do if this happens to you?
"Get tons of images of good and bad apples and train a custom detection model!"
A common challenge when creating a good custom detection model is data issues . Deep learning models require vast amounts of data to train their algorithms—as we have seen in models such as MaskRCNN
, , YOLO
and , which are trained on UNet
existing large datasets COCO
and .ImageNet
How can I get data for training a custom detection model?
In this post, we'll explore 5 ways to collect a dataset to train a custom detection model.
1. Publicly available open labeled datasets
If you're lucky, you might get the labeled dataset you want on the internet. Here are a few image datasets that you can choose from in the field of computer vision.
-
ImageNet
ImageNet
It is a computer vision system recognition project and the largest image recognition database in the world. ImageNet
It was established by computer scientists from Stanford in the United States to simulate a human recognition system. Ability to recognize objects from pictures. ImageNet
The dataset has detailed documentation, is maintained by a dedicated team, and is very easy to use. It is widely used in research papers in the field of computer vision, and has almost become the "standard" dataset for the current deep learning image field algorithm performance test. ImageNet
There are currently 14,197,122
a total of images in the database, which are divided into 21,841
categories. Usually, what we call ImageNet
a data set actually refers to a sub- dataset ISLVRC2012
for competitions. train
There are 1,281,167
photos and labels in it 1000
, and they are of the same category. There are two types of data, and there are sub-pictures, each type of data.1300
val
50,000
50
test
100,000
100
-
MS COCO
COCO
The data set is an image data set released by the Microsoft team recognition+segmentation+captioning
. The data set collects a large number of daily scene pictures containing common objects, and provides pixel-level instance annotations to more accurately evaluate the effect of detection and segmentation algorithms. It is committed to Advances in Scene Understanding Research. Relying on this data set, an annual competition is held, which now covers the central tasks of machine vision such as detection, segmentation, key point recognition, and annotation. It is one of the most influential ImageNet Chanllenge
academic competitions since then.
COCO
The detection task contains 80
a total of categories. The data scales released in 2014 are train/val/test
divided into two categories 80k/40k/40k
. The more common division in academia is to use the subset of and as the training set ( ) train
, use the rest as the test set ( ), and submit to the official Submit results ( ). In addition, the official also retains part of the data as the evaluation set of the competition.35k
val
trainval35k
val
minival
evaluation server
test-dev
COCO
test
-
Google Open Image
Open Image
It is a dataset released by the Google team. It contains bounding boxes for object categories 190
on 10,000 images , making it the largest existing dataset annotated with object locations. These boxes were largely drawn by hand by professional annotators to ensure accuracy and consistency. These images are very diverse, often containing complex scenes with multiple objects (on average per image ).600
16M
8.3
-
MNIST Handwritten Dataset
MNIST Handwriting Dataset: This dataset has a total of 70,000
images of handwritten digits and is a subset of a larger dataset provided by NIST. Figures are size normalized and centered within the fixed-size image.
-
DOTA
DOTA is a commonly used data set for remote sensing aerial image detection. It contains 1 aerial image
2806
with a size of about 100000000000000000000000000000000000000000000 . Its marking method is a quadrilateral of arbitrary shape and direction determined by four points. Aerial images are different from traditional datasets and have their own characteristics, such as: greater scale variability; dense small object detection; and uncertainty in detecting targets. The data is divided into validation set, test set, and training set. The training and validation sets are currently released, with image sizes ranging from .4k×4k
15
188282
14
small vehicle
large vehicle
vehicle
1/6
1/3
1/2
800×800
4000×4000
2. Crawl web images
Another option is to do an image search on the web and manually select images to download. This method is not very efficient due to the large amount of data required.
It is worth noting that images on the web may be subject to copyright. Remember to check the copyright of images before using them.
Or you could write a program to crawl the web and download the images you want. Also take care to check the copyright of each image.
3. Shoot
If you can't find images of the objects you want, you can collect them by taking pictures. This can be done manually, by taking each image yourself or by hiring someone else to do it for you.
Another way to collect real-world imagery is to have programmed cameras in your scene to automatically collect images.
4. Data Augmentation
We know that deep learning models require a lot of data. When you only have a small dataset, it may not be enough to train a good model. In this case, you can use data augmentation to generate more training data.
Geometric transformations such as flipping, cropping, rotation, and translation are some commonly used data augmentation techniques. Applying image data augmentation not only expands the dataset by creating variants, but also reduces overfitting.
△The left is the original image of the dog, and the right is the horizontally flipped image
△ Original and randomly cropped images of cats
△ Original and rotated images of cats
△ Original and translated images of tennis balls
5. Data Generation
Sometimes real data may not be available. In this case, synthetic data can be generated to train a custom detection model. Due to its relatively low cost, the use of synthetic data generation has been increasing in machine learning.
Generative Adversarial Networks (GANs GAN
) are one of many techniques for synthetic data generation. GAN
is a generative modeling technique in which artificial instances are created from a dataset in a way that preserves similar characteristics of the original set.
Summarize
Collecting a training dataset is the first step in training your custom detection model. In this post, we examine some of the techniques used to collect image data, including searching open source datasets, crawling the web, shooting manually or using programs, using data augmentation techniques, and generating synthetic datasets.