Andrew Ng's computer vision course

Preview: Last August, Andrew Ng published an article on Medium saying that he was working on three AI-related projects. A few months later, two of them have been released in stages: the online AI course platform deeplearning.ai and Landing.AI, which brings AI to manufacturing. Just this afternoon, Andrew Ng said on Twitter that the third project will officially be unveiled tomorrow. As for what impact it will have, let's wait and see!

Editor's note: Andrew Ng's machine learning and computer vision courses have long been among the most popular classes on Coursera. Although some criticize them for going light on mathematical foundations, as an introduction to the theory they are undoubtedly a shortcut for beginners getting into machine learning and computer vision. So what can we learn from them? Not long ago, Ryan Shrott, chief analyst at National Bank of Canada, shared his takeaways from the lectures.

Week 4 homework: transferring the style of the modern impressionist painting Rain Princess onto a photo of Andrew Ng

I recently finished Andrew Ng's computer vision course on Coursera and found that he takes great care when explaining complex theory, making it vivid, straightforward, and easy to follow; I benefited a great deal. One of my favorite lessons is neural style transfer (Lesson 11), which works through an example of combining an arbitrary image with the style of Monet, and it made learning genuinely enjoyable.

In this article, I will share the 11 most important lessons I took away from the course, in the hope of giving beginners a useful reference.

Lesson 1: Why is computer vision booming?

The convergence of big data with better algorithms is driving the test error of intelligent systems toward the Bayes minimum error, the point at which decisions are optimal; as a result, AI now surpasses human performance in a number of areas, including natural perception tasks;

Increasingly rich open-source software such as TensorFlow lets you quickly build object detection models for arbitrary objects using transfer learning;

With transfer learning, you only need 100-500 samples to get a reasonably good model running. Compared with building a large dataset, hand-labeling 100 images is not hard, which means you can ship a "good enough" small product quickly.

Lesson 2: How does convolution work?

Andrew Ng shows how the convolution operator is implemented and how it detects edges in an image. He also explains several filters, such as the Sobel filter, which gives more weight to the pixels at the center of an edge. Ng also notes that filter weights should not be set by hand; they should be found by a hill-climbing optimization algorithm such as gradient descent.
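
As a minimal sketch of the idea (using NumPy and SciPy rather than anything from the course itself), the following applies a hand-written Sobel-style filter to a toy grayscale image with a 2-D convolution:

```python
import numpy as np
from scipy.signal import convolve2d

# Vertical-edge Sobel filter: the center row of the filter
# gets more weight than the outer rows.
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)

# Toy grayscale "image": dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 10.0

# 'valid' keeps only positions where the filter fully overlaps the image,
# so a 6x6 input convolved with a 3x3 filter gives a 4x4 output.
edges = convolve2d(image, sobel_x, mode="valid")
print(edges)  # nonzero responses appear only around the vertical edge
```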

In addition, Ng explains why convolutions work so well. He outlines two key ideas: parameter sharing and sparsity of connections.

A convolutional neural network for computer vision usually contains a huge number of neurons, and if every neuron had its own weights and bias for the data window it looks at, you could easily end up with hundreds of millions of weights. Parameter sharing was proposed with exactly this problem in mind: it assumes that a filter that is useful in one part of the image is also useful in another part (an edge filter, for example, works everywhere in the image), so we only need to learn one set of weights whose size matches the dimensions of the filter window.

Besides keeping the number of parameters small, parameter sharing also provides translation invariance (robustness to translation): to give a simple example, an image of a kitten shifted to another part of the frame is still an image of a kitten.

Sparsity of connections means that each output value depends on only a small number of inputs (the f × f values covered by the filter), which greatly reduces the number of parameters in the network and speeds up training.
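
To make the savings concrete, here is a rough back-of-the-envelope comparison (my own numbers, not from the course) between a fully connected layer and a convolutional layer with shared filters on a small RGB image:

```python
# Fully connected: every output unit connects to every input pixel.
h, w, c = 32, 32, 3          # input: 32x32 RGB image
n_hidden = 1000              # units in the next layer
dense_params = (h * w * c) * n_hidden + n_hidden
print(f"dense layer: {dense_params:,} parameters")   # ~3.07 million

# Convolutional: 10 filters of size 3x3x3, shared across the whole image.
f, n_filters = 3, 10
conv_params = (f * f * c) * n_filters + n_filters    # weights + biases
print(f"conv layer:  {conv_params:,} parameters")    # 280
```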

Lesson 3: Why padding?

Padding means adding extra pixels around the border of the input image so that the output image keeps the same size as the input. Padding also ensures that pixels near the edge of the image contribute to the output almost as much as pixels near the center.
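
A quick sketch of the size arithmetic (the helper function below is my own, not course code): with input side length n, filter size f, and padding p, the output side length is n + 2p − f + 1, so choosing p = (f − 1) / 2 for odd f gives "same" padding:

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

n, f = 6, 3
print(conv_output_size(n, f, p=0))             # 4 -> "valid": output shrinks
print(conv_output_size(n, f, p=(f - 1) // 2))  # 6 -> "same": size preserved
```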

Lesson 4: Why max pooling layers?

Experience shows that max pooling layers are very effective in CNNs. By downsampling the image, we keep the detected features largely unchanged while shrinking the representation, which reduces the computation (and the number of parameters) needed in later layers.
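
A minimal NumPy sketch of 2×2 max pooling with stride 2 (my own toy example): each 2×2 block of the input is replaced by its maximum, halving the width and height:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even sides."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]], dtype=float)
print(max_pool_2x2(x))
# [[6. 5.]
#  [8. 9.]]
```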

Lesson 5: Classic network architectures

Andrew Ng walks through three classic architectures: LeNet-5, AlexNet, and VGG-16. He notes that effective networks typically increase the number of channels layer by layer while the width and height of the feature maps decrease.
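
The toy PyTorch stack below (my own example, not VGG-16 itself) illustrates that pattern: the channel count grows from 3 to 16 to 32 while the spatial size shrinks from 64×64 to 16×16:

```python
import torch
import torch.nn as nn

# A minimal VGG-flavoured stack: channels grow, spatial size shrinks.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 64x64 -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 32x32 -> 16x16
)

x = torch.randn(1, 3, 64, 64)
print(net(x).shape)  # torch.Size([1, 32, 16, 16])
```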

Lesson 6: Why are ResNets effective?

In a plain neural network, vanishing or exploding gradients mean that training error stops decreasing as more layers are added. ResNets introduce "shortcut connections" that skip one or more layers; with them, we can train very deep networks without sacrificing performance.
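
A simplified residual block in PyTorch (a sketch of the general idea, not the exact blocks from the published ResNet papers): the input is added back to the output of the convolutions, so the gradient has a direct path through the shortcut:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A simplified residual block: output = ReLU(conv(conv(x)) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # the shortcut: add the input back in

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])
```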

 

Lesson 7: Use transfer learning

Training a large neural network from scratch usually takes weeks. A time-saving alternative is to take the weights of a pre-trained model and retrain only the final softmax layer (or the last few layers). The reasoning is that the early layers tend to learn features that are relevant to all images, such as edges and curves.
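
A minimal sketch of that recipe, assuming PyTorch and torchvision are available (the choice of ResNet-18 and of 5 output classes is mine, purely for illustration):

```python
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet (older torchvision versions
# use `pretrained=True` instead of the `weights=` argument).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze every pre-trained layer so only the new head will be updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one for our own 5 classes.
model.fc = nn.Linear(model.fc.in_features, 5)

# Now train as usual: only model.fc has trainable parameters.
```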

Lesson 8: How to win computer vision competitions

Ng also shares some techniques used in computer vision competitions, such as training several independent networks and averaging their outputs, or improving performance through data augmentation, for example randomly cropping images and flipping them horizontally or vertically. Finally, don't forget to start from a good open-source implementation and pre-trained model, and then fine-tune it for your specific task.
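
A typical augmentation pipeline using torchvision transforms (the specific transforms and sizes are my own choices, not the course's):

```python
from torchvision import transforms

# Random crops and horizontal flips produce new variants of each
# training image on every epoch, which acts as free extra data.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```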

Lesson 9: How to do object detection?

Ng introduces the basic idea of landmark detection; in general, these landmarks become part of the outputs the network is trained to predict. Through a series of convolution operations, you obtain an output that describes both the probability that a target is present and the coordinates of the region where it sits.

He also explains how Intersection over Union (IoU), the ratio of the overlap between a candidate bounding box and the ground-truth box to their union, is used to evaluate the performance of an object detection algorithm. Finally, Ng ties these pieces together and walks through the well-known YOLO algorithm in detail.
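
A small Python helper showing how IoU is typically computed for two axis-aligned boxes (my own sketch, with boxes given as corner coordinates):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 4, 4), (2, 2, 6, 6)))  # 4 / 28 ≈ 0.143
```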

Lesson 10: How to do face recognition?

Face recognition is a one-shot learning problem: you usually have only a single labeled image of each person, yet the system must recognize that person again from it. The approach is to learn a similarity function that, given two images, outputs how similar they are: the function should output a small value when both images show the same person and a large value otherwise.

The first solution Ng presents is the Siamese network. The idea is to feed the two face images through the same network and then compare the outputs; if the outputs are close, the images are probably of the same person. The training objective is that when the two input images show the same person, the distance between their encodings should be small.
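
A toy sketch of the comparison step (the tiny linear "encoder" below is a stand-in for a real face-embedding CNN and is entirely my own placeholder):

```python
import torch
import torch.nn as nn

# A toy "encoder" standing in for the face-embedding CNN.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))

def face_distance(img_a, img_b):
    """Run both images through the *same* network and compare encodings."""
    emb_a, emb_b = encoder(img_a), encoder(img_b)
    return torch.norm(emb_a - emb_b, dim=1)  # small distance -> likely same person

a = torch.randn(1, 3, 64, 64)
b = torch.randn(1, 3, 64, 64)
print(face_distance(a, b))
```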

The second solution is the triplet loss, an error function based on metric learning. Assume we have a function that maps a face image to a point in a high-dimensional vector space. We select triplets consisting of an anchor (A), a positive (P) showing the same person, and a negative (N) showing a different person; after training, the network should place A closer to P than to N, i.e. the distance between A and P should be smaller than the distance between A and N.
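
A NumPy sketch of the triplet loss on pre-computed encodings (the margin value and the toy 2-D encodings are my own choices for illustration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on pre-computed encodings.

    Pushes d(anchor, positive) to be at least `margin` smaller than
    d(anchor, negative); returns 0 when that already holds.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.1, 0.9])
p = np.array([0.2, 0.8])   # same person: close to the anchor
n = np.array([0.9, 0.1])   # different person: far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -> this triplet is already satisfied
```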

Lesson 11: How to do style transfer?

Andrew Ng explains the idea behind generating an image that combines the content of one image with the style of another.

 

The key to understanding neural style transfer is what the visual features learned at each layer of a convolutional network look like. It turns out that early layers tend to learn simple features such as edges, while later layers capture complex objects such as faces, feet, and cars.

To generate such a style-transferred image, we first define a cost function:

J(G) = α × J_content(C, G) + β × J_style(S, G)

where G is the generated image, C is the content image, and S is the style image. The learning algorithm then simply uses gradient descent on the generated image G to minimize this cost function.

The main steps are:

1. Randomly initialize G;

2. Use gradient descent to minimize J(G), updating G := G − ∂J(G)/∂G (scaled by a learning rate);

3. Repeat step 2.
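
A minimal PyTorch sketch of that loop, assuming content_loss and style_loss are functions that compare feature activations from a frozen pre-trained CNN (not implemented here) and that C and S are image tensors; the weights, step count, and learning rate are placeholders of my own:

```python
import torch

def style_transfer(C, S, content_loss, style_loss,
                   alpha=1.0, beta=100.0, steps=200, lr=0.05):
    G = torch.randn_like(C, requires_grad=True)   # step 1: random G
    optimizer = torch.optim.SGD([G], lr=lr)       # plain gradient descent on G

    for _ in range(steps):                        # step 3: repeat step 2
        optimizer.zero_grad()
        J = alpha * content_loss(C, G) + beta * style_loss(S, G)
        J.backward()                              # step 2: gradient of J(G) w.r.t. G
        optimizer.step()                          # update the pixels of G
    return G.detach()
```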

Epilogue

After finishing this course, I gained a lot of intuition about the theory behind computer vision, and the homework assignments pushed me to put these ideas into practice and check that my understanding was correct. The course did not suddenly turn me into an expert in the field, but it gave me a great deal of creativity and inspiration.

Original article: www.kdnuggets.com/2017/12/ng-computer-vision-11-lessons-learnied.html

  • Posted: 2018-01-30
  • Source: http://kuaibao.qq.com/s/20180130G1ISWF00?refer=cp_1026
