ResNet-50 network structure in detail

The problem to solve:

  Deep networks are hard to train because of vanishing gradients: as the gradient is back-propagated to the earlier layers, the repeated multiplication of small factors can make it vanishingly small. As a result, when the network gets deeper, its performance tends to saturate and may even degrade rapidly.
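A toy illustration of that repeated-multiplication effect; the per-layer derivative of 0.5 and the depth of 50 are arbitrary values chosen for the example, not numbers from any real network:

```python
# Back-propagating through many layers multiplies many local derivatives;
# if each factor is below 1, the product shrinks exponentially with depth.
# The factor 0.5 and the depth 50 are arbitrary illustrative values.
depth = 50
local_derivative = 0.5

gradient = 1.0
for _ in range(depth):
    gradient *= local_derivative

print(gradient)  # 8.881784197001252e-16, i.e. effectively zero
```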

Main idea:

  Introduce an identity shortcut connection (also called a skip connection) that skips one or more layers, as shown in Figure 1.

 

      Figure 1: a residual block with an identity shortcut connection

With this skip connection, if vanishing gradients leave the residual branch of a deep block learning nothing, i.e. f(x) = 0, the block output becomes y = g(x) = relu(x) = x (x is already non-negative after the preceding ReLU), so the block simply passes its input through as an identity mapping.

1. Stacking such blocks onto a network means that even if the gradient vanishes and a block learns nothing, the block still behaves like an identity mapping. The deep network is then equivalent to the shallower network with some "copy layers" stacked on top, so it should be no worse than the shallower network. A minimal code sketch of such a block follows this list.

2. If a block does manage to learn something useful, that is a pure gain: the identity mapping is only the fallback baseline, and whatever is learned on top of it adds to what the shallower network already represents.
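A minimal sketch of such a residual block in PyTorch; the 3x3 convolutions, batch normalization, and channel count are illustrative assumptions rather than the exact ResNet-50 configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Residual branch f(x): two 3x3 conv + batch-norm layers.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Shortcut: add the input back, then apply ReLU.
        # If the branch learns nothing (out == 0), this reduces to relu(x) = x
        # for the non-negative activations coming from the previous block.
        return F.relu(out + x)


# Usage: the block maps a feature map to another of the same shape.
block = ResidualBlock(channels=64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```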

Why the residual structure (i.e., the extra shortcut connection that bypasses the stacked layers) alleviates gradient vanishing can be seen from a short derivation.

  For a residual block y = x + f(x), the chain rule gives dL/dx = dL/dy * (1 + df(x)/dx). The 1 contributed by the shortcut carries the gradient straight back to the earlier layers, so even when df(x)/dx becomes very small, the overall gradient does not shrink toward zero.

  The element-wise addition requires x and f(x) to have the same shape. When the shapes match, x is added directly as above; when they do not (for example, when the number of channels changes), a linear projection W_s is applied on the shortcut, giving y = f(x) + W_s * x, in practice a 1x1 convolution.
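A quick numeric check of that derivation with PyTorch autograd; the scalar input and the tiny slope of the residual branch are illustrative assumptions meant to mimic a layer whose own gradient has almost vanished:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Residual branch f(x) = w * x with a deliberately tiny slope,
# standing in for a layer whose local gradient has nearly vanished.
w = torch.tensor([1e-6])
f = w * x                    # df/dx = 1e-6

y_plain = f                  # without the shortcut
y_res = x + f                # with the shortcut: y = x + f(x)

grad_plain = torch.autograd.grad(y_plain.sum(), x, retain_graph=True)[0]
grad_res = torch.autograd.grad(y_res.sum(), x)[0]

print(grad_plain.item())     # ~1e-6: the gradient all but vanishes
print(grad_res.item())       # ~1.000001: the shortcut keeps it close to 1
```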

 

The so-called ResNet-18, ResNet-50, ResNet-101 and so on differ only in depth, i.e. in how many residual blocks are stacked, as listed in the architecture table of the original ResNet paper; ResNet-50 and ResNet-101 are the variants most commonly used in practice.
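To see the depth difference concretely, the standard variants can be instantiated from torchvision (assuming torchvision is installed; the printed parameter counts are approximate):

```python
import torchvision.models as models

# The variants share the same overall design and differ in how many
# residual blocks each stage stacks:
#   ResNet-18  : basic blocks,      [2, 2, 2, 2]
#   ResNet-50  : bottleneck blocks, [3, 4, 6, 3]
#   ResNet-101 : bottleneck blocks, [3, 4, 23, 3]
resnet18 = models.resnet18()
resnet50 = models.resnet50()
resnet101 = models.resnet101()

# Rough size comparison by number of trainable parameters.
for name, net in [("resnet18", resnet18), ("resnet50", resnet50), ("resnet101", resnet101)]:
    n_params = sum(p.numel() for p in net.parameters())
    print(name, f"{n_params / 1e6:.1f}M parameters")
# Approximately: resnet18 ~11.7M, resnet50 ~25.6M, resnet101 ~44.5M
```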

 

 

 

Origin www.cnblogs.com/qianchaomoon/p/12315906.html