Deep Learning with Grandpa 3: Neural Network Debugging Practice

1. Foreword

We built a weather prediction model earlier, and the training results all looked good; the network was "fitting", at least. But the predictions did not meet expectations, which kept nagging at me. So I spent a long time researching why. My theoretical basis was clearly sound (today's average temperature is correlated with the weekly average temperature), yet the predicted data was wrong.

To get to the bottom of it, I built a minimal problem and model, started from the simplest possible case, and tracked down the cause step by step. The hard work paid off, and I really did find it.

2. A very simple problem

Suppose there is an input X=[1,2,3] and a corresponding Y=[2,5,6]. We follow the steps from the previous post to build a neural network.

The training data (features) has only 3 rows and 1 column, and the corresponding correct values (labels) are likewise 3 rows and 1 column.
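A minimal sketch of that data setup (my own reconstruction, assuming plain NumPy arrays as in the earlier posts of this series) might look like this:

```python
import numpy as np

# Features: 3 rows, 1 column
features = np.array([[1.0], [2.0], [3.0]])
# Labels: 3 rows, 1 column
labels = np.array([[2.0], [5.0], [6.0]])

print(features.shape, labels.shape)  # (3, 1) (3, 1)
```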

The network itself is also very simple, only 10 neurons in total, while keeping the network configuration from the earlier temperature prediction post. The training results looked fine.
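The exact layers are not spelled out here, so the following is only a sketch of a comparably tiny Keras network (9 hidden neurons plus 1 output neuron is my assumption, and, as it turns out to matter later, there is no activation function), trained and then asked to predict:

```python
import numpy as np
import tensorflow as tf

features = np.array([[1.0], [2.0], [3.0]])
labels = np.array([[2.0], [5.0], [6.0]])

# A tiny network: 9 hidden neurons + 1 output neuron = 10 in total
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(9),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(features, labels, epochs=500, verbose=0)

print(model.predict(features))            # the whole batch
print(model.predict(np.array([[2.0]])))   # a single sample
```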

The predicted results, however, were not satisfactory. They should have been 2, 5, 6, but came out around 2.3, 4.3, 6.3. Such a simple dataset and network should not show such a large deviation, so I tried predicting just a single sample.

Things only seemed to get worse, so something had to be wrong. I tried changing every parameter that could be changed; with some settings the network would fit normally, and with others it would not. But even the networks that fit normally still predicted wrong values.

Finally it occurred to me that when I first came into contact with neural networks three years ago, the tutorial at the time said that if you want to make "probability" predictions, it is best to normalize the training data: values like 10, 20, 30 are scaled into numbers between 0 and 1, such as 0.1, 0.2, 0.3. But the small model built here, and the earlier weather forecast, are clearly not "probability" forecasts but "numerical" forecasts, so shouldn't the normalization be dropped?
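For reference, here is a sketch of the kind of scaling that tutorial describes (using a fixed scale factor so that 10, 20, 30 become 0.1, 0.2, 0.3; the tutorial's actual code is not shown in this post). The detail that matters is at the end: if the labels are scaled before training, the network learns to output scaled values, and its raw predictions look wrong unless they are mapped back.

```python
import numpy as np

labels = np.array([[10.0], [20.0], [30.0]])

# Scale the labels into the 0-1 range before training
scale = 100.0
scaled_labels = labels / scale        # [[0.1], [0.2], [0.3]]

# ... train the network on scaled_labels ...

# The network now predicts in the scaled range, so its outputs
# must be multiplied back up before comparing with real values
predictions = np.array([[0.1], [0.2], [0.3]])  # stand-in for model output
restored = predictions * scale                  # [[10.], [20.], [30.]]
```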

With normalization removed, the predicted results were still mediocre, but at least the values were no longer strange. So the remaining problem had to be that the network was not "fitting" well enough. I started tweaking it at random; really, I just changed things at random.

I added an activation function. Although the loss value was still high, the way it kept decreasing was very gratifying.
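This makes sense: Dense layers stacked without activation functions collapse into a single linear function, and the best straight line through (1, 2), (2, 5), (3, 6) passes through roughly 2.33, 4.33, 6.33, which matches the bad predictions above almost exactly. A nonlinearity lets the network bend. A sketch of the change (assuming ReLU; the actual activation used is not named in this post):

```python
import numpy as np
import tensorflow as tf

features = np.array([[1.0], [2.0], [3.0]])
labels = np.array([[2.0], [5.0], [6.0]])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(9, activation="relu"),  # the added nonlinearity
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(features, labels, epochs=500, verbose=0)
print(history.history["loss"][-1])  # still high, but dropping steadily
```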

So I let the high loss keep running down by increasing the number of training epochs, and then the network magically "converged", and converged very well.
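Continuing the sketch above, the fix is simply to train much longer:

```python
# Same model and data as in the sketch above, trained for many more epochs
history = model.fit(features, labels, epochs=5000, verbose=0)
print(history.history["loss"][-1])   # should now be close to 0
print(model.predict(features))       # close to [[2.], [5.], [6.]]
```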

The predicted results were also very good.

3. Modifying the weather forecast model

Starting from the original code, I only removed the normalization and increased the number of training epochs, and the predicted results changed dramatically. There are still some differences between the predicted temperature and the actual temperature, but at least the prediction now floats around the right level, whereas the previous forecast was off by roughly a factor of two.
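The weather model's code is not reproduced in this post, so the following only illustrates the change with toy stand-in data (the names and numbers here are hypothetical): train directly on the raw temperatures instead of scaled ones, and for more epochs.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the real weather data
features = np.array([[1.0], [2.0], [3.0], [4.0]])   # e.g. a day index
temps = np.array([[13.0], [15.0], [14.0], [16.0]])  # raw temperatures

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(9, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# The two changes: fit the raw (un-normalized) temperatures,
# and train for many more epochs than before
model.fit(features, temps, epochs=5000, verbose=0)
```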

4. Review

In fact, the code for the earlier weather forecast also came from the Internet. That tutorial is actually really good, but I did not expect it to hide such a big pit. So to work with neural networks, it is not enough to just know how to tweak TensorFlow parameters; you still have to understand the underlying mathematical principles. Or, as everyone says, use a classic network structure directly, because many details cannot be changed casually.

Origin: blog.csdn.net/weixin_40402375/article/details/130210613