Andrew Ng's machine learning lecture notes (11)

Model selection


After training, we get a model. Sometimes we will wonder how we can improve it. In order to do so, we can divide our data set into three parts: first the training set (60%), second the cross-validation set (20%), and third the test set (20%).
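The 60/20/20 split above can be sketched as a small helper; the function name and the fixed seed are illustrative choices, not part of the lecture:

```python
import random

def split_dataset(data, seed=0):
    """Shuffle and split a dataset into training (60%),
    cross-validation (20%), and test (20%) subsets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = data[:]                 # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.6 * n)
    n_cv = int(0.2 * n)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]
    return train, cv, test

train, cv, test = split_dataset(list(range(100)))
print(len(train), len(cv), len(test))  # 60 20 20
```

Shuffling before splitting matters when the data is ordered (e.g. sorted by label), otherwise the three subsets would not be drawn from the same distribution.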

For linear regression

Suppose that we have fitted several models on the same training set. Which model should we choose? Well, we can use the cost function to estimate each model's error on the cross-validation set, then choose the model with the minimum error. The test set can then be used to estimate that chosen model's generalization error.
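The selection step can be sketched as follows. The candidate models below are hypothetical hypotheses assumed to be already fit on the training set; `mse` is the usual squared-error cost J(theta), without regularization:

```python
def mse(h, examples):
    """Average squared error of hypothesis h over (x, y) pairs,
    i.e. J(theta) = (1/2m) * sum (h(x) - y)^2."""
    return sum((h(x) - y) ** 2 for x, y in examples) / (2 * len(examples))

def select_model(models, cv_set):
    """Return the candidate with the lowest cross-validation error."""
    return min(models, key=lambda h: mse(h, cv_set))

# Hypothetical candidates, assumed already trained on the training set:
models = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: x ** 2]
cv_set = [(1, 2.1), (2, 4.0), (3, 5.9)]
best = select_model(models, cv_set)
print(best(5))  # 10: the cross-validation data is close to y = 2x
```

Because the cross-validation set was used to pick `best`, its error there is an optimistic estimate; that is why the separate test set is kept for the final error estimate.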
For logistic regression

The procedure is the same as above, except that the cost function should be defined as follows:

J(theta) = -(1/m) * sum_{i=1..m} [ y^(i) * log(h_theta(x^(i))) + (1 - y^(i)) * log(1 - h_theta(x^(i))) ]

and the error for the test set is the misclassification (0/1) error:

err(h_theta(x), y) = 1 if (h_theta(x) >= 0.5 and y = 0) or (h_theta(x) < 0.5 and y = 1), and 0 otherwise

Test error = (1/m_test) * sum_{i=1..m_test} err(h_theta(x_test^(i)), y_test^(i))
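A minimal sketch of these two quantities, assuming the hypothesis values h_theta(x^(i)) have already been computed and are passed in as a list of probabilities:

```python
import math

def logistic_cost(h_vals, y_vals):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    for predicted probabilities h and labels y in {0, 1}."""
    m = len(y_vals)
    return -sum(y * math.log(h) + (1 - y) * math.log(1 - h)
                for h, y in zip(h_vals, y_vals)) / m

def misclassification_error(h_vals, y_vals):
    """0/1 test error: the fraction of examples where thresholding
    h at 0.5 disagrees with the true label."""
    m = len(y_vals)
    return sum(1 for h, y in zip(h_vals, y_vals)
               if (h >= 0.5) != (y == 1)) / m
```

Note the difference: the log-loss is what gradient descent minimizes during training, while the 0/1 error is the quantity we actually report on the test set.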

Bias or high variance problem?

A bias problem means the model is underfitting, while a variance problem means the model is overfitting.

So, how can we tell? We should consider which element leads to the problem, for example the degree of the polynomial. We can plot J(theta) of the cross-validation data and the training data against the polynomial degree in the same figure: if both errors are high, we have a bias problem; if the training error is low but the cross-validation error is much higher, we have a variance problem.
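That reading of the two curves can be written down as a crude decision rule; the thresholds below (a target error of 0.1, a 2x gap) are illustrative assumptions, not values from the lecture:

```python
def diagnose(train_error, cv_error, target_error=0.1):
    """Rough rule of thumb for reading a train/CV error pair:
    both errors well above the target -> high bias (underfitting);
    low training error but a much larger CV error -> high variance
    (overfitting). Thresholds are illustrative only."""
    if train_error > target_error and cv_error > target_error:
        return "high bias"
    if cv_error > 2 * train_error:
        return "high variance"
    return "looks fine"

print(diagnose(0.30, 0.35))  # high bias: both errors are large
print(diagnose(0.01, 0.25))  # high variance: big train/CV gap
```

In practice one looks at the whole curve over the polynomial degree rather than a single point, but the same two patterns are what the plot reveals.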
Now we can summarize how to choose a good model as follows:

If we are suffering from a high bias problem, adding more data is not likely to help, while if we are suffering from a high variance problem, adding more data is likely to be helpful.
When we have a model and want to check whether we have a high bias (underfitting) or high variance (overfitting) problem, we had better plot J(theta) of the validation set and the training set against the number of training examples. In practice, remember that the validation set should remain the same; we should learn a new theta each time we increase the number of training examples.
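The learning-curve procedure above can be sketched generically. `fit` and `error` are placeholders for any learner and cost; the toy `fit_mean` learner below (which just predicts the mean target) stands in for running gradient descent on each subset:

```python
def learning_curve(train_set, cv_set, fit, error):
    """For m = 1..len(train_set), learn a new model on the first m
    training examples and record its error on those m examples and
    on the fixed cross-validation set."""
    train_errors, cv_errors = [], []
    for m in range(1, len(train_set) + 1):
        subset = train_set[:m]
        h = fit(subset)                      # learn a new theta for each m
        train_errors.append(error(h, subset))
        cv_errors.append(error(h, cv_set))   # CV set never changes
    return train_errors, cv_errors

def fit_mean(examples):
    """Toy learner: predict the mean target value."""
    c = sum(y for _, y in examples) / len(examples)
    return lambda x: c

def sq_error(h, examples):
    return sum((h(x) - y) ** 2 for x, y in examples) / (2 * len(examples))

tr, cv = learning_curve([(0, 0), (1, 1), (2, 2)], [(3, 3)], fit_mean, sq_error)
```

With more data the training error typically rises and the CV error falls; two curves that converge at a high error indicate high bias, while a persistent gap between them indicates high variance.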



Reposted from blog.csdn.net/frostmonarch/article/details/80103420