Andrew Ng Machine Learning course notes -- the concepts of underfitting and overfitting

A note up front: these are notes I took while watching Andrew Ng's course, originally written in OneNote. They are not my original work, nor a repost of someone else's article, nor a translation, so I often hesitated over which category to file them under.

Linear regression: evaluate your hypothesis h at a certain query point, lowercase x, where you want to know the predicted value of y. So what we were doing was fit $\theta$ to minimize $\sum_i \left(y^{(i)} - \theta^T x^{(i)}\right)^2$, and return $\theta^T x$.
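A minimal sketch of that fit in code (NumPy; the toy data and the normal-equations solution are my own illustration, not from the lecture):

```python
import numpy as np

# Hypothetical training data: m examples, each x^(i) prepended with a 1
# so the intercept is absorbed into theta.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Fit theta to minimize sum_i (y^(i) - theta^T x^(i))^2,
# here in closed form via the normal equations.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Evaluate the hypothesis at a query point x: return theta^T x.
x_query = np.array([1.0, 1.2])
print(theta @ x_query)
```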

 

Locally weighted regression: you're going to look at this point x, then look in the data set and take into account only the data points that are in the vicinity of x, where you want to evaluate your hypothesis. Then apply linear regression to fit a straight line just to this subset of the data, and evaluate that straight line at this particular value x.

Let's go ahead and formalize that. We're going to fit $\theta$ to minimize $\sum_i w^{(i)} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$, where the terms $w^{(i)}$ are called weights. An exponential decay function, $w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$, seems to be a more reasonable choice on many problems.
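A sketch of locally weighted regression with those definitions (NumPy; the bandwidth tau and the toy data are assumptions for illustration):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted regression: refit theta for each query point,
    weighting each example by how close x^(i) is to the query x."""
    # w^(i) = exp(-||x^(i) - x||^2 / (2 tau^2)): nearby points get
    # weight near 1, faraway points get weight near 0.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Fit theta to minimize sum_i w^(i) (y^(i) - theta^T x^(i))^2
    # via the weighted normal equations.
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return theta @ x_query

X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])
y = np.array([1.0, 2.1, 2.0, 3.9])
print(lwr_predict(X, y, np.array([1.0, 1.2])))
```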

 

Probabilistic interpretation of linear regression: assume $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with the errors $\epsilon^{(i)}$ i.i.d. Gaussian; then choosing $\theta$ to maximize the likelihood is the same as minimizing the least-squares cost.
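A sketch of the standard derivation behind that claim (assuming i.i.d. Gaussian noise $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, as in the course notes):

```latex
\begin{align*}
L(\theta) &= \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right) \\
\log L(\theta) &= m \log \frac{1}{\sqrt{2\pi}\,\sigma}
  - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2
\end{align*}
% Maximizing log L(theta) is therefore the same as minimizing
% (1/2) sum_i (y^(i) - theta^T x^(i))^2, the least-squares cost.
```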

 

Least squares: minimizing the sum of the squared differences between the predictions of the hypothesis and the observed values y.

 

Classification: $y \in \{0, 1\}$

 

Logistic regression: if you actually compute the partial derivative -- so you take this formula, the log-likelihood $\ell(\theta)$, and take its partial derivative with respect to $\theta_j$ -- you find that it is equal to $\sum_i \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$.
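A minimal gradient-ascent sketch built on that derivative (NumPy; the learning rate, iteration count, and toy data are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary-labelled data (intercept term included in X).
X = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 1.4], [1.0, 2.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(X.shape[1])
alpha = 0.1  # learning rate (assumption)
for _ in range(1000):
    # Gradient of the log-likelihood:
    # d l(theta) / d theta_j = sum_i (y^(i) - h_theta(x^(i))) x_j^(i)
    grad = X.T @ (y - sigmoid(X @ theta))
    theta += alpha * grad  # ascend the log-likelihood

print(theta, sigmoid(X @ theta))
```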

 

Generative learning algorithm: models $p(x \mid y)$ (what the features look like given the class) along with the class prior $p(y)$, then uses Bayes' rule to get $p(y \mid x)$.

 

Discriminative learning algorithm: models $p(y \mid x)$ directly (or learns a direct map from inputs x to labels), as logistic regression does.

 

 

Perceptron: it's exactly the same as before, except that g(z) is now the step function.

It turns out there is a learning rule called the perceptron learning rule, and its update looks just like the classic gradient ascent rule for logistic regression.
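A sketch of that update with g swapped for the step function (toy data and learning rate are assumptions; note that only g changed relative to the logistic version above):

```python
import numpy as np

def step(z):
    # g(z) is now the step function: 1 if z >= 0, else 0.
    return (z >= 0).astype(float)

X = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 1.4], [1.0, 2.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(X.shape[1])
alpha = 0.1
for _ in range(100):
    for i in range(X.shape[0]):
        # theta_j := theta_j + alpha * (y^(i) - h_theta(x^(i))) * x_j^(i),
        # exactly the same form as the logistic regression rule.
        h = step(X[i] @ theta)
        theta += alpha * (y[i] - h) * X[i]

print(theta)
```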

 

Newton's method: an alternative to gradient ascent for maximizing $\ell(\theta)$, with the update $\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$, where $H$ is the Hessian. It typically needs far fewer iterations, though each iteration is more expensive.
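A sketch of Newton's method applied to the logistic regression log-likelihood (NumPy; the toy data are an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 1.4], [1.0, 2.1]])
y = np.array([0.0, 1.0, 0.0, 1.0])

theta = np.zeros(X.shape[1])
for _ in range(10):  # Newton's method usually converges in a handful of steps
    h = sigmoid(X @ theta)
    grad = X.T @ (y - h)               # gradient of l(theta)
    H = -(X.T * (h * (1 - h))) @ X     # Hessian: -X^T diag(h(1-h)) X
    theta -= np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad

print(theta)
```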

 

Underfitting: the algorithm fails to capture structure that is clearly present in the data.

Overfitting: the algorithm fits idiosyncratic properties of the specific data set (such as noise) rather than the underlying pattern.

Feature selection algorithm: automatically choosing which features to use in a regression problem.

 

Non-parametric learning algorithms: help alleviate the need to choose features very carefully. A non-parametric algorithm is one where the number of parameters grows with the training set size m, so the amount of stuff your learning algorithm needs to keep around grows linearly with the data. For example, locally weighted regression (loess).

To make predictions you need to keep the entire original data set around, just as with any non-parametric learning algorithm, and with a huge data set this gets expensive. (Data structures such as Andrew Moore's KD-trees can make this more efficient.)

 

Parametric learning algorithm: defined as an algorithm that has a fixed number of parameters that are fit to the data. For example, in linear regression we have a fixed set of parameters $\theta$ that we fit to the data.
