Logistic Regression - Hypothesis Representation

Abstract: This article is the transcript of Lesson 47, "Hypothesis Representation", from Chapter 7, "Logistic Regression", of Andrew Ng's Machine Learning course. I wrote it down while working through the videos and lightly edited it to make it more concise and easier to read, for my own future reference, and I am now sharing it here. If you find any mistakes, corrections are sincerely welcome, and I hope it is helpful for your studies as well.

Let's start talking about logistic regression. In this video, I'd like to show you the hypothesis representation, that is, what is the function we're going to use to represent our hypothesis when we have a classification problem.

Earlier, we said that we would like our classifier to output values that are between 0 and 1. So, we'd like to come up with a hypothesis that satisfies this property, that its predictions are between 0 and 1. When we were using linear regression, the hypothesis had the form h_{\theta }(x)=\theta ^{T}x. For logistic regression, I'm going to modify this a little bit and make the hypothesis h_{\theta }(x)=g(\theta ^{T}x), where I'm going to define the function g as follows:

g(z)=\frac{1}{1+e^{-z}}

This is called the sigmoid function or the logistic function. And the term logistic function is what gives rise to the name logistic regression. By the way, the terms sigmoid function and logistic function are basically synonyms and mean the same thing, so the two terms are interchangeable and either one can be used to refer to this function g. And if we take these two equations and put them together, then here's just an alternative way of writing out the form of my hypothesis: h_{\theta }(x)=\frac{1}{1+e^{-\theta ^{T}x}}. All I have done is take the variable z, where z is a real number, and plug in \theta ^{T}x, so I end up with \theta ^{T}x in place of z there.

Lastly, let me show you what the sigmoid function looks like. We're going to plot it on the figure here. The sigmoid function g(z), also called the logistic function, looks like this: it starts off near 0, then rises until it crosses 0.5 at the origin, and then it flattens out again. So that's what the sigmoid function looks like. And you notice that the sigmoid function asymptotes at 1 and asymptotes at 0: as z (the horizontal axis is z) goes to minus infinity, g(z) approaches 0, and as z approaches infinity, g(z) approaches 1. Because g(z) outputs values that are between 0 and 1, we also have that h_{\theta }(x) must be between 0 and 1 (0\leq h_{\theta }(x)\leq 1).

Finally, given this hypothesis representation, what we need to do next, as before, is fit the parameters \theta to our data. So given a training set, we need to pick a value for the parameters \theta, and this hypothesis will then let us make predictions. We'll talk about a learning algorithm later for fitting the parameters \theta. But first, let's talk a little bit about the interpretation of this model.
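As a small illustration (this code is not part of the lecture; the parameter values are made up, and NumPy is assumed to be available), here is a minimal Python sketch of the sigmoid function and of the hypothesis h_{\theta }(x)=g(\theta ^{T}x):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}); maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x): the estimated probability that y = 1 for input x
    return sigmoid(np.dot(theta, x))

# Illustrative example: x_0 = 1 (intercept term), x_1 = tumor size
theta = np.array([-6.0, 0.15])
x = np.array([1.0, 45.0])
print(hypothesis(theta, x))                            # a value strictly between 0 and 1

# Asymptotic behaviour of the sigmoid
print(sigmoid(-100.0), sigmoid(0.0), sigmoid(100.0))   # ~0.0, 0.5, ~1.0

Whatever values of \theta and x you plug in, the output stays strictly between 0 and 1, which is exactly the property we wanted for a classifier.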

Here is how I'm going to interpret the output of my hypothesis h_{\theta }(x). When my hypothesis outputs some number, I am going to treat that number as the estimated probability that y is equal to 1 on a new input example x. Here is what I mean; here is an example. Let's say we're using the tumor classification example, so we may have a feature vector x, which is this x_{0}=1 as always, and then our one feature is the size of the tumor. Suppose I have a patient come in with some tumor size, and I feed their feature vector x into my hypothesis, and suppose my hypothesis outputs the number 0.7. I'm going to interpret my hypothesis as follows: I'm going to say that this hypothesis is telling me that, for a patient with features x, the probability that y equals 1 is 0.7. In other words, I'm going to tell my patient that the tumor, sadly, has a 70% chance, or a 0.7 chance, of being malignant. To write this out slightly more formally, or to write this out in math, I'm going to interpret my hypothesis output as the probability that y equals 1, given x, parameterized by \theta, i.e., h_{\theta }(x)=P(y=1|x;\theta ). So, for those of you that are familiar with probability, this equation might make sense; if you're a little less familiar with probability, here is how I read this expression: this is the probability that y is equal to one given x, that is, given that my patient has features x. Given that my patient has a particular tumor size represented by my features x, this probability is parameterized by \theta. So I'm basically going to count on my hypothesis to give me an estimate of the probability that y is equal to 1.

Now, since this is a classification task, we know that y must be equal to 0 or 1, right? Those are the only two values that y could possibly take on, either in the training set or for new patients that may walk into my office, or into the doctor's office, in the future. So given h_{\theta }(x), we can therefore compute the probability that y is equal to 0 as well. Concretely, because y must be either 0 or 1, we know that the probability that y=0 plus the probability that y=1 must add up to 1. This first equation looks a little bit more complicated, but it's basically saying that the probability that y=0 for a particular patient with features x, given our parameters \theta, plus the probability that y=1 for that same patient with features x, given parameters \theta, must add up to 1. If this equation looks a little bit complicated, feel free to mentally imagine it without the x and \theta; then it's just saying that the probability of y=0 plus the probability of y=1 must be equal to 1. And we know this to be true because y has to be either 0 or 1, so the chance of y being 0 plus the chance of y being 1 must add up to 1. And so if you just take this term and move it to the right-hand side, then you end up with the equation that says the probability that y=0 is one minus the probability that y=1. And thus, if our hypothesis h_{\theta }(x) gives you that term, you can quite simply compute the estimated probability that y is equal to 0 as well. So you now know what the hypothesis representation is for logistic regression, and we've seen what the mathematical formula is that defines the hypothesis for logistic regression.
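Written out explicitly, the two relations described in this paragraph are:

P(y=0|x;\theta )+P(y=1|x;\theta )=1

P(y=0|x;\theta )=1-P(y=1|x;\theta )=1-h_{\theta }(x)

For the tumor example above, h_{\theta }(x)=0.7, so the estimated probability that the tumor is benign is P(y=0|x;\theta )=1-0.7=0.3.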

In the next video, I'd like to try to give you better intuition about what the hypothesis function looks like. And I want to tell you about something called the decision boundary, and we'll look at some visualizations together to try to get a better sense of what the hypothesis function of logistic regression really looks like.

<end>



Reposted from blog.csdn.net/edward_wang1/article/details/104544247