K-nearest neighbors and Euclidean Distance

These are my study notes on machine learning. I am writing them in English because I want to improve my writing skills. Thanks for reading, and if I have made any mistakes, please let me know.

What is the K-nearest neighbors algorithm?

It is a supervised learning algorithm for classification. We have pre-labeled training data that tells the machine which data point belongs to which group. Clustering tackles a similar grouping problem, but it is an unsupervised learning method that works without prior labels.
The algorithm is based on the distances between the point to be predicted and the training data, which are known in advance. Distance can intuitively be understood as proximity.

What do the K and nearest mean?

K is a number we choose: it is how many of the training points nearest to the new data point we consider. Usually we want K to be odd, because the algorithm essentially takes a majority vote among the neighbors, and with an even number we risk a 50/50 split. There are also ways to weight the vote by distance so that more distant points count less; with such weighting, an even K can work too.
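The majority vote described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `knn_predict` and the toy training set are my own for this example.

```python
from collections import Counter
import math

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points.

    train: list of (features, label) pairs; features are equal-length tuples.
    """
    # Sort training points by Euclidean distance to the new point, keep the k closest
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], new_point))[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
         ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B")]
print(knn_predict(train, (2, 1), k=3))  # → A (all three nearest neighbors are "A")
```

Note that with k=3 (odd), a tie between the two classes is impossible here, which is exactly why odd values of K are preferred.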

Accuracy or Confidence

During prediction, the algorithm selects the K points closest to the new data point and then finds the most common class among them; the new point probably belongs to that group. The ratio 'votes for the winning class / K' is the confidence, meaning how much we can trust that this point belongs to that group. Accuracy, by contrast, is measured on a test set after training the model. They are completely different things.
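The confidence ratio can be computed directly from the neighbor vote. Below is a small sketch under the same assumptions as before (a list of `(features, label)` pairs); the name `knn_confidence` is mine for this example.

```python
from collections import Counter
import math

def knn_confidence(train, new_point, k=3):
    """Return (predicted_label, confidence), where confidence = winning votes / k."""
    # Find the k training points closest to the new point
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], new_point))[:k]
    # The most common label wins; its vote count over k is the confidence
    label, votes = Counter(l for _, l in neighbors).most_common(1)[0]
    return label, votes / k

train = [((1, 1), "A"), ((2, 2), "A"), ((6, 6), "B"),
         ((7, 7), "B"), ((6, 7), "B")]
print(knn_confidence(train, (5, 5), k=3))  # → ('B', 1.0): all 3 neighbors agree
```

A confidence of 1.0 means every neighbor voted for the same class; 2/3 would mean the vote was split 2 to 1.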

Euclidean Distance

coming soon

Reposted from blog.csdn.net/kevin_chan04/article/details/83038864