Detailed explanation of the confusion matrix and F1-score

Confusion Matrix

[Figure: 2×2 confusion matrix of predicted vs. actual classes]

TP (True Positives): true positives, predicted to be positive and actually positive;
FP (False Positives): false positives, predicted to be positive but actually negative;
FN (False Negatives): false negatives, predicted to be negative but actually positive;
TN (True Negatives): true negatives, predicted to be negative and actually negative.
Legend

As shown in the figure:

  • The green box is actually not a cat but is predicted as a cat: this is a FP (a negative predicted as positive); there is 1 such sample.
  • The red boxes are actually not cats and are predicted as not cats: these are TN (negatives predicted as negative); there are 4 such samples.
  • The yellow boxes are actually cats and are predicted as cats: these are TP (positives predicted as positive); there are 3 such samples.
  • The blue boxes are actually cats but are predicted as not cats: these are FN (positives predicted as negative); there are 2 such samples.
    [Figure: 10 animal images, colour-coded by their TP/FP/FN/TN category]
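As a minimal sketch (the label vectors below are reconstructed from the counts in the legend, assuming 1 = cat and 0 = not cat), the four quantities can be counted like this:

```python
# Minimal sketch: counting TP, FP, FN, TN for the 10-sample cat example.
# The label vectors are reconstructed from the legend counts (TP=3, FP=1,
# FN=2, TN=4) and are only illustrative; 1 = cat, 0 = not cat.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 5 actual cats, 5 actual non-cats
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]   # model predictions

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 2
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 4
print(TP, FP, FN, TN)  # 3 1 2 4
```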

Accuracy:

Accuracy is the proportion of all correctly predicted samples (positives correctly predicted as positive, TP, plus negatives correctly predicted as negative, TN) out of the total number of samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

From the legend, 7 of the 10 samples are predicted correctly, so the accuracy is 7/10 = 70%.
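The same calculation as a small sketch, reusing the counts from the legend:

```python
# Accuracy for the legend: 7 of the 10 samples are predicted correctly.
TP, FP, FN, TN = 3, 1, 2, 4
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.7
```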

Although accuracy measures overall correctness, it is a poor indicator when the classes are imbalanced. With highly imbalanced samples a high accuracy can be meaningless: for example, if 95% of the samples are negative, a model that always predicts "negative" reaches 95% accuracy while never finding a single positive.
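A hedged illustration of this failure mode (the 95/5 split below is an invented example, not taken from the figure):

```python
# Illustrative only: 95 negatives, 5 positives. A model that always predicts
# "negative" reaches 95% accuracy while finding zero positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predict negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, yet recall on the positive class is 0
```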

Precision

Precision is the ratio of samples correctly predicted as positive (TP) to all samples predicted as positive (TP + FP), i.e. the proportion of predicted positives that are truly positive:

Precision = TP / (TP + FP)

From the legend, among the 4 samples predicted as positive, 3 are actually positive, so the precision is 3/4 = 75%.
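The same calculation as a small sketch, using the legend counts:

```python
# Precision for the legend: of the 4 samples predicted as cats, 3 really are cats.
TP, FP = 3, 1
precision = TP / (TP + FP)
print(precision)  # 0.75
```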

What we mainly care about is the positive class, so precision measures how accurate the positive predictions are relative to the set of samples predicted as positive. In plain terms, among the samples the model predicts as positive, precision is the proportion that are truly positive; it is the standard by which we judge how reliable a positive prediction is. Precision describes correctness within the predicted-positive results, whereas accuracy describes overall correctness across both positive and negative samples.

That is, Precision is defined with respect to the prediction results: among the samples predicted as positive, it is the probability that the prediction is correct. It is similar to asking how many of the answers a candidate wrote down on an exam are correct. It reflects how precise the model is; the model is saying, "whatever I call positive really is positive."

Recall

Recall is the proportion of actual positive samples that are predicted as positive. The actual positives consist of those predicted as positive (TP) and those wrongly predicted as negative (FN), so recall is defined with respect to the actual samples:

Recall = TP / (TP + FN)

From the legend, among the 5 actual positive samples, 3 are predicted as positive, so the recall is 3/5 = 60%.
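Again as a small sketch, using the legend counts:

```python
# Recall for the legend: of the 5 actual cats, 3 are found by the model.
TP, FN = 3, 2
recall = TP / (TP + FN)
print(recall)  # 0.6
```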

Recall is defined with respect to the data samples: among the samples that are actually positive, it is the probability that the model predicts them correctly. It is similar to asking how many of all the questions on an exam the candidate managed to answer correctly. It reflects how comprehensive the model is; the model is saying, "I can find every positive there is."

F1-score

The F-score is a metric for evaluating the performance of a binary classification model. It combines precision and recall, looking at the model from two perspectives, the predictions (subjective) and the actual labels (objective), and asks whether TP is large enough from both sides. This lets us jointly consider the model's prediction accuracy and its ability to capture positive samples.

  • FP/TP affects the judgment from the prediction (subjective) side of whether TP carries enough weight, i.e. whether TP is large relative to the predicted positives (precision).
  • FN/TP affects the judgment from the actual-label (objective) side of whether TP carries enough weight, i.e. whether TP is large relative to the actual positives (recall).
Precision and recall influence each other. Ideally we would like both to be high, but in practice they constrain each other: pursuing higher precision tends to lower recall, and pursuing higher recall usually hurts precision.
Of course we want the predictions to have precision as high as possible and recall as high as possible, but in some situations the two are in conflict.
We therefore need to consider them jointly, and the most common way to do so is the F-score. We can also plot the P-R curve to observe how precision and recall trade off.
The F1 value is the harmonic mean of precision and recall (equivalently, the square of their geometric mean divided by their arithmetic mean), and larger is better. Substituting the formulas for Precision and Recall above, we find that when F1 is large, True Positives are relatively large while the false counts (FP and FN) are relatively small, i.e. both Precision and Recall are relatively high; in this sense F1 weights Precision and Recall together.
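A small sketch of the P-R curve mentioned above, assuming scikit-learn and matplotlib are available (the scores below are synthetic, purely for illustration):

```python
# Sketch: precision-recall (P-R) curve on synthetic scores.
# Assumes scikit-learn and matplotlib are installed.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)              # synthetic 0/1 labels
y_score = 0.3 * y_true + 0.7 * rng.random(200)     # noisy, overlapping scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("P-R curve")
plt.show()
```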

You can think about: when does F1 approach 1, and when does it approach 0?
F1 = 2TP / (2TP + FP + FN) = 2 · Precision · Recall / (Precision + Recall)

From the legend above, F1 = (2 × 3) / (2 × 3 + 1 + 2) = 6/9 ≈ 66.7%.
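The same calculation as a small sketch, with the legend counts:

```python
# F1 for the legend, using TP=3, FP=1, FN=2.
TP, FP, FN = 3, 1, 2
precision = TP / (TP + FP)                      # 0.75
recall = TP / (TP + FN)                         # 0.6
f1 = 2 * precision * recall / (precision + recall)
# equivalently: f1 = 2 * TP / (2 * TP + FP + FN)
print(round(f1, 3))  # 0.667
```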

The core idea of F1 is to make Precision and Recall as high as possible while also keeping the difference between the two as small as possible. F1-score applies to binary classification; for multi-class problems the binary F1-score is generalised, giving two measures, Micro-F1 and Macro-F1.

Conclusion: the F-score is large only when both Precision and Recall are large.

More generally, the F-beta score is

Fβ = (1 + β²) · Precision · Recall / (β² · Precision + Recall)

In addition to the F1 score, the F0.5 and F2 scores are also widely used in statistics. In the F2 score recall is weighted more heavily than precision, while in the F0.5 score precision is weighted more heavily than recall.
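A short sketch of the F-beta formula above, plugging in the precision and recall from the legend (the helper name f_beta is just for illustration):

```python
# F-beta: beta > 1 weights recall more heavily (F2),
# beta < 1 weights precision more heavily (F0.5).
def f_beta(precision, recall, beta):
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.75, 0.6                 # precision and recall from the legend
print(f_beta(p, r, 1.0))         # F1   ≈ 0.667
print(f_beta(p, r, 2.0))         # F2   = 0.625  (pulled toward recall)
print(f_beta(p, r, 0.5))         # F0.5 ≈ 0.714  (pulled toward precision)
```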

Macro-F1 and Micro-F1

  • Macro-F1 and Micro-F1 are defined for multi-class (or multi-label) classification.
  • Micro-F1: pool the TP, FP and FN counts over all classes, compute the overall Precision and Recall, and then compute F1.
  • Macro-F1: compute Precision, Recall and F1 for each class separately, and then average the per-class F1 values (see the sketch below).
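A small sketch of the two averages, assuming scikit-learn is available (the toy labels below are invented):

```python
# Micro vs. macro F1 on an invented 3-class example (requires scikit-learn).
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 2, 2, 0]

# Micro-F1: pool TP/FP/FN over all classes, then compute one F1.
print(f1_score(y_true, y_pred, average="micro"))  # 0.625
# Macro-F1: compute F1 per class, then take the unweighted mean.
print(f1_score(y_true, y_pred, average="macro"))  # ≈ 0.583
```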

Thought questions:

In the figure below, what are the TN, FN, TP, FP, accuracy, precision, recall and F1-score respectively?
[Figure: confusion matrix for the thought question]


Answer (to the earlier question of when F1 approaches 0 and when it approaches 1):

  • F-score = 0 (in practice it only gets infinitely close to 0): from either the prediction side or the actual-label side, TP is very small, i.e. FP or FN is much larger than TP. Pushing this to the limit shows that the F-score approaches 0.
  • F-score = 1: from both the prediction side and the actual-label side, TP dominates, i.e. FP and FN are both 0 (their lower limit), so Precision = Recall = 1 and F1 = 1.



Origin blog.csdn.net/m0_68165821/article/details/132261322