Why can the F1 score be used as an important metric for class-imbalanced tasks?

The F1 value is built from the following formulas:

Acc = \frac{TP+TN}{TP+TN+FP+FN}

Precision=\frac{TP}{TP+FP} = \frac{1}{1+\frac{FP}{TP}}

 Recall=\frac{TP}{TP+FN}=\frac{1}{1+\frac{FN}{TP}}


F1=\frac{2*Precision*Recall}{Precision+Recall}=\frac{2}{\frac{1}{Precision}+\frac{1}{Recall}} 

The F1 value depends on TP, FP, and FN (TN, which appears only in the accuracy formula, counts samples that are negative in both the prediction and the true label). The meanings of TP, FP, and FN are:

TP: the model predicts the label as positive, and the true label is also positive;
FP: the model predicts the label as positive, but the true label is negative;
FN: the model predicts the label as negative, but the true label is positive;
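As a minimal sketch of the four formulas above (the confusion-matrix counts here are hypothetical, chosen only for illustration):

```python
# Minimal sketch: compute Acc, Precision, Recall, and F1 from
# confusion-matrix counts. The counts below are hypothetical.
tp, tn, fp, fn = 80, 10, 5, 5

acc = (tp + tn) / (tp + tn + fp + fn)   # (TP+TN)/(TP+TN+FP+FN)
precision = tp / (tp + fp)              # TP/(TP+FP)
recall = tp / (tp + fn)                 # TP/(TP+FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"Acc={acc:.3f}  Precision={precision:.3f}  "
      f"Recall={recall:.3f}  F1={f1:.3f}")
```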

The larger the F1 value, the larger both Precision and Recall must be. For Precision to be large, \frac{FP}{TP} must be small; for Recall to be large, \frac{FN}{TP} must be small. In other words, TP should be as large as possible while FP and FN are as small as possible.

Consider \frac{FN}{TP} first. From the definitions of TP and FN above, the two are linked: a smaller FN means the model recovers more of the truly positive samples, so TP grows. Recall therefore focuses on how well the model predicts the positive data. Now consider \frac{FP}{TP}: a smaller FP means the model produces fewer false positives, and fewer false positives means more of the truly negative samples are handled correctly, so Precision focuses on how well the model predicts the true-negative data. Putting these together, the F1 value attends to the model's predictions on both the positive and the negative data, and raising F1 requires predicting both as correctly as possible. This is why the F1 value can serve as an important metric for class-imbalanced tasks.
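A quick hypothetical illustration of this point: on a dataset with 95 negatives and 5 positives, a model that predicts everything as negative gets 95% accuracy but an F1 of 0, because it never produces a single true positive.

```python
# Hypothetical imbalanced example: 95 negatives, 5 positives.
# A "predict everything negative" model looks good by accuracy
# but collapses under F1.
tp, fn = 0, 5     # all 5 positives are missed
tn, fp = 95, 0    # all 95 negatives come for free

acc = (tp + tn) / (tp + tn + fp + fn)   # 0.95, misleadingly high
# Precision and Recall are both 0 (guard the zero denominators),
# so F1 is 0 as well.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

print(acc, f1)   # 0.95 0.0
```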

Let's look at this from another angle, starting directly from the untransformed Precision and Recall formulas.

The denominator of Precision (TP + FP) can be read as the number of samples the model predicts as positive. For Precision to grow at a fixed TP, the denominator must shrink, which means FP must shrink: the model must correctly predict truly negative samples as negative. This again shows that Precision focuses on how the model handles the true-negative data.

The denominator of Recall (TP + FN) can be read as the number of truly positive samples, which is fixed for a given dataset. For Recall to grow, TP must grow, which means the model must correctly predict as many of the truly positive samples as possible. This shows that Recall focuses on how the model handles the true-positive data.
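As a complementary hypothetical sketch: a model that predicts every sample as positive drives Recall to 1 (FN = 0), while Precision drops to the positive-class base rate because FP is huge, so F1 stays low.

```python
# Hypothetical mirror case: predict every sample as positive.
# Recall is perfect (FN = 0), but Precision falls to the base rate.
tp, fn = 5, 0     # all 5 true positives are recovered
tn, fp = 0, 95    # all 95 negatives are misclassified as positive

precision = tp / (tp + fp)               # 5/100 = 0.05
recall = tp / (tp + fn)                  # 5/5   = 1.0
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f1, 4))   # 0.05 1.0 0.0952
```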

How is the F1 value calculated in a multi-class task? Refer to this article: F1 value calculation in multi-classification tasks.
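The linked article is not reproduced here, but as a hedged sketch of the standard approach: compute a one-vs-rest F1 per class, then average. Macro-F1 takes the unweighted mean of the per-class scores; micro-F1 pools the TP/FP/FN counts over all classes first. scikit-learn's f1_score exposes this through its average parameter (the labels below are hypothetical):

```python
# Sketch of multi-class F1 (labels 0/1/2 and predictions are hypothetical).
# Macro-F1: average the one-vs-rest F1 of each class.
# Micro-F1: pool TP/FP/FN over all classes before computing F1.
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

print(f1_score(y_true, y_pred, average="macro"))  # unweighted class mean
print(f1_score(y_true, y_pred, average="micro"))  # pooled counts
```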


Origin: blog.csdn.net/qq_43775680/article/details/131438402