[Thesis analysis] An unsupervised adversarial learning method for hard disk failure prediction (2019, Journal of Xidian University)

Author information:
Jiang Shaobin (1991—)
Master's degree candidate of National University of Defense Technology
E-mail: [email protected]

Key words

Domain: Anomaly detection, deep learning
Method: Unsupervised adversarial learning
Scenario: Hard disk failure detection
Network structure: Combination of an LSTM-based autoencoder and a generative adversarial network
Data set: Backblaze

Main method

The advantage of unsupervised adversarial learning is that abnormal samples (i.e., positive samples) are not used in the training phase, so the model is unaffected by sample imbalance and avoids the overfitting problem that imbalanced training samples would otherwise cause.

Most existing studies learn and test on short sequences of data within 5 days, so they cannot capture the long-term, stable trends in the self-monitoring, analysis and reporting technology (SMART) data, which makes the resulting models less robust. Combining this observation with the generative adversarial network (GAN) proposed in 2014, the paper proposes a combination of an LSTM-based autoencoder and a GAN: through adversarial training, the model learns the distribution of normal samples in both the sample space and the latent space. Since the reconstruction error at the sample level is susceptible to noise, abnormal samples are instead detected on the latent vector, which improves the model's anomaly detection performance.

Because normal samples vastly outnumber abnormal ones, even a low false alarm rate produces a large number of false positives and degrades detection performance. For this reason, the article mainly uses recall and precision as evaluation indicators, supplemented by the F1 score as a comprehensive indicator.
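As a quick reminder of how these indicators are computed, here is a minimal sketch using scikit-learn (the toy labels are illustrative, not from the paper):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels: 1 = abnormal (disk about to fail), 0 = normal.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```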

In experiments on a full year of Backblaze data, the method achieves higher recall and precision on abnormal samples than supervised/semi-supervised learning methods, and can effectively detect disk failures.

Anomaly detection model

Anomaly definition

Given a time node t, the model takes the hard disk's data from the l days before t (including the data at t) as one sample. If the disk fails within k days after t, the sample is defined as abnormal; otherwise it is normal.
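A minimal sketch of this labeling rule, assuming daily SMART records per disk (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def make_samples(smart_seq, failure_day, l, k):
    """Slide an l-day window over one disk's daily SMART records.

    smart_seq: array of shape (T, n_features), one row per day.
    failure_day: day index on which the disk failed, or None if it survived.
    A window ending at day t is labeled 1 (abnormal) if the disk fails
    within k days after t, and 0 (normal) otherwise.
    """
    last = failure_day if failure_day is not None else len(smart_seq)
    windows, labels = [], []
    for t in range(l - 1, min(last, len(smart_seq))):
        window = smart_seq[t - l + 1 : t + 1]  # l days up to and including t
        fails_soon = failure_day is not None and t < failure_day <= t + k
        windows.append(window)
        labels.append(int(fails_soon))
    return np.stack(windows), np.array(labels)
```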

Problem Description

[Figure omitted]

Detection method

The training phase is carried out on Dtrn: by minimizing the loss function floss, the model learns the distribution of normal samples in both the sample space and the deeper latent space. After training, the model is evaluated on Dvrf, and once the evaluation is satisfactory it is tested on Dtst. In the verification stage, the anomaly score A(X), defined from the loss function floss, is relatively small for normal samples, while anomalous samples, which never appear in the training phase, receive a larger A(X); a threshold φ is then chosen according to some optimization criterion. In the test phase, A(X) is computed in the same way: samples with A(X) ≥ φ are judged abnormal and samples with A(X) < φ are judged normal, achieving anomaly detection.
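The paper does not state which optimization criterion selects the threshold, so the sketch below assumes one plausible choice: scanning candidate thresholds on Dvrf and keeping the one that maximizes F1:

```python
import numpy as np
from sklearn.metrics import f1_score

def choose_threshold(scores_vrf, labels_vrf):
    """Scan candidate thresholds on D_vrf and keep the one with the best F1."""
    candidates = np.unique(scores_vrf)
    f1s = [f1_score(labels_vrf, (scores_vrf >= phi).astype(int)) for phi in candidates]
    return candidates[int(np.argmax(f1s))]

def detect(scores_tst, phi):
    """Test phase: A(X) >= phi -> abnormal (1), A(X) < phi -> normal (0)."""
    return (scores_tst >= phi).astype(int)
```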

Network structure

Overall network structure

The decoder's structure is symmetric to that of encoder 1. Encoder 2 and the discriminator adopt the same structure as encoder 1, but their parameters are learned independently during training. Encoder 1 and the decoder form an autoencoder, which acts as a generalized generator (G); together with the discriminator (D) it forms a generative adversarial network (GAN).
[Figure omitted]
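A minimal PyTorch skeleton of this four-part wiring (the `make_encoder` stand-in and layer sizes are illustrative assumptions; an LSTM+FC encoder matching the paper is sketched in the next subsection):

```python
import torch
import torch.nn as nn

class AdversarialAE(nn.Module):
    """Wiring of the four sub-networks: Enc1 + decoder form the generator G;
    Enc2 and the discriminator reuse Enc1's structure but learn their own
    parameters. make_encoder() is a simple stand-in, not the paper's design."""

    def __init__(self, n_feat, m):
        super().__init__()

        def make_encoder():
            return nn.Sequential(nn.Linear(n_feat, 64), nn.ReLU(), nn.Linear(64, m))

        self.enc1 = make_encoder()
        self.dec = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, n_feat))
        self.enc2 = make_encoder()                    # same structure, independent weights
        self.disc = nn.Sequential(make_encoder(), nn.Linear(m, 1))  # real/fake logit

    def forward(self, x):
        z = self.enc1(x)          # first encoding: latent vector z
        x_hat = self.dec(z)       # reconstruction of the sample
        z_hat = self.enc2(x_hat)  # re-encoding of the reconstruction
        return x_hat, z, z_hat
```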

Encoder 1 (Enc1) network structure

A 3-layer long short-term memory (LSTM) network extracts the temporal characteristics of a sample, followed by 3 fully connected (FC) layers that extract the latent feature vector. A BatchNorm layer and a ReLU activation are used between the fully connected layers to normalize the intermediate layer's output distribution and improve training efficiency.
[Figure omitted]
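A sketch of Enc1 under this specification; the hidden sizes are illustrative assumptions, not the paper's actual dimensions:

```python
import torch
import torch.nn as nn

class Enc1(nn.Module):
    def __init__(self, n_feat, hidden=64, m=16):
        super().__init__()
        # 3-layer LSTM extracts temporal features from the l-day sequence.
        self.lstm = nn.LSTM(n_feat, hidden, num_layers=3, batch_first=True)
        # 3 FC layers map the last hidden state to the latent vector z;
        # BatchNorm + ReLU between FC layers normalize the intermediate
        # outputs and speed up training.
        self.fc = nn.Sequential(
            nn.Linear(hidden, 32), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Linear(32, 32), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Linear(32, m),
        )

    def forward(self, x):           # x: (batch, l, n_feat)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])  # z: (batch, m)
```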
The sample X is passed through the generator to obtain its reconstruction X̂, i.e., X̂ = fG(X); the first encoding produces the latent vector z = fEnc1(X), with z ∈ R^m, where m is the dimension of z; encoder 2 then re-encodes the reconstruction to produce the reconstructed latent vector ẑ = fEnc2(X̂). During training, both reconstruction errors are continuously reduced, so the model learns the distribution of normal samples in the sample space and in the latent vector space. Because the sample-level reconstruction error is easily disturbed by noise, which hurts detection, the detection stage no longer uses it as the basis for anomaly detection; instead, the deeper latent-vector reconstruction error is used, which greatly improves the model's robustness to interference. The GAN component adds adversarial learning to the model, and alternate training yields a better generator.
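A sketch of the quantities described above: both reconstruction errors drive training, while only the latent-space error serves as the anomaly score A(X) at detection time (the choice of L1/L2 norms here is an assumption):

```python
import torch

def reconstruction_errors(x, x_hat, z, z_hat):
    """Both errors are minimized during training (norm choices are assumptions)."""
    err_sample = torch.mean(torch.abs(x - x_hat))   # sample-space error, training only
    err_latent = torch.mean((z - z_hat) ** 2)       # latent-space error
    return err_sample, err_latent

def anomaly_score(z, z_hat):
    """Detection phase: only the latent reconstruction error defines A(X),
    because the sample-level error is easily disturbed by noise."""
    return torch.mean((z - z_hat) ** 2, dim=1)      # one score per sample
```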

Model training verification and testing

Model training phase

[Figure omitted]
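The figure for this phase is not reproduced. As a rough sketch of the alternate training mentioned earlier, the loop below updates the discriminator and the generator in turn; the loss weights, the BCE adversarial loss, and the optimizer split are assumptions (`disc_opt` would cover `model.disc` parameters, `gen_opt` the enc1/dec/enc2 parameters):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(model, disc_opt, gen_opt, x, w_lat=1.0, w_adv=1.0):
    """One alternate-training step on a batch of normal samples.
    model: the AdversarialAE sketched above."""
    # --- discriminator step: real samples vs. (detached) reconstructions ---
    x_hat, z, z_hat = model(x)
    d_real = model.disc(x)
    d_fake = model.disc(x_hat.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()

    # --- generator step: both reconstruction errors + fooling the discriminator ---
    x_hat, z, z_hat = model(x)
    d_out = model.disc(x_hat)
    g_loss = (torch.mean(torch.abs(x - x_hat))
              + w_lat * torch.mean((z - z_hat) ** 2)
              + w_adv * bce(d_out, torch.ones_like(d_out)))
    gen_opt.zero_grad(); g_loss.backward(); gen_opt.step()
    return d_loss.item(), g_loss.item()
```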

Model verification phase

[Figure omitted]

Model testing phase

[Figure omitted]

Experimental part

Comparative Experiment

[Figures omitted]
The LSTM-FC model proposed in the paper shows the best performance, followed by LSTM-CNN. This is because the long short-term memory network can capture a sample's contextual information over a long span, and when extracting latent vectors the fully connected network retains more information than the convolutional neural network, so it better learns the high-dimensional distribution of the samples.

ROC curve

[Figure omitted]
The ROC curve of LSTM-FC is fuller and its area under the curve (AUC) is larger, indicating better results.

Anomaly score distribution

[Figure omitted]
The anomaly score distribution of LSTM-FC is the most clearly separated, which makes threshold selection easier and the test-phase performance more stable. GANomaly's distribution is the worst, which explains why its area under the curve in the verification phase is higher than LSTM-CNN's while its test results are worse than the latter's.

Origin blog.csdn.net/qq_16488989/article/details/109081761