More on applications of language models

Why a lighter language model

The language model introduced previously achieves good results, but it is too slow for a production environment. So the first goal of this article is to experiment with a lighter-weight language model. The second goal is to apply the language model to downstream tasks: a language model essentially captures latent regularities in the grammar of a language, so in theory it should be very helpful for tasks that judge the semantics of a text.

Applying the model: VAE

What is shown here is a VAE added to the text-generation process, with lm_loss used as an auxiliary loss to guide generation. I won't expand on the VAE itself here; I'll cover it separately when I have time.

The loss code (lm_loss is the new loss added on top of the original two terms):

    xent_loss = K.sum(K.sparse_categorical_crossentropy(input_sentence, output), 1)
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)

    # Frozen custom layer that scores the generated sequence with the LM
    lm_loss_layer = LM_loss_layer(word_dic, inpit_pad, name="loss_lm")
    lm_loss_layer.trainable = False
    lm_loss = lm_loss_layer(output)

    vae_loss = K.mean(xent_loss + kl_loss + lm_loss)
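To make the three terms concrete, here is a small NumPy sketch of the same arithmetic. All shapes and values are made up for illustration, and the LM term is stubbed out as a per-sentence scalar (in the real model it comes from the frozen LM layer):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, vocab, latent = 2, 4, 5, 8   # toy sizes

# Per-step decoder distributions and the true token ids
probs = rng.random((batch, steps, vocab))
probs /= probs.sum(axis=-1, keepdims=True)
targets = rng.integers(0, vocab, size=(batch, steps))

# xent_loss: sparse categorical cross-entropy summed over timesteps
picked = probs[np.arange(batch)[:, None], np.arange(steps)[None, :], targets]
xent_loss = -np.log(picked).sum(axis=1)

# kl_loss: KL(N(z_mean, exp(z_log_var)) || N(0, 1)), summed over latent dims
z_mean = rng.normal(size=(batch, latent))
z_log_var = rng.normal(size=(batch, latent))
kl_loss = -0.5 * np.sum(1 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

# lm_loss: stubbed here as zeros; the real value comes from LM_loss_layer
lm_loss = np.zeros(batch)

vae_loss = np.mean(xent_loss + kl_loss + lm_loss)
```

Note that the KL term is non-negative element-wise, so it can only pull the posterior toward the standard normal prior.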

The key code of the lm_loss custom layer:

    def call(self, x, mask=None):
        data_shape = K.int_shape(x)
        pad_shape = K.int_shape(self.pad_data)
        word_len = data_shape[1]  # length of the word sequence
        pad_len = pad_shape[1]

        # One-hot encode the fixed pad/prefix tokens so they can be
        # concatenated with the decoder's soft one-hot-like output
        pad_data = K.cast(self.pad_data, tf.int64)
        pad_data_onehot = K.one_hot(indices=pad_data, num_classes=data_shape[-1])

        lm_input = K.concatenate((pad_data_onehot, x), axis=1)

        # Score the whole sequence with the (frozen) language model
        lm_out = self.lm_model(lm_input)
        class_num = K.int_shape(lm_out)[-1]
        lm_out = K.reshape(x=lm_out, shape=(-1, word_len + pad_len, class_num))

        # Take the LM's peak probability at each step and penalize low peaks
        lm_out = K.max(lm_out, axis=-1)
        res = -K.log(lm_out)
        res = K.sum(res)
        return res
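Numerically, the layer reduces to "sum the negative log of the LM's peak probability at every step". A NumPy sketch of just that math (the model call and reshaping are omitted):

```python
import numpy as np

def lm_loss_np(lm_out):
    # lm_out: (batch, pad_len + word_len, vocab) softmax outputs of the LM
    peak = lm_out.max(axis=-1)        # K.max(lm_out, axis=-1)
    return (-np.log(peak)).sum()      # K.sum(-K.log(...))

# A uniform LM puts probability 1/4 on every token, so each of the
# 2 * 3 = 6 steps contributes -log(0.25) = log(4)
uniform = np.full((2, 3, 4), 0.25)
loss = lm_loss_np(uniform)            # = 6 * log(4)
```

One design observation: the max picks the LM's most confident token at each step rather than the probability of the token actually generated, so the loss rewards any sequence on which the LM is confident, not necessarily agreement with the LM.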

The experimental results are still very much at the demo stage. My computer is overheating, so I will post results when I get the chance; in theory they won't be dramatic, since the backbone is still a VAE model. If you run your own experiments, you can also put weights on the individual losses. The prerequisite for this to be effective is a good LM; otherwise the loss will decrease but the results will be uncomfortable.
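Weighting the losses, as suggested above, is just a matter of scaling each term before averaging. The weights below are hypothetical starting points, not tuned values:

```python
import numpy as np

kl_weight, lm_weight = 1.0, 0.1     # hypothetical; tune per experiment
xent_loss = np.array([3.2, 2.8])    # made-up per-sentence values
kl_loss = np.array([0.5, 0.7])
lm_loss = np.array([12.0, 9.0])

vae_loss = np.mean(xent_loss + kl_weight * kl_loss + lm_weight * lm_loss)
```

Keeping lm_weight small stops a large LM term from drowning out the reconstruction and KL terms.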

Applying the model: NER

Here is also a small demo applying the language model to an NER task. In an earlier test I found that BERT worked well, but its inference speed in production is relatively slow, so I want to replace it with a portable LM, which can also be custom-trained as needed. Some code is pasted below.

    def build_model(self):
        inpute_ = layers.Input((self.max_sentence_len,))
        # LM embedding: pad input plus the frozen LM-derived features
        inpit_pad = layers.Input(shape=(self.ngram,))
        lm_embeding_layer = LM_embeding_layer()
        emb_lm = lm_embeding_layer([inpute_, inpit_pad])

        # Trainable word embedding, concatenated with the LM features
        emb = layers.Embedding(input_dim=self.word_num, output_dim=128)(inpute_)
        embedding_layer = layers.Concatenate(axis=-1)([emb, emb_lm])

        # Three convolution branches with different kernel sizes (multi-scale n-gram features)
        model1_in = layers.Conv1D(filters=self.CONV_SIZE, kernel_size=2, activation="relu", padding="same")(
            embedding_layer)
        model1_in = layers.MaxPooling1D(pool_size=2, strides=1, padding="same")(model1_in)

        model2_in = layers.Conv1D(self.CONV_SIZE, kernel_size=4, activation="relu", padding="same")(embedding_layer)
        model2_in = layers.MaxPooling1D(pool_size=2, strides=1, padding="same")(model2_in)

        model3_in = layers.Conv1D(self.CONV_SIZE, kernel_size=6, activation="relu", padding="same")(embedding_layer)
        model3_in = layers.AveragePooling1D(pool_size=2, strides=1, padding="same")(model3_in)

        # Merge the branches (plus a skip connection to the embeddings) and decode with a CRF
        merged = layers.concatenate([model1_in, model2_in, model3_in, embedding_layer], axis=-1)
        crf = CRF(self.class_num, sparse_target=False)
        crf_res = crf(merged)
        model = Model([inpute_, inpit_pad], crf_res)

        adam = Adam(lr=0.001)
        model.compile(optimizer=adam, loss=crf.loss_function, metrics=[crf.accuracy])
        print(model.summary())
        return model
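Downstream of the CRF you still need to turn per-token tag predictions into entity spans. The post does not specify a tag scheme, so assuming BIO tags for illustration, a minimal decoder could look like this:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence (e.g. from argmax over the CRF output)
    into (label, start, end) spans, with end exclusive.
    Assumes tags such as 'B-PER', 'I-PER', 'O'."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open span
                spans.append((label, start, i))
            start, label = i, tag[2:]      # open a new span
        elif tag.startswith("I-") and label == tag[2:]:
            continue                       # extend the current span
        else:                              # 'O' or a mismatched 'I-' tag
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:                  # span running to the end
        spans.append((label, start, len(tags)))
    return spans

tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
# bio_to_spans(tags) -> [("PER", 0, 2), ("LOC", 3, 4)]
```

This keeps the post-processing independent of the model, so it works the same whether the tags come from this CRF model or from BERT.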


Origin: blog.csdn.net/cyinfi/article/details/96639254