Language model applications
The language model introduced above achieves good results, but it is slow in a production environment. The first purpose of this article is therefore to experiment with a lighter-weight language model. The second is the downstream tasks of the language model. In application, a language model essentially captures the latent associations in a language's grammar, so in theory it should be very helpful for tasks that judge the semantics of text.
Model application: VAE
Shown here, a VAE is added to the text-generation process, and lm_loss is used to assist generation. I will not expand on the details of VAE itself here; I will write them up separately when I have time.
The code for the loss (lm_loss is the new loss added on top of the original):
# reconstruction loss: per-token cross-entropy summed over the sequence
xent_loss = K.sum(K.sparse_categorical_crossentropy(input_sentence, output), 1)
# KL divergence between the approximate posterior and a standard normal prior
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
# frozen language-model layer that scores the generated text
lm_loss_layer = LM_loss_layer(word_dic, inpit_pad, name="loss_lm")
lm_loss_layer.trainable = False
lm_loss = lm_loss_layer(output)
vae_loss = K.mean(xent_loss + kl_loss + lm_loss)
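For intuition, how the three terms combine can be sketched in plain NumPy. All shapes and values below are illustrative placeholders, not the author's actual data or model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab, latent = 2, 5, 10, 4

# Fake decoder output: per-token probability distributions over the vocabulary.
probs = rng.random((batch, seq_len, vocab))
probs /= probs.sum(axis=-1, keepdims=True)
targets = rng.integers(0, vocab, size=(batch, seq_len))

# xent_loss: sparse categorical cross-entropy summed over the sequence.
token_probs = np.take_along_axis(probs, targets[..., None], axis=-1).squeeze(-1)
xent_loss = -np.log(token_probs).sum(axis=1)  # shape (batch,)

# kl_loss: KL divergence of the approximate posterior N(mu, sigma^2) from N(0, 1).
z_mean = rng.standard_normal((batch, latent))
z_log_var = rng.standard_normal((batch, latent))
kl_loss = -0.5 * np.sum(1 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

# lm_loss would come from the frozen LM layer; a scalar placeholder stands in here.
lm_loss = 1.0

vae_loss = np.mean(xent_loss + kl_loss + lm_loss)
print(vae_loss)
```

Both the reconstruction and KL terms are non-negative, so adding a positive lm_loss simply shifts the optimum toward sequences the LM also finds plausible.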
The key code of the lm_loss custom layer:
def call(self, x, mask=None):
    data_shape = K.int_shape(x)
    pad_shape = K.int_shape(self.pad_data)
    word_len = data_shape[1]  # length of the word sequence
    pad_len = pad_shape[1]
    # one-hot encode the padding prefix so it can be concatenated with x
    pad_data = K.cast(self.pad_data, tf.int64)
    pad_data_onehot = K.one_hot(indices=pad_data, num_classes=data_shape[-1])
    lm_input = K.concatenate((pad_data_onehot, x), axis=1)
    # score the padded sequence with the (frozen) language model
    lm_out = self.lm_model(lm_input)
    class_num = K.int_shape(lm_out)[-1]
    lm_out = K.reshape(x=lm_out, shape=(-1, word_len + pad_len, class_num))
    # take the highest LM probability at each position and sum the negative logs
    lm_out = K.max(lm_out, axis=-1)
    res = -K.log(lm_out)
    res = K.sum(res)
    return res
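What `call` computes can be sketched in a few lines of NumPy. The function and variable names below are illustrative:

```python
import numpy as np

def lm_score(lm_probs):
    """Score sequences the way the custom layer does: take the highest
    LM probability at each position, then sum the negative logs
    (lower means the LM finds the text more plausible)."""
    return -np.log(lm_probs.max(axis=-1)).sum()

batch, seq_len, vocab = 2, 4, 8

# A near-uniform LM output versus a confident one.
uniform = np.full((batch, seq_len, vocab), 1.0 / vocab)
confident = np.full((batch, seq_len, vocab), 1e-3)
confident[..., 0] = 1.0 - 1e-3 * (vocab - 1)

print(lm_score(confident), lm_score(uniform))
```

A confident LM (probability mass concentrated on one token per step) yields a much smaller penalty than a near-uniform one. Note that the layer scores the max-probability token at each position rather than the token actually generated.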
The experimental results are still at the demo level. My computer is overheating, so I will post the effect when I get the chance; in theory it will not be too good, since the foundation is still a VAE model.
Friends who want to experiment can also add weights to the individual losses.
The prerequisite for this to be effective is a good LM; otherwise the loss will be hard to bring down, which is unpleasant.
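Weighting the losses could look like the sketch below. The weights alpha and beta and the loss values are hypothetical, just knobs to tune:

```python
import numpy as np

# Hypothetical per-sample losses from the three terms.
xent_loss = np.array([3.2, 2.8])
kl_loss = np.array([0.4, 0.6])
lm_loss = 5.1

# Down-weight the KL term and the LM penalty relative to reconstruction.
alpha, beta = 0.5, 0.1
vae_loss = np.mean(xent_loss + alpha * kl_loss + beta * lm_loss)
print(round(vae_loss, 3))
```

In the Keras version this corresponds to `vae_loss = K.mean(xent_loss + alpha * kl_loss + beta * lm_loss)`.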
Model application: NER
There is also a small demo applying the language model to the task of NER. In an earlier test I found that BERT's effect was good, but its speed in production is relatively slow, so I want to replace it with a lightweight LM, one that can be custom-trained as needed. Some code is pasted here.
def build_model(self):
    inpute_ = layers.Input((self.max_sentence_len,))
    # LM embedding
    inpit_pad = layers.Input(shape=(self.ngram,))
    lm_embeding_layer = LM_embeding_layer()
    emb_lm = lm_embeding_layer([inpute_, inpit_pad])
    # trainable word embedding, concatenated with the LM embedding
    emb = layers.Embedding(input_dim=self.word_num, output_dim=128)(inpute_)
    embedding_layer = layers.Concatenate(axis=-1)([emb, emb_lm])
    # three convolution branches with different kernel sizes
    model1_in = layers.Conv1D(filters=self.CONV_SIZE, kernel_size=2, activation="relu", padding="same")(embedding_layer)
    model1_in = layers.MaxPooling1D(pool_size=2, strides=1, padding="same")(model1_in)
    model2_in = layers.Conv1D(self.CONV_SIZE, kernel_size=4, activation="relu", padding="same")(embedding_layer)
    model2_in = layers.MaxPooling1D(pool_size=2, strides=1, padding="same")(model2_in)
    model3_in = layers.Conv1D(self.CONV_SIZE, kernel_size=6, activation="relu", padding="same")(embedding_layer)
    model3_in = layers.AveragePooling1D(pool_size=2, strides=1, padding="same")(model3_in)
    # merge the branches with the original embeddings and decode with a CRF
    merged = layers.concatenate([model1_in, model2_in, model3_in, embedding_layer], axis=-1)
    crf = CRF(self.class_num, sparse_target=False)
    crf_res = crf(merged)
    model = Model([inpute_, inpit_pad], crf_res)
    adam = Adam(lr=0.001)
    model.compile(optimizer=adam, loss=crf.loss_function, metrics=[crf.accuracy])
    print(model.summary())
    return model
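To see why the three branch outputs can be concatenated with the embeddings before the CRF, here is a quick shape check in NumPy. The sizes are illustrative, not the author's actual hyperparameters:

```python
import numpy as np

# Illustrative sizes: sentence length, trainable embedding dim,
# LM embedding dim, and filters per convolution branch.
max_sentence_len, emb_dim, lm_dim, conv_size = 50, 128, 64, 32

# After concatenating the trainable embedding with the LM embedding:
embedding = np.zeros((1, max_sentence_len, emb_dim + lm_dim))

# Each Conv1D/pooling branch uses padding="same" with stride 1, so the
# time dimension is preserved and only the channel count changes.
branch = np.zeros((1, max_sentence_len, conv_size))

merged = np.concatenate([branch, branch, branch, embedding], axis=-1)
print(merged.shape)  # channels: 3 * conv_size + (emb_dim + lm_dim)
```

Because every branch keeps the time axis at max_sentence_len, the concatenation is valid and the CRF sees one feature vector per token.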