Keras: treating the loss as a layer of the model

When writing a complicated loss in Keras, a common pattern is to make the loss a layer of the model: the model's inputs are then the original input plus y_true, and the model's output is the loss itself. The Keras version of YOLOv3, for example, implements its loss as a layer because the loss is complicated to compute. Below is a simple example that trains a neural network on the Iris dataset.

The required packages:

import keras.backend as K
from keras import layers, Model
import numpy as np
from keras.utils import plot_model, to_categorical

First load and split the dataset. Iris has 4 features and the last column is the label (1 is subtracted so the labels start from 0); the dataset itself is attached at the end of the post.

data = np.loadtxt('Iris.txt')
np.random.shuffle(data)
x_train, y_train = data[:100, :4], data[:100, 4]-1  # (100,4) (100,)
x_valid, y_valid = data[100:, :4], data[100:, 4]-1  # (50,4) (50,)
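
If Iris.txt is not at hand, the same arrays can be built from scikit-learn's bundled copy of the dataset (an alternative sketch, assuming scikit-learn is installed; note its labels are already 0-based, so no subtraction is needed):

from sklearn.datasets import load_iris

iris = load_iris()
data = np.hstack([iris.data, iris.target[:, None]])  # (150, 5), labels already start at 0
np.random.shuffle(data)
x_train, y_train = data[:100, :4], data[:100, 4]
x_valid, y_valid = data[100:, :4], data[100:, 4]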

Convert the labels to one-hot:

y_train = to_categorical(y_train, num_classes=3) # (100,3)
y_valid = to_categorical(y_valid, num_classes=3) # (50,3)
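
For example, to_categorical turns integer labels into one-hot rows:

print(to_categorical(np.array([0, 2, 1]), num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]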

Build the prediction network. It outputs a 3-dimensional vector of raw scores (logits), one per class:

def net():
    inputs = layers.Input(shape=[4, ], name='feature_input')
    dense1 = layers.Dense(32, activation='relu', name='dense1')(inputs)
    out = layers.Dense(3, activation='linear', name='out1')(dense1)  # raw logits; softmax is applied inside the loss
    return Model(inputs, out)
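
As a quick sanity check of the architecture (not part of training), the network can be instantiated and probed with a random batch:

m = net()
m.summary()                                   # Dense(32, relu) -> Dense(3, linear)
print(m.predict(np.random.rand(2, 4)).shape)  # (2, 3)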

Next, write the loss function itself. The incoming args is [y_true, y_pred]; here a simple cross entropy is implemented. Because the network outputs raw logits, a softmax is applied first, and a small epsilon keeps the log numerically stable.

def my_loss(args, batch_size, C):
    """
    args: [y_true, y_pred], each of shape (batch_size, C)
    batch_size: number of samples per batch
    C: number of classes (unused here, kept for illustration)
    """
    y_true, y_pred = args[0], args[1]
    soft_pred = K.softmax(y_pred)               # logits -> probabilities
    log_pred = K.log(soft_pred + 1e-7)          # epsilon avoids log(0)
    loss = -K.sum(y_true * log_pred, axis=1)    # per-sample cross entropy
    batch_tensor = K.cast(batch_size, dtype='float32')
    return loss / batch_tensor                  # scale by the (fixed) batch size
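
A quick numerical check of the loss (run via K.eval; the expected value is computed by hand):

yt = K.constant([[1., 0., 0.]])          # one-hot label
yp = K.constant([[2., 0., 0.]])          # raw logits
print(K.eval(my_loss([yt, yp], batch_size=1, C=3)))
# approx [0.2395] = -log(softmax([2, 0, 0])[0])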

Now build the training model. loss_body is the loss layer, and arguments passes its extra parameters up front. train_model is the training model: it takes two inputs, the original features and y_true, and outputs the loss. When compiling, be careful not to swap the order in lambda y_true, y_pred: the loss computed by the model arrives as y_pred, so the lambda must return y_pred; if the two are swapped, no gradient can be tracked and Keras raises an error. When calling fit, y still has to be fed something of the same length as x, e.g. an array of zeros; it is never used, but it must be supplied. When saving, keep the prediction model rather than the training model, and use the prediction model for inference.

# build model
predict_body = net()
y_true = layers.Input(shape=[3, ])  # one hot label
loss_body = layers.Lambda(my_loss, output_shape=(1,), name='my_loss',
                          arguments={'batch_size': 16, 'C': 3})([y_true, predict_body.output])

train_model = Model(inputs=[predict_body.input, y_true], outputs=loss_body)
train_model.compile(optimizer='adam',
                    loss={'my_loss': lambda y_true, y_pred: y_pred})  # keep y_true, y_pred in this order: the model's output arrives as y_pred (the loss); swapping them breaks gradient tracking
plot_model(train_model, show_shapes=True)
train_model.fit(x=[x_train, y_train], y=np.zeros(len(x_train)), batch_size=16, epochs=20,
                validation_data=([x_valid, y_valid], np.zeros(len(x_valid))))
predict_body.save_weights('predict.h5')  # save the prediction model, not the training model
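
At inference time, rebuild the prediction network, load the saved weights, and take the argmax of the three scores (a minimal sketch):

infer_model = net()
infer_model.load_weights('predict.h5')
scores = infer_model.predict(x_valid)               # (50, 3) raw logits
pred_cls = np.argmax(scores, axis=1)                # predicted class ids 0/1/2
print('valid acc:', np.mean(pred_cls == np.argmax(y_valid, axis=1)))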

plot_model draws train_model as follows: [model graph figure omitted]
The model then starts training: [training log screenshot omitted]
One drawback of this setup: during training only the loss is visible, not the accuracy. This can be fixed with the following changes:

Make the model multi-output: besides the loss (out2), also output the prediction network's output (out1). out1 is used only to compute accuracy and takes no part in the loss; since Keras sums the losses of all outputs, out1's loss simply returns 0. The metric then computes accuracy on out1. Because accuracy needs y_true, fit's y becomes [np.zeros(len(x_train)), y_train]: the zeros feed the loss output, and y_train feeds the metric.

import keras.backend as K
from keras import layers, Model
import numpy as np
from keras.utils import plot_model, to_categorical


def net():
    inputs = layers.Input(shape=[4, ], name='feature_input')
    dense1 = layers.Dense(32, activation='relu', name='dense1')(inputs)
    out = layers.Dense(3, activation='linear', name='out1')(dense1)  # raw logits; softmax is applied inside the loss
    return Model(inputs, out)


def my_loss(args, batch_size, C):
    """
    args: [y_true, y_pred], each of shape (batch_size, C)
    batch_size: number of samples per batch
    C: number of classes (unused here, kept for illustration)
    """
    y_true, y_pred = args[0], args[1]
    soft_pred = K.softmax(y_pred)               # logits -> probabilities
    log_pred = K.log(soft_pred + 1e-7)          # epsilon avoids log(0)
    loss = -K.sum(y_true * log_pred, axis=1)    # per-sample cross entropy
    batch_tensor = K.cast(batch_size, dtype='float32')
    return loss / batch_tensor                  # scale by the (fixed) batch size


data = np.loadtxt('Iris.txt')
np.random.shuffle(data)
x_train, y_train = data[:100, :4], data[:100, 4]-1  # (100,4) (100,)
x_valid, y_valid = data[100:, :4], data[100:, 4]-1  # (50,4) (50,)

y_train = to_categorical(y_train, num_classes=3)
y_valid = to_categorical(y_valid, num_classes=3)

# build model
predict_body = net()
y_true = layers.Input(shape=[3, ], name='y_true')  # one hot label
loss_body = layers.Lambda(my_loss, output_shape=(1,), name='out2',
                          arguments={'batch_size': 16, 'C': 3})([y_true, predict_body.output])

train_model = Model(inputs=[predict_body.input, y_true],
                    outputs=[loss_body, predict_body.output])
train_model.compile(optimizer='adam',
                    loss={'out2': lambda y_true, y_pred: y_pred,
                          'out1': lambda y_true, y_pred: K.constant(0)},
                    metrics={'out1': 'accuracy'})
train_model.fit(x=[x_train, y_train],
                y=[np.zeros(len(x_train)), y_train], batch_size=16, epochs=50,
                validation_data=([x_valid, y_valid],
                                 [np.zeros(len(x_valid)), y_valid]))
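
As a sanity check, the softmax-then-log in my_loss should agree with Keras's built-in cross entropy on logits (identical here since batch_size=1 makes the extra scaling a no-op):

yt = K.constant([[0., 1., 0.]])
yp = K.constant([[0.5, 2.0, -1.0]])  # raw logits
print(K.eval(my_loss([yt, yp], batch_size=1, C=3)))                   # approx [0.2413]
print(K.eval(K.categorical_crossentropy(yt, yp, from_logits=True)))   # approx [0.2413]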

Accuracy now shows up during training. [training log screenshot omitted]
This approach is not particularly elegant, but the metric is finally visible.
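
For reference, newer Keras versions also offer model.add_loss, which attaches a loss tensor directly and avoids both the dummy targets and the identity-loss trick; a minimal sketch of that variant (availability depends on your Keras version):

predict_body = net()
y_true = layers.Input(shape=[3, ], name='y_true')
train_model = Model([predict_body.input, y_true], predict_body.output)

ce = -K.sum(y_true * K.log(K.softmax(predict_body.output) + 1e-7), axis=1)
train_model.add_loss(K.mean(ce))          # loss is attached to the model itself
train_model.compile(optimizer='adam')     # no per-output loss dict needed
train_model.fit([x_train, y_train], batch_size=16, epochs=20)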

Iris.txt
5.1 3.5 1.4 0.2 1
4.9 3.0 1.4 0.2 1
4.7 3.2 1.3 0.2 1
4.6 3.1 1.5 0.2 1
5.0 3.6 1.4 0.2 1
5.4 3.9 1.7 0.4 1
4.6 3.4 1.4 0.3 1
5.0 3.4 1.5 0.2 1
4.4 2.9 1.4 0.2 1
4.9 3.1 1.5 0.1 1
5.4 3.7 1.5 0.2 1
4.8 3.4 1.6 0.2 1
4.8 3.0 1.4 0.1 1
4.3 3.0 1.1 0.1 1
5.8 4.0 1.2 0.2 1
5.7 4.4 1.5 0.4 1
5.4 3.9 1.3 0.4 1
5.1 3.5 1.4 0.3 1
5.7 3.8 1.7 0.3 1
5.1 3.8 1.5 0.3 1
5.4 3.4 1.7 0.2 1
5.1 3.7 1.5 0.4 1
4.6 3.6 1.0 0.2 1
5.1 3.3 1.7 0.5 1
4.8 3.4 1.9 0.2 1
5.0 3.0 1.6 0.2 1
5.0 3.4 1.6 0.4 1
5.2 3.5 1.5 0.2 1
5.2 3.4 1.4 0.2 1
4.7 3.2 1.6 0.2 1
4.8 3.1 1.6 0.2 1
5.4 3.4 1.5 0.4 1
5.2 4.1 1.5 0.1 1
5.5 4.2 1.4 0.2 1
4.9 3.1 1.5 0.1 1
5.0 3.2 1.2 0.2 1
5.5 3.5 1.3 0.2 1
4.9 3.1 1.5 0.1 1
4.4 3.0 1.3 0.2 1
5.1 3.4 1.5 0.2 1
5.0 3.5 1.3 0.3 1
4.5 2.3 1.3 0.3 1
4.4 3.2 1.3 0.2 1
5.0 3.5 1.6 0.6 1
5.1 3.8 1.9 0.4 1
4.8 3.0 1.4 0.3 1
5.1 3.8 1.6 0.2 1
4.6 3.2 1.4 0.2 1
5.3 3.7 1.5 0.2 1
5.0 3.3 1.4 0.2 1
7.0 3.2 4.7 1.4 2
6.4 3.2 4.5 1.5 2
6.9 3.1 4.9 1.5 2
5.5 2.3 4.0 1.3 2
6.5 2.8 4.6 1.5 2
5.7 2.8 4.5 1.3 2
6.3 3.3 4.7 1.6 2
4.9 2.4 3.3 1.0 2
6.6 2.9 4.6 1.3 2
5.2 2.7 3.9 1.4 2
5.0 2.0 3.5 1.0 2
5.9 3.0 4.2 1.5 2
6.0 2.2 4.0 1.0 2
6.1 2.9 4.7 1.4 2
5.6 2.9 3.6 1.3 2
6.7 3.1 4.4 1.4 2
5.6 3.0 4.5 1.5 2
5.8 2.7 4.1 1.0 2
6.2 2.2 4.5 1.5 2
5.6 2.5 3.9 1.1 2
5.9 3.2 4.8 1.8 2
6.1 2.8 4.0 1.3 2
6.3 2.5 4.9 1.5 2
6.1 2.8 4.7 1.2 2
6.4 2.9 4.3 1.3 2
6.6 3.0 4.4 1.4 2
6.8 2.8 4.8 1.4 2
6.7 3.0 5.0 1.7 2
6.0 2.9 4.5 1.5 2
5.7 2.6 3.5 1.0 2
5.5 2.4 3.8 1.1 2
5.5 2.4 3.7 1.0 2
5.8 2.7 3.9 1.2 2
6.0 2.7 5.1 1.6 2
5.4 3.0 4.5 1.5 2
6.0 3.4 4.5 1.6 2
6.7 3.1 4.7 1.5 2
6.3 2.3 4.4 1.3 2
5.6 3.0 4.1 1.3 2
5.5 2.5 4.0 1.3 2
5.5 2.6 4.4 1.2 2
6.1 3.0 4.6 1.4 2
5.8 2.6 4.0 1.2 2
5.0 2.3 3.3 1.0 2
5.6 2.7 4.2 1.3 2
5.7 3.0 4.2 1.2 2
5.7 2.9 4.2 1.3 2
6.2 2.9 4.3 1.3 2
5.1 2.5 3.0 1.1 2
5.7 2.8 4.1 1.3 2
6.3 3.3 6.0 2.5 3
5.8 2.7 5.1 1.9 3
7.1 3.0 5.9 2.1 3
6.3 2.9 5.6 1.8 3
6.5 3.0 5.8 2.2 3
7.6 3.0 6.6 2.1 3
4.9 2.5 4.5 1.7 3
7.3 2.9 6.3 1.8 3
6.7 2.5 5.8 1.8 3
7.2 3.6 6.1 2.5 3
6.5 3.2 5.1 2.0 3
6.4 2.7 5.3 1.9 3
6.8 3.0 5.5 2.1 3
5.7 2.5 5.0 2.0 3
5.8 2.8 5.1 2.4 3
6.4 3.2 5.3 2.3 3
6.5 3.0 5.5 1.8 3
7.7 3.8 6.7 2.2 3
7.7 2.6 6.9 2.3 3
6.0 2.2 5.0 1.5 3
6.9 3.2 5.7 2.3 3
5.6 2.8 4.9 2.0 3
7.7 2.8 6.7 2.0 3
6.3 2.7 4.9 1.8 3
6.7 3.3 5.7 2.1 3
7.2 3.2 6.0 1.8 3
6.2 2.8 4.8 1.8 3
6.1 3.0 4.9 1.8 3
6.4 2.8 5.6 2.1 3
7.2 3.0 5.8 1.6 3
7.4 2.8 6.1 1.9 3
7.9 3.8 6.4 2.0 3
6.4 2.8 5.6 2.2 3
6.3 2.8 5.1 1.5 3
6.1 2.6 5.6 1.4 3
7.7 3.0 6.1 2.3 3
6.3 3.4 5.6 2.4 3
6.4 3.1 5.5 1.8 3
6.0 3.0 4.8 1.8 3
6.9 3.1 5.4 2.1 3
6.7 3.1 5.6 2.4 3
6.9 3.1 5.1 2.3 3
5.8 2.7 5.1 1.9 3
6.8 3.2 5.9 2.3 3
6.7 3.3 5.7 2.5 3
6.7 3.0 5.2 2.3 3
6.3 2.5 5.0 1.9 3
6.5 3.0 5.2 2.0 3
6.2 3.4 5.4 2.3 3
5.9 3.0 5.1 1.8 3

Source: https://blog.csdn.net/weixin_43486780/article/details/105630308