As shown in the figure, the left panel plots the distributions p(x) and q(x) (the simplest case: the two curves have essentially the same shape, and only the mean of q(x) is shifted). Holding p(x) fixed and sliding q(x): the closer q(x) gets to p(x), the smaller the KL divergence, and the farther it moves away, the larger the divergence.
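For reference, the quantity illustrated is the standard (discrete) KL divergence, which is nonnegative and zero exactly when the two distributions coincide:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \ge 0$$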
import torch
import torch.nn as nn
import torch.nn.functional as F

if __name__ == '__main__':
    x_o = torch.Tensor([[1, 2], [3, 4]])
    y_o = torch.Tensor([[0.1, 0.2], [0.3, 0.4]])
    # kl_div expects the input in log space and the target as probabilities
    x = F.log_softmax(x_o, dim=-1)
    y = F.softmax(y_o, dim=-1)
    # nn.KLDivLoss defaults to reduction='mean', which averages over all
    # elements (PyTorch warns that 'batchmean' matches the math definition)
    criterion = nn.KLDivLoss()
    klloss = criterion(x, y)
    # KL divergence of a distribution with itself is 0; since x holds
    # log-probabilities, pass log_target=True instead of reusing x as a
    # probability target
    klloss_same = F.kl_div(x, x, reduction='mean', log_target=True)
    print('klloss', klloss)
    print('klloss_same', klloss_same)
    kl = F.kl_div(x, y, reduction='sum')
    print('kl', kl)
    kl2 = F.kl_div(x, y, reduction='mean')
    print('kl2', kl2)
klloss tensor(0.0482)
klloss_same tensor(0.)
kl tensor(0.1928)
kl2 tensor(0.0482)
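A sanity check on these numbers: kl2 is kl divided by the element count (0.1928 / 4 = 0.0482), because reduction='mean' averages over every element. For the per-sample KL that matches the math definition, reduction='batchmean' divides the sum by the batch size instead. A minimal check, reusing x and y from the script above:

    kl3 = F.kl_div(x, y, reduction='batchmean')
    print('kl3', kl3)  # expected: 0.1928 / 2 = tensor(0.0964)

Also note the argument order: F.kl_div(x, y) treats y as the target distribution and x as log-probabilities, i.e. it computes D_KL(y || softmax(x_o)).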