Attention Principles + Vector Inner Product + Scaled Dot-Product Attention in the Transformer

Programming Languages · 2023-07-25 17:52:41

1. How Attention Works


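Think of the elements of $Source$ as a series of $\langle Key, Value \rangle$ pairs. Given an element $Query$ from $Target$, we compute the similarity (or relevance) between $Query$ and each $Key$ to obtain a weight coefficient for the $Value$ paired with that $Key$, and then take the weighted sum of the $Value$s to get the final $Attention$ value. In essence, the attention mechanism is a weighted sum over the $Value$s of the elements in $Source$; $Query$ and $Key$ are used only to compute the weight of each $Value$. This idea can be written as the following formula: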

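$\mathrm{Attention}(Query, Source) = \sum_{i=1}^{L_{x}} \mathrm{Similarity}(Query, Key_{i}) \cdot Value_{i}$

Here $L_{x}$ is the length of $Source$, i.e. the number of $\langle Key, Value \rangle$ pairs.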
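The weighted sum can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the text does not fix a particular similarity function or normalisation, so the sketch assumes dot-product similarity followed by a softmax, and the names `dot`, `attend`, and the toy `keys`/`values` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Return (attention output, weights) for a single query.

    Assumption: similarity is the dot product, and the raw scores are
    turned into weights with a softmax so that they sum to 1.
    """
    scores = [dot(query, k) for k in keys]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    # Weighted sum of the value vectors, one output component at a time.
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

# Toy Source with three <Key, Value> pairs (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attend([1.0, 0.0], keys, values)
print(weights)
print(output)
```

The query matches the first and third keys equally well, so those two values receive the same, larger weight, and the second value contributes least to the output.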

2. Vector Inner Product

The vector inner product, also called the dot product, is given by:

$\vec{a}\cdot \vec{c} = \|\vec{a}\| \, \|\vec{c}\| \cos\theta$

where $\theta$ is the angle between $\vec{a}$ and $\vec{c}$.
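As a quick numerical check of this identity, the sketch below computes the dot product component-wise, recovers $\cos\theta$ from it, and then rebuilds the product from the norms and the angle. The vectors and the helper names `dot` and `norm` are made up for illustration.

```python
import math

def dot(u, v):
    """Component-wise inner product."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

a = [3.0, 4.0]
c = [4.0, 3.0]

# cos(theta) follows from rearranging a . c = |a| |c| cos(theta).
cos_theta = dot(a, c) / (norm(a) * norm(c))
theta = math.acos(cos_theta)

# Rebuild the dot product from the geometric form.
geometric = norm(a) * norm(c) * math.cos(theta)
print(dot(a, c), geometric)
```

Because the dot product grows with $\cos\theta$, vectors pointing in similar directions get a larger value, which is why it can serve as a similarity score between a query and a key.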


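The derivative of the inner product with respect to $\bar{w}$ is as follows (in numerator layout, where the derivative of a scalar with respect to a column vector is a row vector):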

$\frac{\partial(\bar{x}\cdot \bar{w})}{\partial \bar{w}}=\bar{x}^{T}$
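One way to sanity-check this derivative is a finite-difference approximation: perturb each component of $\bar{w}$ and watch how $\bar{x}\cdot\bar{w}$ changes. The sketch below is illustrative only; `numerical_gradient` and the sample vectors are hypothetical, and the result is returned as a flat list, so it does not distinguish row from column layout.

```python
def dot(u, v):
    """Component-wise inner product."""
    return sum(a * b for a, b in zip(u, v))

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad

x = [1.5, -2.0, 0.5]
w = [0.3, 0.7, -1.2]

# d(x . w)/dw should match x component by component.
grad = numerical_gradient(lambda w_: dot(x, w_), w)
print(grad)
```

Since $\bar{x}\cdot\bar{w}$ is linear in $\bar{w}$, the estimate agrees with $\bar{x}$ up to floating-point rounding, regardless of the value of `w`.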

3. Scaled Dot-Product Attention in the Transformer

The formula is as follows, where $d_{k}$ is the dimension of the keys:

$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$

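Given a set of key-value pairs and n queries, every element of the output can be computed in parallel with two matrix multiplications: the first forms $QK^{T}$, and the second multiplies the softmax weights by $V$.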
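The two matrix multiplications can be made explicit with a minimal sketch. It uses plain Python lists to stay dependency-free; real code would normally use NumPy or a deep-learning framework. The helper names (`matmul`, `transpose`, `softmax`, `scaled_dot_product_attention`) and the toy matrices are assumptions for illustration, with `Q` of shape n×d_k, `K` of shape m×d_k, and `V` of shape m×d_v.

```python
import math

def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Matrix product of A (p x q) and B (q x r)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Softmax over one row, shifted by the max for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))               # first matmul: n x m
    scale = math.sqrt(d_k)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, V), weights             # second matmul: n x d_v

# Toy inputs: n = 2 queries, m = 3 key-value pairs, d_k = 2, d_v = 3.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
print(output)
```

Each row of `weights` sums to 1 and holds one query's weights over all keys, so each row of `output` is that query's weighted sum of the rows of `V`.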



Reposted from blog.csdn.net/python_plus/article/details/130750293
