A Detailed Explanation of Rotary Position Embedding

Category: Internet of Things, 2025-04-08 07:41:07

We start with a concrete numerical example that walks through how Rotary Position Embedding (RoPE) is applied.

1. Input data

Assume:

Input tensor x: shape (batch_size=1, seq_len=3, num_heads=1, head_dim=4), with values:
$\begin{bmatrix} [[1, 2, 3, 4]], \\ [[5, 6, 7, 8]], \\ [[9, 10, 11, 12]] \end{bmatrix}$
Position indices t: the sequence length is 3, so the position indices are [0, 1, 2].
Frequency base theta: set to 10000.0.

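2. Precomputing the rotary encoding `pos_cis`

(1) Computing the frequency vector

The vector dimension is dim = 4, so there are dim // 2 = 2 frequencies, one per pair of adjacent components.
Frequency formula:
$\theta_j = \frac{1}{\text{base}^{2j / d}}$
where $j = 0, 1$, $d = 4$, and base = 10000.0. The exponent is $2j / d$ because the code indexes with `torch.arange(0, dim, 2)`, that is, with the even indices 0 and 2.

Computation:
$\theta_0 = \frac{1}{10000^{0 / 4}} = 1.0 \\ \theta_1 = \frac{1}{10000^{2 / 4}} = 0.01$
The frequency vector is therefore:
$\text{freqs} = [1.0, 0.01]$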
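
The frequency computation can be reproduced without torch. The sketch below is a minimal standard-library version; the helper name `rope_freqs` is an illustrative assumption, and it follows the `torch.arange(0, dim, 2)` indexing used by the code later in this article.

```python
def rope_freqs(dim, base=10000.0):
    """Return the dim // 2 frequencies base ** (-2j / dim), j = 0 .. dim/2 - 1."""
    # range(0, dim, 2) yields the even indices 2j, like torch.arange(0, dim, 2).
    return [1.0 / (base ** (k / dim)) for k in range(0, dim, 2)]

freqs = rope_freqs(4)
print(freqs)  # approximately [1.0, 0.01]
```

Lower pair indices get higher frequencies, so the first pair rotates fastest as the position grows.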

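(2) Computing the rotation angles

Position indices t = [0, 1, 2].
Rotation angle for position $m$ (an entry of t) and frequency $\theta_j$:
$\phi_{m,j} = m \cdot \theta_j$
Computation (outer product of t and freqs = [1.0, 0.01]):
$t \otimes \text{freqs} = \begin{bmatrix} 0 \cdot 1.0 & 0 \cdot 0.01 \\ 1 \cdot 1.0 & 1 \cdot 0.01 \\ 2 \cdot 1.0 & 2 \cdot 0.01 \end{bmatrix} = \begin{bmatrix} 0.0 & 0.0 \\ 1.0 & 0.01 \\ 2.0 & 0.02 \end{bmatrix}$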

(3) Generating the rotary encoding in complex form

Use torch.polar with unit magnitude to generate the rotary encoding:
$\text{pos\_cis} = e^{i \cdot m \theta_j}$
Computation for the angles $\begin{bmatrix} 0.0 & 0.0 \\ 1.0 & 0.01 \\ 2.0 & 0.02 \end{bmatrix}$ (values rounded):
$\text{pos\_cis} = \begin{bmatrix} e^{i \cdot 0.0} & e^{i \cdot 0.0} \\ e^{i \cdot 1.0} & e^{i \cdot 0.01} \\ e^{i \cdot 2.0} & e^{i \cdot 0.02} \end{bmatrix} = \begin{bmatrix} 1.0 + i \cdot 0.0 & 1.0 + i \cdot 0.0 \\ 0.5403 + i \cdot 0.8415 & 0.99995 + i \cdot 0.01000 \\ -0.4161 + i \cdot 0.9093 & 0.99980 + i \cdot 0.02000 \end{bmatrix}$
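
Steps (2) and (3) can be checked with the standard library. This is a sketch under the assumption that `cmath.rect(1.0, angle)` plays the role of `torch.polar` with unit magnitude; the frequencies are recomputed from the expression the code uses, so the block stands on its own.

```python
import cmath

dim, base = 4, 10000.0
freqs = [base ** (-k / dim) for k in range(0, dim, 2)]   # step (1)
positions = [0, 1, 2]

# Step (2): outer product, angles[m][j] = positions[m] * freqs[j]
angles = [[m * f for f in freqs] for m in positions]

# Step (3): cmath.rect(r, phi) = r * (cos(phi) + i*sin(phi)); r = 1 gives a pure rotation
pos_cis = [[cmath.rect(1.0, a) for a in row] for row in angles]

for row in pos_cis:
    print([f"{z.real:.4f}{z.imag:+.4f}j" for z in row])
```

Every entry has magnitude 1, which is why multiplying by `pos_cis` rotates a vector without rescaling it.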

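3. Applying the rotary position encoding

(1) Converting the input tensor `x` to complex form

x has shape (1, 3, 1, 4). Each pair of adjacent components becomes the real and imaginary part of one complex number, giving shape (1, 3, 1, 2):
$x_q = \begin{bmatrix} [1 + i \cdot 2, 3 + i \cdot 4], \\ [5 + i \cdot 6, 7 + i \cdot 8], \\ [9 + i \cdot 10, 11 + i \cdot 12] \end{bmatrix}$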
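
The pairing rule is easy to get wrong (adjacent components, not first half with second half), so here is a minimal sketch of it. The helper name `to_complex_pairs` is hypothetical; it only illustrates what the reshape to pairs followed by `torch.view_as_complex` does for a single vector.

```python
def to_complex_pairs(vec):
    """Pair up (vec[0], vec[1]), (vec[2], vec[3]), ... as complex numbers."""
    return [complex(vec[k], vec[k + 1]) for k in range(0, len(vec), 2)]

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
x_complex = [to_complex_pairs(row) for row in x]
for row in x_complex:
    print(row)
```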

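(2) Reshaping `pos_cis`

pos_cis has shape (3, 2). It is reshaped to (1, 3, 1, 2) so that it broadcasts against the complex form of x across the batch and head dimensions.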
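
The target shape follows a simple rule: keep the sequence axis and the last axis, and set every other axis to 1. The sketch below states that rule on plain tuples; `broadcast_shape` is a hypothetical helper, and the (batch, seq_len, num_heads, head_dim // 2) axis order is assumed from the shapes used in this example.

```python
def broadcast_shape(x_shape):
    """Shape that pos_cis must take to broadcast against a complex tensor of x_shape.

    x_shape is (batch, seq_len, num_heads, head_dim // 2); only seq_len (axis 1)
    and the last axis are kept, every other axis becomes 1.
    """
    last = len(x_shape) - 1
    return tuple(d if i in (1, last) else 1 for i, d in enumerate(x_shape))

print(broadcast_shape((1, 3, 1, 2)))   # shape from the example
print(broadcast_shape((2, 3, 8, 2)))   # a larger batch with 8 heads
```

Both calls give (1, 3, 1, 2): the same rotation table is shared by every batch element and every head.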

(3) Applying the rotation

Multiply x_q by pos_cis element-wise:
$x_q' = x_q \cdot \text{pos\_cis}$
Computation (with the frequencies [1.0, 0.01] implied by the code):
$x_q' = \begin{bmatrix} (1 + i \cdot 2) \cdot (1.0 + i \cdot 0.0) & (3 + i \cdot 4) \cdot (1.0 + i \cdot 0.0) \\ (5 + i \cdot 6) \cdot (0.5403 + i \cdot 0.8415) & (7 + i \cdot 8) \cdot (0.99995 + i \cdot 0.01000) \\ (9 + i \cdot 10) \cdot (-0.4161 + i \cdot 0.9093) & (11 + i \cdot 12) \cdot (0.99980 + i \cdot 0.02000) \end{bmatrix}$
Element-wise results:
$x_q' = \begin{bmatrix} 1.0 + i \cdot 2.0 & 3.0 + i \cdot 4.0 \\ -2.3473 + i \cdot 7.4492 & 6.9197 + i \cdot 8.0696 \\ -12.8383 + i \cdot 4.0222 & 10.7578 + i \cdot 12.2176 \end{bmatrix}$

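(4) Converting the complex form back to real form

Convert x_q' back to real form by interleaving real and imaginary parts, which restores the shape (1, 3, 1, 4):
$x_q' = \begin{bmatrix} [[1.0, 2.0, 3.0, 4.0]], \\ [[-2.3473, 7.4492, 6.9197, 8.0696]], \\ [[-12.8383, 4.0222, 10.7578, 12.2176]] \end{bmatrix}$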
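
The whole worked example, from the raw rows to the final real-valued output, can be reproduced in a few lines. This is a standard-library sketch rather than the article's torch code: `apply_rope` is a hypothetical name, it handles a single head and a single batch element, and it assumes row index equals position.

```python
import cmath

def apply_rope(x, base=10000.0):
    """Rotate each row x[m] (even length) by position m; returns real-valued rows."""
    dim = len(x[0])
    freqs = [base ** (-k / dim) for k in range(0, dim, 2)]
    out = []
    for m, row in enumerate(x):
        rotated = []
        for j, f in enumerate(freqs):
            # Pair (row[2j], row[2j+1]) as a complex number and rotate by m * f.
            z = complex(row[2 * j], row[2 * j + 1]) * cmath.rect(1.0, m * f)
            rotated += [z.real, z.imag]   # back to interleaved real form
        out.append(rotated)
    return out

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for row in apply_rope(x):
    print([round(v, 4) for v in row])
```

Position 0 is left unchanged because its rotation angle is zero; later positions are rotated by increasingly large angles in the first pair and by much smaller angles in the second.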

4. Final result

After applying the rotary position encoding, x becomes:
$\begin{bmatrix} [[1.0, 2.0, 3.0, 4.0]], \\ [[-2.3473, 7.4492, 6.9197, 8.0696]], \\ [[-12.8383, 4.0222, 10.7578, 12.2176]] \end{bmatrix}$
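
A quick sanity check for any hand-computed RoPE result: a rotation changes the direction of each component pair but not its length. The sketch below checks this for the pair (5, 6) at position 1 with frequency 1.0, values taken from the example.

```python
import cmath

z = complex(5, 6)                      # first pair of the row at position 1
rotated = z * cmath.rect(1.0, 1.0)     # rotate by 1 * 1.0 radians

# Both magnitudes equal sqrt(5**2 + 6**2) = sqrt(61), up to rounding.
print(abs(z), abs(rotated))
```

If the squared components of an output pair do not sum to the same value as the input pair, the arithmetic has gone wrong somewhere.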

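5. Summary

This numerical example covers the complete computation of rotary position encoding:

Precompute the rotary encoding pos_cis.
Convert the input tensor x to complex form.
Apply the rotation, which injects the position information into x.
Convert the result back to real form.

The core idea of rotary position encoding is to inject position information into the query and key vectors through complex rotation, which makes the model aware of where each token sits in the sequence.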

The rest of this article gives the mathematical formulas and principles behind the code, to make the implementation of Rotary Position Embedding (RoPE) easier to follow.

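1. The mathematics of rotary position encoding

The core idea of rotary position encoding is to inject position information into the query (Q) and key (K) vectors through complex rotation. Specifically, a query at position m and a key at position n are each rotated by an angle that depends on their own position, so the attention score computed from them carries position information; it depends on the offset m − n.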
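
The dependence on the offset m − n can be checked numerically on a single component pair. In the sketch below, the values of `q`, `k` and `theta` are arbitrary illustrative choices, and `rotate` and `score` are hypothetical helper names.

```python
import cmath

def rotate(z, pos, theta):
    """Rotate complex number z by the angle pos * theta."""
    return z * cmath.rect(1.0, pos * theta)

def score(q, k):
    """Real dot product of two 2-D vectors stored as complex numbers."""
    return (q * k.conjugate()).real

q, k, theta = complex(1, 2), complex(3, 4), 0.3   # arbitrary illustrative values
s1 = score(rotate(q, 5, theta), rotate(k, 2, theta))     # m - n = 3
s2 = score(rotate(q, 10, theta), rotate(k, 7, theta))    # m - n = 3
s3 = score(rotate(q, 5, theta), rotate(k, 4, theta))     # m - n = 1

print(abs(s1 - s2) < 1e-9)   # same offset, same score
print(abs(s1 - s3) < 1e-9)   # different offset, different score
```

Shifting both positions by the same amount leaves the score unchanged, which is the sense in which RoPE encodes relative position.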

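(1) Rotation formula

For a vector x at position m, the rotary position encoding is:
$\text{RoPE}(x, m) = x \cdot e^{i m \theta}$
where:

$x$ is the input vector, viewed as complex numbers formed from pairs of adjacent components.
$m$ is the position index.
$\theta$ is the rotation frequency, taken from the frequency vector; the resulting rotation angle is $m \theta$.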
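
It can help to see the same operation without complex numbers. Writing the $j$-th pair of components as $(x_{2j}, x_{2j+1})$ (an indexing convention assumed here for illustration), multiplying $x_{2j} + i \cdot x_{2j+1}$ by $e^{i m \theta_j}$ is the same as applying a 2×2 rotation matrix:
$\begin{bmatrix} x_{2j}' \\ x_{2j+1}' \end{bmatrix} = \begin{bmatrix} \cos(m \theta_j) & -\sin(m \theta_j) \\ \sin(m \theta_j) & \cos(m \theta_j) \end{bmatrix} \begin{bmatrix} x_{2j} \\ x_{2j+1} \end{bmatrix}$
This follows from expanding $(x_{2j} + i \cdot x_{2j+1})(\cos(m \theta_j) + i \sin(m \theta_j))$ and collecting the real and imaginary parts.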

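(2) Frequency vector

The frequency vector $\theta_j$ is computed as:
$\theta_j = \frac{1}{\text{base}^{2j / d}}$
where:

$j$ is the index of the component pair, $j = 0, 1, \dots, d/2 - 1$.
$d$ is the dimension of the vector.
$\text{base}$ is a constant (called theta in the code).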

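2. Formulas behind the code

Each step of the code corresponds to one formula.

(1) The `precompute_pos_cis` function

Compute the frequency vector $\theta_j$:
$\theta_j = \frac{1}{\text{base}^{2j / d}}$
Code:
```
# arange(0, dim, 2) / dim yields the exponents 2j / d
freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
```
Compute the rotation angle for every position $m$ in t:
$\phi_{m,j} = m \cdot \theta_j$
Code:
```
# t holds the position indices; the result has shape (len(t), dim // 2)
freqs = torch.outer(t, freqs).float()
```
Generate the rotary encoding in complex form:
$\text{pos\_cis} = e^{i m \theta_j}$
Code:
```
# unit magnitude and angle freqs: cos(freqs) + i * sin(freqs)
pos_cis = torch.polar(torch.ones_like(freqs), freqs)
```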

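(2) The `apply_rotary_emb` function

Convert the query and key vectors to complex form by pairing adjacent components, where $q$ and $k$ are the real-valued query and key vectors:
$x_q^{(j)} = q_{2j} + i \cdot q_{2j+1}, \quad x_k^{(j)} = k_{2j} + i \cdot k_{2j+1}$
Code:
```
# reshape the last dimension into pairs, then view each pair as one complex number
xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
```
Apply the rotary position encoding:
$x_q' = x_q \cdot e^{i m \theta_j}, \quad x_k' = x_k \cdot e^{i m \theta_j}$
Code:
```
# pos_cis must already be reshaped to broadcast against xq_ and xk_
# flatten(3) merges the (pairs, 2) dimensions back into head_dim
xq_out = torch.view_as_real(xq_ * pos_cis).flatten(3)
xk_out = torch.view_as_real(xk_ * pos_cis).flatten(3)
```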

3. Mapping between formulas and code

| Formula | Code |
| --- | --- |
| $\theta_j = \frac{1}{\text{base}^{2j / d}}$ | `freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))` |
| $\phi_{m,j} = m \cdot \theta_j$ | `freqs = torch.outer(t, freqs).float()` |
| $\text{pos\_cis} = e^{i m \theta_j}$ | `pos_cis = torch.polar(torch.ones_like(freqs), freqs)` |
| $x_q^{(j)} = q_{2j} + i \cdot q_{2j+1}$ | `xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))` |
| $x_q' = x_q \cdot e^{i m \theta_j}$ | `xq_out = torch.view_as_real(xq_ * pos_cis).flatten(3)` |

4. Example

Assume:

Input vector $x_q = [1, 2, 3, 4]$.
Position index $m = 1$.
Frequency vector $\theta_j = [0.1, 0.2]$ (values chosen for easy arithmetic, not derived from a base).

(1) Computing the rotation angles

$m \theta_j = [1 \cdot 0.1, 1 \cdot 0.2] = [0.1, 0.2]$

(2) Generating the rotary encoding in complex form

$\text{pos\_cis} = e^{i [0.1, 0.2]} = [\cos(0.1) + i \sin(0.1), \cos(0.2) + i \sin(0.2)]$

(3) Applying the rotary position encoding

$x_q' = [1 + i 2, 3 + i 4] \cdot [\cos(0.1) + i \sin(0.1), \cos(0.2) + i \sin(0.2)] \approx [0.7953 + i \cdot 2.0898, 2.1455 + i \cdot 4.5163]$
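
The arithmetic of this example can be checked with the standard library. The sketch uses exactly the values assumed above (`x_q`, `m`, and the hand-picked frequencies) and `cmath.rect` for $e^{i \phi}$.

```python
import cmath

x_q = [1, 2, 3, 4]
m = 1
theta = [0.1, 0.2]   # hand-picked frequencies from the example

out = []
for j, t in enumerate(theta):
    # Pair (x_q[2j], x_q[2j+1]) as a complex number and rotate by m * theta_j.
    z = complex(x_q[2 * j], x_q[2 * j + 1]) * cmath.rect(1.0, m * t)
    out += [z.real, z.imag]

print([round(v, 4) for v in out])   # real form: [0.7953, 2.0898, 2.1455, 4.5163]
```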

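5. Summary

The mapping between formulas and code shows that rotary position encoding injects position information into the query and key vectors through complex rotation. The steps are:

Compute the frequency vector $\theta_j$.
Compute the rotation angles $m \theta_j$.
Generate the rotary encoding in complex form, $e^{i m \theta_j}$.
Apply the rotary encoding to the query and key vectors.

The method adds no extra parameters, because the rotation is fully determined by the position index and the fixed frequencies, yet it gives the model access to the position of every token in the sequence.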

Reposted from blog.csdn.net/qq_45889056/article/details/145976154
