RoPE (Rotary Position Embedding): Complete Mathematical Formulation

1. Basic form (2D rotation)

For an embedding vector $\boldsymbol{x} \in \mathbb{R}^d$ (with $d$ even), group the dimensions into consecutive pairs:
$$(x_{2i},\,x_{2i+1}), \quad i = 0, 1, \dots, d/2 - 1$$

For a token at position $\text{pos}$, each pair is rotated by the angle $\theta_{i,\text{pos}}$:
$$\begin{bmatrix} x'_{2i} \\ x'_{2i+1} \end{bmatrix} = \begin{bmatrix} \cos\theta_{i,\text{pos}} & -\sin\theta_{i,\text{pos}} \\ \sin\theta_{i,\text{pos}} & \cos\theta_{i,\text{pos}} \end{bmatrix} \begin{bmatrix} x_{2i} \\ x_{2i+1} \end{bmatrix}$$

2. Angle definition (frequency design)

$$\theta_{i,\text{pos}} = \text{pos} \cdot \omega_i$$

where
$$\omega_i = \frac{1}{10000^{2i/d}}$$

Therefore:
$$\theta_{i,\text{pos}} = \frac{\text{pos}}{10000^{2i/d}}$$
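As a rough illustration of the angle definition above, the sketch below computes $\omega_i$ and $\theta_{i,\text{pos}}$ in plain Python. The function names `rope_frequencies` and `rope_angles` and the `base` parameter are assumptions made for this example; they do not come from any particular library.

```python
import math

def rope_frequencies(d, base=10000.0):
    """Return [omega_0, ..., omega_{d/2-1}] with omega_i = base^(-2i/d)."""
    if d % 2 != 0:
        raise ValueError("d must be even so the dimensions can be paired")
    return [base ** (-2.0 * i / d) for i in range(d // 2)]

def rope_angles(pos, d, base=10000.0):
    """Return theta_{i,pos} = pos * omega_i for every pair index i."""
    return [pos * w for w in rope_frequencies(d, base)]

# Hypothetical example values: d = 8, position 3.
freqs = rope_frequencies(8)
angles = rope_angles(3, 8)
```

With $d = 8$ the frequencies fall geometrically from $1$ down to $10^{-3}$, so the first pair rotates fastest with position and the last pair slowest.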

3. Expanded form (common in implementations)

$$\begin{aligned} x'_{2i} &= x_{2i}\cos\theta_{i,\text{pos}} - x_{2i+1}\sin\theta_{i,\text{pos}} \\ x'_{2i+1} &= x_{2i}\sin\theta_{i,\text{pos}} + x_{2i+1}\cos\theta_{i,\text{pos}} \end{aligned}$$
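The expanded form translates almost line for line into code. The following is a minimal sketch in plain Python, assuming an even-length list as input; the name `apply_rope` and the `base` parameter are chosen for this example only.

```python
import math

def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos / base^(2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("len(x) must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out

# Hypothetical example vector.
x = [1.0, 0.0, 0.5, -2.0]
y = apply_rope(x, pos=5)

# A rotation changes direction only, so the Euclidean norm is preserved.
norm_before = math.sqrt(sum(v * v for v in x))
norm_after = math.sqrt(sum(v * v for v in y))
```

Because every pair undergoes a pure rotation, `norm_before` and `norm_after` agree up to floating-point error, and position 0 leaves the vector unchanged.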

4. Applying RoPE to Q / K

RoPE is applied to the queries and keys, with each row rotated according to its own token position:
$$\begin{aligned} Q' &= \text{RoPE}(Q,\text{pos}) \\ K' &= \text{RoPE}(K,\text{pos}) \end{aligned}$$

Attention scores (before scaling and softmax):
$$\text{Attn} = Q'(K')^T$$

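5. Core property

Let $R_\theta$ denote the 2D rotation by angle $\theta$, and let $\theta_m$ and $\theta_n$ be the angles of a given pair at positions $\text{pos}_m$ and $\text{pos}_n$. Because $R_\theta^T = R_{-\theta}$, rotations satisfy:
$$\langle R_{\theta_m}q,\;R_{\theta_n}k\rangle = \langle q,\;R_{\theta_n-\theta_m}k\rangle$$

Conclusion: since each angle is linear in the position, the attention score depends on the positions only through the difference $\text{pos}_n - \text{pos}_m$.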
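The relative-position property can be checked numerically. The sketch below is an illustration under assumed inputs: the helper names `apply_rope` and `dot` and the vectors `q` and `k` are made up for this example. It compares the scores of two query/key placements with the same offset against one with a different offset.

```python
import math

def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos / base^(2i/d)."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out

def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))

# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

score_a = dot(apply_rope(q, 2), apply_rope(k, 7))    # key is 5 positions after query
score_b = dot(apply_rope(q, 40), apply_rope(k, 45))  # same offset, shifted by 38
score_c = dot(apply_rope(q, 2), apply_rope(k, 8))    # offset of 6 instead
```

Here `score_a` and `score_b` match up to floating-point error because both placements share the offset 5, while `score_c` differs because its offset is 6.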

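6. Complex form (more fundamental)

Write each pair as a complex number (with $\mathrm{i}$ the imaginary unit, distinct from the pair index $i$):
$$z_i = x_{2i} + \mathrm{i}\,x_{2i+1}$$

RoPE:
$$z'_i = z_i \cdot e^{\mathrm{i}\theta_{i,\text{pos}}}$$

Attention inner product, where $q_i$ and $k_i$ are the complex components of a query at position $\text{pos}_m$ and a key at position $\text{pos}_n$:
$$q'_i \cdot \overline{k'_i} = q_i\,\overline{k_i} \cdot e^{\mathrm{i}(\theta_{i,\text{pos}_m}-\theta_{i,\text{pos}_n})}$$

The real part of this product, summed over $i$, is the attention score.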
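To make the link between the complex form and the real rotation concrete, the following sketch applies both to the same vector using Python's built-in complex numbers. The function names `rope_complex` and `rope_real` and the test vector are assumptions for this example.

```python
import cmath
import math

def rope_complex(x, pos, base=10000.0):
    """Multiply each z_i = x[2i] + 1j * x[2i+1] by exp(1j * theta_{i,pos})."""
    d = len(x)
    out = []
    for i in range(d // 2):
        z = complex(x[2 * i], x[2 * i + 1])
        theta = pos * base ** (-2.0 * i / d)
        z_rot = z * cmath.exp(1j * theta)
        out.extend([z_rot.real, z_rot.imag])
    return out

def rope_real(x, pos, base=10000.0):
    """Expanded real-valued form, used here as the reference."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[2 * i] * c - x[2 * i + 1] * s)
        out.append(x[2 * i] * s + x[2 * i + 1] * c)
    return out

# Hypothetical example vector and position.
x = [0.3, -1.2, 0.7, 0.5]
a = rope_complex(x, 9)
b = rope_real(x, 9)
max_diff = max(abs(u - v) for u, v in zip(a, b))
```

The two results agree up to floating-point error, which reflects that multiplying by $e^{\mathrm{i}\theta}$ is the same operation as the 2D rotation matrix acting on the real and imaginary parts.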

7. Whole-vector form

$$\text{RoPE}(x,\text{pos}) = x \odot \cos(\Theta_{\text{pos}}) + \text{rotate}(x) \odot \sin(\Theta_{\text{pos}})$$

where $\Theta_{\text{pos}} = (\theta_{0,\text{pos}}, \theta_{0,\text{pos}}, \theta_{1,\text{pos}}, \theta_{1,\text{pos}}, \dots)$ repeats each angle for both members of its pair, and $\text{rotate}(x)$ maps each pair $(x_{2i},x_{2i+1})$ to $(-x_{2i+1},x_{2i})$.
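The whole-vector form can be sketched with elementwise list operations. In this illustration the helper names `rotate`, `angle_vector`, and `rope_vectorized` are assumptions, and the angle vector is assumed to repeat each $\theta_{i,\text{pos}}$ twice so that it lines up with both entries of a pair.

```python
import math

def rotate(x):
    """Map each pair (x[2i], x[2i+1]) to (-x[2i+1], x[2i])."""
    out = []
    for i in range(0, len(x), 2):
        out.extend([-x[i + 1], x[i]])
    return out

def angle_vector(pos, d, base=10000.0):
    """Build Theta_pos with each angle repeated twice: (t0, t0, t1, t1, ...)."""
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        out.extend([theta, theta])
    return out

def rope_vectorized(x, pos, base=10000.0):
    """Compute x * cos(Theta) + rotate(x) * sin(Theta), elementwise."""
    theta = angle_vector(pos, len(x), base)
    rx = rotate(x)
    return [xi * math.cos(t) + ri * math.sin(t) for xi, ri, t in zip(x, rx, theta)]

# Hypothetical example vector.
x = [1.0, 2.0, 3.0, 4.0]
rotated = rotate(x)
y = rope_vectorized(x, pos=1)
```

For the first pair this gives $y_0 = \cos 1 - 2\sin 1$ and $y_1 = 2\cos 1 + \sin 1$, the same values the expanded form produces.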

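8. Implementation (PyTorch)

import torch

def rope(x, sin, cos):
    # x: (..., d) with d even; sin, cos: (..., d/2), broadcastable against the pairs
    x1 = x[..., ::2]   # even-indexed components x_{2i}
    x2 = x[..., 1::2]  # odd-indexed components x_{2i+1}
    # Rotate each pair, then re-interleave so the output layout matches the input
    out = torch.stack([
        x1 * cos - x2 * sin,
        x1 * sin + x2 * cos
    ], dim=-1)
    return out.flatten(-2)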
