Understanding Extended KMP (EXKMP) in Three Figures

Conventions:

For a string \(s\), \(s[l,r]\) denotes the string formed by concatenating the characters \(s[l], s[l+1], s[l+2], \dots, s[r-1], s[r]\) in order. Indices of \(s\) start from 1. \(s[1,i]\ (i \in [1,n])\) denotes a prefix of \(s\), and \(s[i,n]\ (i \in [1,n])\) denotes a suffix of \(s\).

The problem Extended KMP solves

Given a text string \(s\) of length \(n\) and a pattern string \(t\) of length \(m\), compute the longest common prefix of \(t\) with every suffix of \(s\). The Extended KMP algorithm computes an array \(extend\), where \(extend[i]\) is the length of the longest common prefix of \(t\) and \(s[i,n]\). For example, if s = "aaaabaa" and t = "aaaaa", then extend = {4,3,2,1,0,2,1}.

Clearly, \(\forall j \in [1,extend[i]],\ t[1,j]=s[i,i+j-1]\).
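To make the definition concrete, here is a minimal brute-force sketch that computes \(extend\) straight from the definition in \(O(nm)\) time. The names `extend_naive` and `check_extend_example` are hypothetical helpers introduced for illustration; slot 0 of the returned vector is left unused so that indices follow the 1-based convention above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute-force reference: extend[i] is the length of the longest common prefix
// of t and the suffix s[i..n]. Slot 0 of the result is unused so that indices
// follow the 1-based convention of the text.
std::vector<int> extend_naive(const std::string& s, const std::string& t) {
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    std::vector<int> extend(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        int len = 0;
        // 1-based s[i + len] and t[1 + len] are s[i - 1 + len] and t[len] in a std::string.
        while (i + len <= n && len < m && s[i - 1 + len] == t[len]) len++;
        extend[i] = len;
    }
    return extend;
}

// Self-check against the example from the text.
void check_extend_example() {
    std::vector<int> expected = {0, 4, 3, 2, 1, 0, 2, 1};  // slot 0 unused
    assert(extend_naive("aaaabaa", "aaaaa") == expected);
}
```

A reference like this is mainly useful for cross-checking a linear-time implementation on small random inputs.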

How Extended KMP works

Computing extend

Analogous to the next (fail) array of the KMP algorithm, we define \(nex[i]\) as the length of the longest common prefix of \(t\) and \(t[i,m]\). Suppose \(nex\) has already been computed for \(t\); how do we then compute \(extend\)?
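For intuition, \(nex\) can also be computed directly from its definition. The sketch below is a quadratic brute-force version, not the linear method developed later; `nex_naive` and `check_nex_example` are hypothetical names, and slot 0 is again unused to keep the 1-based indexing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute-force nex: nex[i] is the longest common prefix of t and t[i..m],
// 1-based with slot 0 unused. Quadratic in the worst case.
std::vector<int> nex_naive(const std::string& t) {
    int m = static_cast<int>(t.size());
    std::vector<int> nex(m + 1, 0);
    for (int i = 1; i <= m; i++) {
        int len = 0;
        // i + len <= m also guarantees len < m, so t[len] stays in range.
        while (i + len <= m && t[i - 1 + len] == t[len]) len++;
        nex[i] = len;
    }
    return nex;
}

// For t = "aaaaa" every suffix is itself a prefix of t.
void check_nex_example() {
    std::vector<int> expected = {0, 5, 4, 3, 2, 1};  // slot 0 unused
    assert(nex_naive("aaaaa") == expected);
}
```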

Like KMP, Extended KMP is an incremental process. Suppose \(extend\) has been computed for \([1,i-1]\) and we now want \(extend[i]\). We maintain \(r\), the furthest position in \(s\) that any match of \(t\) has reached so far, i.e. \(r=\max_{j=1}^{i-1}(j+extend[j]-1)\), and record as \(p_0\) the position \(j\) at which this maximum is attained. Consider the match of \(t\) that starts at \(p_0\): it is easy to see that \(s[i]\) is aligned with \(t[i-p_0+1]\). We want to derive the match starting at \(i\) from what is already known. Note that \(nex[i-p_0+1]\) describes how far the match can continue from \(i\): as shown in the figure below, \(t[1,nex[i-p_0+1]]=t[i-p_0+1,i-p_0+nex[i-p_0+1]]\).

exkmp1.png
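The definitions of \(r\) and \(p_0\) can be spelled out in code. The following sketch recomputes both from scratch for a given \(i\); the helper name `furthest_match` is hypothetical, and a real implementation maintains the two values incrementally instead of rescanning.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Recompute (p0, r) from scratch for position i, given extend[1..i-1]
// (1-based, slot 0 unused). r = max over j < i of j + extend[j] - 1, and p0 is
// the first j attaining it. The real algorithm maintains both incrementally.
std::pair<int, int> furthest_match(const std::vector<int>& extend, int i) {
    assert(i >= 2 && i <= static_cast<int>(extend.size()));
    int p0 = 1, r = 1 + extend[1] - 1;
    for (int j = 2; j < i; j++) {
        if (j + extend[j] - 1 > r) {
            r = j + extend[j] - 1;
            p0 = j;
        }
    }
    return {p0, r};
}
```

For extend = {4,3,2,1,0,2,1} and \(i=5\) this gives \(p_0=1\), \(r=4\), so \(s[5]\) lies just beyond the known match. When several \(j\) reach the same \(r\), any of them is a valid \(p_0\), since each satisfies \(s[j,r]=t[1,r-j+1]\).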

We distinguish two cases according to the size of \(nex[i-p_0+1]\).

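(1) When \(i+nex[i-p_0+1]-1<r\)

exkmp2.png

By the definition of \(nex\), we have \(t[1,nex[i-p_0+1]]=t[i-p_0+1,i-p_0+nex[i-p_0+1]]\), i.e. the two blue parts in the figure are equal.

Since \(t\), placed at \(p_0\), matches \(s[p_0,n]\) up to \(r\) (green part), we also have \(t[i-p_0+1,i-p_0+nex[i-p_0+1]]=s[i,i+nex[i-p_0+1]-1]\) (the blue part equals the green part above it).

Hence \(t[1,nex[i-p_0+1]]=s[i,i+nex[i-p_0+1]-1]\); in other words, \(extend[i]\) is at least \(nex[i-p_0+1]\).

Now consider \(t[nex[i-p_0+1]+1]\) (yellow part). Suppose \(s[i+nex[i-p_0+1]]=t[nex[i-p_0+1]+1]\). Because \(t\) matches up to \(r\) and \(i+nex[i-p_0+1] \leq r\), we have \(s[i+nex[i-p_0+1]]=t[i-p_0+nex[i-p_0+1]+1]\), and therefore \(t[i-p_0+nex[i-p_0+1]+1]=t[nex[i-p_0+1]+1]\) (the yellow part equals the green part in row 1 above, which in turn equals the green part in row 2). By definition, \(nex[i-p_0+1]\) could then be increased by 1, a contradiction. Therefore \(extend[i]=nex[i-p_0+1]\).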
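As a small worked instance of case (1), with strings chosen here for illustration (they do not come from the original text), take t = "aabaabcc" and s = "aabaabcca". Then nex = {8,1,0,3,1,0,0,0}, and \(extend[1]=8\) gives \(p_0=1\), \(r=8\). For \(i=4\):

\[
nex[i-p_0+1]=nex[4]=3,\qquad i+nex[4]-1=6<r=8 \;\Longrightarrow\; extend[4]=nex[4]=3 .
\]

Indeed \(s[4,6]=t[1,3]=\text{aab}\), while \(s[7]=\text{c}\) differs from \(t[4]=\text{a}\), so the match at position 4 stops after exactly 3 characters without any comparison being needed.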

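(2) When \(i+nex[i-p_0+1]-1 \geq r\)

exkmp3.png

Clearly \(s[i,r]=t[i-p_0+1,r-p_0+1]\).

By the definition of \(nex\), \(t[i-p_0+1,r-p_0+1]=t[1,r-i+1]\).

So \(s[i,r]=t[1,r-i+1]\); in other words, \(extend[i]\) is at least \(r-i+1\). Unlike case 1, however, we do not know how \(s\) and \(t\) match beyond \(r\), so we continue by brute-force matching for position \(i\), starting right after the part already known to match. After the matching finishes, \(r\) never moves to the left; we update \(p_0\) and \(r\).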
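Case (2) can be isolated in a few lines. The sketch below uses a hypothetical helper, `extend_case2`, that takes the current \(r\) as a parameter; it skips the \(\max(r-i+1,0)\) characters already known to match (the bound \(extend[i] \geq r-i+1\)) and compares onward from there.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Case (2) in isolation (hypothetical helper; 1-based positions as in the text).
// The first max(r - i + 1, 0) characters are already known to match, so the
// comparison resumes right after them.
int extend_case2(const std::string& s, const std::string& t, int i, int r) {
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    assert(1 <= i && i <= n);
    int len = std::max(r - i + 1, 0);
    // 1-based s[i + len] and t[1 + len] are s[i - 1 + len] and t[len] in a std::string.
    while (i + len <= n && len < m && s[i - 1 + len] == t[len]) len++;
    return len;
}
```

With s = "aaaabaa", t = "aaaaa" and \(r=4\), the call for \(i=2\) starts at length 3 and stops at once because \(s[5]\) is 'b', giving \(extend[2]=3\) after a single comparison.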

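void get_extend(char *s,int n,char *t,int m,int *extend){
    static int nex[maxn+5];//maxn: upper bound on the string length, defined in the full program
    get_nex(t,m,nex);
    extend[1]=0;
    //brute-force initialization of extend[1]; the bounds checks stop the scan at the end of either string
    while(extend[1]<n&&extend[1]<m&&s[extend[1]+1]==t[extend[1]+1]) extend[1]++;
    for(int i=2,p0=1,r=p0+extend[p0]-1;i<=n;i++){
        if(i+nex[i-p0+1]-1<r) extend[i]=nex[i-p0+1];//case 1
        else{
            //case 2
            extend[i]=max(r-i+1,0);//extend[i]>=r-i+1
            //without the bounds checks, both strings could reach '\0' together and the scan would run on
            while(i+extend[i]<=n&&extend[i]<m&&s[i+extend[i]]==t[1+extend[i]]) extend[i]++;
            p0=i;
            r=i+extend[i]-1;
        }
    }
}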

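Computing nex

Just as KMP builds its next array by matching the pattern against itself, EXKMP computes nex by running the same procedure with t matched against t.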

void get_nex(char *t,int m,int *nex){
    nex[1]=m;//nex[1] matches t against itself, so it is clearly m
    nex[2]=0;
    //brute-force initialization of nex[2]; t[2+nex[2]] reaches the terminating '\0' first, which ends the loop
    while(t[2+nex[2]]==t[1+nex[2]]) nex[2]++;
    for(int i=3,p0=2,r=p0+nex[p0]-1;i<=m;i++){
        if(i+nex[i-p0+1]-1<r) nex[i]=nex[i-p0+1];//case 1
        else{
            nex[i]=max(r-i+1,0);//case 2
            while(t[i+nex[i]]==t[1+nex[i]]) nex[i]++;//brute-force matching
            p0=i;
            r=i+nex[i]-1;
        }
    }
}

It is easy to see that every nex value needed at position i has already been computed, so correctness is not an issue.
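For readers who prefer 0-based indexing, the same idea can be sketched with `std::string` and `std::vector`. This is an alternative formulation, not the code above: it folds the two cases into a single `std::min` (the usual way the Z-algorithm is written) and keeps a half-open window `[l, r)`. The names `get_nex0`, `get_extend0` and `check_article_example` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// nex[i] = longest common prefix of t and t[i..] (0-based).
std::vector<int> get_nex0(const std::string& t) {
    int m = static_cast<int>(t.size());
    std::vector<int> nex(m, 0);
    if (m == 0) return nex;
    nex[0] = m;
    // t[l..r-1] == t[0..r-l-1] is the match reaching furthest right so far (r exclusive).
    for (int i = 1, l = 0, r = 0; i < m; i++) {
        if (i < r) nex[i] = std::min(nex[i - l], r - i);  // reuse the known part
        while (i + nex[i] < m && t[nex[i]] == t[i + nex[i]]) nex[i]++;
        if (i + nex[i] > r) { l = i; r = i + nex[i]; }
    }
    return nex;
}

// extend[i] = longest common prefix of t and s[i..] (0-based).
std::vector<int> get_extend0(const std::string& s, const std::string& t) {
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    std::vector<int> nex = get_nex0(t), extend(n, 0);
    for (int i = 0, l = 0, r = 0; i < n; i++) {
        if (i < r) extend[i] = std::min(nex[i - l], r - i);
        while (i + extend[i] < n && extend[i] < m && s[i + extend[i]] == t[extend[i]]) extend[i]++;
        if (i + extend[i] > r) { l = i; r = i + extend[i]; }
    }
    return extend;
}

// Self-check against the example from the text (0-based, no unused slot).
void check_article_example() {
    std::vector<int> expected = {4, 3, 2, 1, 0, 2, 1};
    assert(get_extend0("aaaabaa", "aaaaa") == expected);
}
```

In case (1) the `while` loop exits on its first test, so taking the minimum and then attempting to extend gives the same result as the explicit two-branch version.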

Time complexity of Extended KMP

Computing extend

In case (1), \(extend[i]\) is obtained without any character comparison. In case (2), matching always resumes at the first position of \(s\) beyond \(r\) that has not been matched yet, and positions that have already been matched are never matched again. Each position of the text string is therefore matched successfully at most once, so the overall time complexity of the algorithm is \(O(n)\).
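The counting argument can be written out as follows (a sketch; \(c_i\), \(a_i\) and \(b_i\) are notation introduced here, not in the original). Let \(c_i\) be the number of character comparisons made while computing \(extend[i]\), split into \(a_i\) successful and \(b_i\) failed ones. Every successful comparison touches a position of \(s\) beyond the current \(r\), and \(r\) then advances to cover it, so each position of \(s\) is counted at most once; at most one comparison per \(i\) fails. Hence

\[
\sum_{i=1}^{n} c_i=\sum_{i=1}^{n} a_i+\sum_{i=1}^{n} b_i\le n+n=2n=O(n).
\]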

Computing nex

By the same argument as for extend, this takes \(O(m)\).

In summary, the total time complexity is \(O(n+m)\).

Code: Luogu P5410 [Template] Extended KMP

//https://www.luogu.com.cn/problem/P5410 
#include<iostream>
#include<cstdio>
#include<cstring>
#include<algorithm>//std::max
#define maxn 100000
using namespace std;
int n,m;
char s[maxn+5],t[maxn+5];
void get_nex(char *t,int m,int *nex){
    nex[1]=m;
    nex[2]=0;
    while(t[2+nex[2]]==t[1+nex[2]]) nex[2]++;
    for(int i=3,p0=2,r=p0+nex[p0]-1;i<=m;i++){
        if(i+nex[i-p0+1]-1<r) nex[i]=nex[i-p0+1];
        else{
            nex[i]=max(r-i+1,0);
            while(t[i+nex[i]]==t[1+nex[i]]) nex[i]++;
            p0=i;
            r=i+nex[i]-1;
        }
    }
}
void get_extend(char *s,int n,char *t,int m,int *extend){
    static int nex[maxn+5];
    get_nex(t,m,nex);
    for(int i=1;i<=m;i++) printf("%d ",nex[i]);
    printf("\n");
    extend[1]=0;
    while(extend[1]<n&&extend[1]<m&&s[extend[1]+1]==t[extend[1]+1]) extend[1]++;//bounds checks stop at the end of either string
    for(int i=2,p0=1,r=p0+extend[p0]-1;i<=n;i++){
        if(i+nex[i-p0+1]-1<r) extend[i]=nex[i-p0+1];
        else{
            extend[i]=max(r-i+1,0);
            while(i+extend[i]<=n&&extend[i]<m&&s[i+extend[i]]==t[1+extend[i]]) extend[i]++;//never compare past either string
            p0=i;
            r=i+extend[i]-1;
        }
    }
}

int f[maxn+5];
int main(){
//  freopen("input.txt","r",stdin);
    scanf("%s",s+1);
    scanf("%s",t+1);
    n=strlen(s+1);
    m=strlen(t+1);
    get_extend(s,n,t,m,f);
    for(int i=1;i<=n;i++) printf("%d ",f[i]);
}

Origin www.cnblogs.com/birchtree/p/12147312.html