KMP算法模板及理解

Number Sequence

Given two sequences of numbers : a[1], a[2], … , a[N], and b[1], b[2], … , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], … , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.
Input
The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], … , a[N]. The third line contains M integers which indicate b[1], b[2], … , b[M]. All integers are in the range of [-1000000, 1000000].
Output
For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.
Sample Input
2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1
Sample Output
6
-1

原文翻译

给定两个数字序列:一个[1]，一个[2]，……， a[N]，和b[1]， b[2]，， b[M] (1 <= M <= 10000, 1 <= N <= 1000000)你的任务是找到一个数字K，使a[K] = b[1]， a[K + 1] = b[2]，……， a[K + M - 1] = b[M]。如果有超过一个K，输出最小的那个。

输入

输入的第一行是一个数字T，表示大小写。每个case包含三行。第一行是两个数字N和M (1 <= M <= 10000, 1 <= N <= 1000000)。第二行包含N个整数，表示一个[1]，一个[2]，……[N]。第三行是M个整数，表示b[1]， b[2]，……,b [M]。所有整数都在[-1000000,1000000]的范围内。

输出

对于每个测试用例，您应该输出一行只包含上面描述的K。如果不存在这样的K，则输出-1。

样例输入

13 5

1 2 1 2 3 1 2 3 1 3 2 2

1 2 3 3

13 5

1 2 1 2 3 1 2 3 1 3 2 2

1 2 3 2 1

样例输出

题目大意

这是一道KMP的板子题。其实很好理解。就是让找字符串 a 中
是否存在与字符串 b 完全相同的字符串片段。也可以理解为 a 中是否存在 b 。

解题思路

可以用最简单的幼稚算法。以 a 为参照，将字符串 b 一个一个的和 a 对照。如果不一样的话那么就让 b 从第一个开始，再把 a 往后移一位继续一个一个对照。但是这样很麻烦，并且很慢，很容易 T 。所以就要用KMP算法来优化运算。

KMP算法理解

在这里插入图片描述

第一个图里就是先逐个对比，当发现不匹配的字符之后，用KMP算法可找到最长公共前后缀“AB”。可不动指针位置，移动前缀至后缀位置即可。

为什么可以这么移动？就是因为在前缀和后缀之间不会存在完全配对的情况，所以可以直接跳过节省时间，节省运算。这是KMP的核心部分。
在这里插入图片描述
值得注意的是，前后缀的公共部分可以交叉。如上图， “ABABA” 的最长公共前后缀为 “ABA” 。

接下来我们可以考虑如何确定这个指针的位置。通过下图分析可知，需要被匹配的字符串可以通过自身来确定位置。
在这里插入图片描述
化成表格可得：

而这个就是kmp算法中的关键部分 —— next 数组

代码实现

以本篇例题为例

#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std;
const int maxn=1e6+50;
int n,m;
int a[maxn];
int b[maxn];
int nextt[maxn];
void init()  //这个函数建立next数组
{
	int i=0,j=-1;
	nextt[0]=-1;
	while(i<m)
	{
		if(j==-1||b[i]==b[j])  //存在公共前后缀
		{
			i++;
			j++;
			nextt[i]=j;  //读入next数组
			// 这一步算的是对于前 i 个字符，最长公共前后缀的长度。
		}
		else  // 出现不匹配的情况
		{
			j=nextt[j];
		}
//		printf("%d %d    ",i,j);
	}
//	for(int i=0;i<m;i++)
//        printf("%d ",nextt[i]);
//    printf("\n");
}
void kmp()
{
	int i=0,j=0;
	int flag=0;
	init();
	while(i<n)
	{
		if(j==-1||a[i]==b[j]) // 对比配对
		{
			j++;
			i++;
		}
		else j=nextt[j];  //出现不配对，把前缀移动至后缀位置。
		if(j==m)
		{
			flag=1;
			printf("%d\n",i-m+1);
			break;
		}
	}
	if(flag==0)printf("-1\n");
}
int main()
{
	int ttt;
	scanf("%d",&ttt);
	while(ttt--)
	{
		scanf("%d%d",&n,&m);
		for(int i=0;i<n;i++)
			scanf("%d",&a[i]);
		for(int j=0;j<m;j++)
			scanf("%d",&b[j]);
		kmp();
	}
	return 0;
}

总结

KMP算法思想比较好理解一些，但是需要好好看代码。这个代码理解起来确实有点难。next数组的运用比较灵活，如果理解并能灵活运用next数组的话，KMP算法就没太大的难点了。
多做一些练习即可。

KMP视频讲解链接

KMP算法动态讲解（不含代码）