Definition: A string is a finite sequence of zero or more characters.
A string is stored in the same way as a linear list: either in a sequential storage structure or in a linked (chain) storage structure. The sequential storage structure uses a group of storage units with consecutive addresses to hold the characters of the string in order.
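As a rough illustration of sequential storage, the sketch below keeps the length in cell 0 and the characters in cells 1 onward, which is the layout the KMP code later in these notes relies on (T[0] as the length). The names SString, MAXSIZE, str_assign and str_length are assumptions made for this example and do not come from the original text.

```c
#include <string.h>

#define MAXSIZE 255                      /* assumed capacity, not given in the text */

/* Sequential storage: cell 0 holds the length, cells 1..length hold the characters. */
typedef unsigned char SString[MAXSIZE + 1];

/* Copy an ordinary C string into s. Returns 1 on success, 0 if it does not fit. */
int str_assign(unsigned char *s, const char *chars)
{
    size_t len = strlen(chars);
    if (len > MAXSIZE)
        return 0;
    s[0] = (unsigned char)len;           /* store the length in front */
    memcpy(s + 1, chars, len);           /* characters follow in consecutive cells */
    return 1;
}

/* The length is read directly from cell 0. */
int str_length(const unsigned char *s)
{
    return s[0];
}
```

With this layout the first character is s[1], so the 1-based indexing used in the algorithm descriptions carries over to the code unchanged.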
(1) BF algorithm
Let S be the main string, of length N, and T the pattern string, of length M. First compare S[1] with T[1]; if they are equal, go on to compare S[2] with T[2], and so on up to T[M]. As soon as a pair of characters differs, T is shifted one character to the right relative to S and the comparison starts again from T[1]. In the worst case the algorithm performs M*(N-M+1) comparisons, so its time complexity is O(M*N).
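The procedure described above can be sketched as follows. This is a minimal version under the assumption that both strings are length-prefixed (S[0] and T[0] hold the lengths, characters start at index 1); the function name index_bf is made up for this example.

```c
/* Brute-force (BF) matching on length-prefixed strings.
   Returns the 1-based position of the first occurrence of T in S, or 0 if none. */
int index_bf(const unsigned char *S, const unsigned char *T)
{
    int n = S[0];                 /* length of the main string */
    int m = T[0];                 /* length of the pattern */
    int i = 1;                    /* current position in S */
    int j = 1;                    /* current position in T */

    while (i <= n && j <= m) {
        if (S[i] == T[j]) {       /* characters agree: advance both */
            i++;
            j++;
        } else {                  /* mismatch: shift T right by one and restart */
            i = i - j + 2;
            j = 1;
        }
    }
    return (j > m) ? i - m : 0;   /* j ran past T[m] only on a full match */
}
```

On a mismatch, i - j + 2 moves i back to the character just after the start of the current attempt, which is exactly "shift T one position to the right". That backtracking of i is what KMP avoids.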
(2) KMP algorithm
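next array definition (T is the pattern string, indexed from 1):
next[j] = 0, when j = 1;
next[j] = max{ k | 1 < k < j and T[1..k-1] = T[j-k+1..j-1] }, when this set is not empty;
next[j] = 1, in all other cases.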
Partial match values of the KMP algorithm: the partial match value of a string is the length of the longest element shared by its set of proper prefixes and its set of proper suffixes.
The string "a" has no proper prefix or suffix (both sets are empty), so the length of the common element is 0;
The prefixes of "ab" are [a], the suffixes are [b]; there is no common element, so the length is 0;
The prefixes of "aba" are [a, ab], the suffixes are [ba, a]; the common element is "a", and its length is 1;
The prefixes of "abab" are [a, ab, aba], the suffixes are [bab, ab, b]; the common element is "ab", and its length is 2;
The prefixes of "ababa" are [a, ab, aba, abab], the suffixes are [baba, aba, ba, a]; the longest common element is "aba", and its length is 3;
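The values listed above can be reproduced with a short routine. The sketch below works on an ordinary 0-indexed, NUL-terminated C string rather than the length-prefixed strings used elsewhere in these notes; the name partial_match_table and the output array pmt are assumptions for this example.

```c
#include <string.h>

/* pmt[k] = length of the longest proper prefix of p[0..k] that is also a
   suffix of p[0..k]. pmt must have room for strlen(p) entries. */
void partial_match_table(const char *p, int *pmt)
{
    int n = (int)strlen(p);
    int len = 0;                       /* length of the current matched prefix */

    if (n == 0)
        return;
    pmt[0] = 0;                        /* a single character has no proper prefix */
    for (int k = 1; k < n; k++) {
        /* on a mismatch, fall back to the next shorter prefix that is a suffix */
        while (len > 0 && p[k] != p[len])
            len = pmt[len - 1];
        if (p[k] == p[len])
            len++;
        pmt[k] = len;
    }
}
```

For "ababa" this fills pmt with 0, 0, 1, 2, 3, one value for each of the prefixes "a", "ab", "aba", "abab" and "ababa" in the list above.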
Worked example: computing next[1] to next[9] for a pattern T whose first eight characters are "ababaaab".
1) When j=1, next[1]=0 by definition;
2) When j=2, the string from 1 to j-1 is "a", which has no proper prefix or suffix; this falls under "other cases", so next[2]=1;
3) When j=3, the string from 1 to j-1 is "ab"; the prefix "a" is not equal to the suffix "b", so this also falls under "other cases" and next[3]=1;
4) When j=4, the string from 1 to j-1 is "aba"; the prefix "a" equals the suffix "a", that is, 1 character matches, so next[4]=1+1=2;
5) When j=5, the string from 1 to j-1 is "abab"; the prefix "ab" equals the suffix "ab", that is, 2 characters match, so next[5]=2+1=3;
6) When j=6, the string from 1 to j-1 is "ababa"; the prefix "aba" equals the suffix "aba", that is, 3 characters match, so next[6]=3+1=4;
7) When j=7, the string from 1 to j-1 is "ababaa"; the prefix "a" equals the suffix "a", that is, 1 character matches, so next[7]=1+1=2;
8) When j=8, the string from 1 to j-1 is "ababaaa"; the prefix "a" equals the suffix "a", that is, 1 character matches, so next[8]=1+1=2;
9) When j=9, the string from 1 to j-1 is "ababaaab"; the prefix "ab" equals the suffix "ab", that is, 2 characters match, so next[9]=2+1=3;
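The nine steps above can be checked mechanically by applying the rule "longest matching prefix/suffix length plus one" directly. The sketch below is a deliberately naive check, not the efficient method: it takes a 0-indexed C string and writes a 1-indexed next array. The name next_by_definition is an assumption for this example, and so is the ninth pattern character 'a' used in the note after the code, since the walkthrough only shows the first eight characters.

```c
#include <string.h>

/* Naive computation of next[1..m] straight from the rule used in the
   walkthrough. t is a 0-indexed C string; next is indexed from 1 and must
   have room for strlen(t) + 1 ints. */
void next_by_definition(const char *t, int *next)
{
    int m = (int)strlen(t);

    for (int j = 1; j <= m; j++) {
        if (j == 1) {
            next[j] = 0;                   /* fixed value for the first position */
            continue;
        }
        int best = 0;                      /* longest matching prefix/suffix found */
        /* the string "from 1 to j-1" is t[0..j-2]; try every proper length k */
        for (int k = 1; k < j - 1; k++) {
            if (memcmp(t, t + (j - 1) - k, (size_t)k) == 0)
                best = k;
        }
        next[j] = best + 1;                /* "other cases" gives 0 + 1 = 1 */
    }
}
```

Calling it on "ababaaaba" yields next[1..9] = 0, 1, 1, 2, 3, 4, 2, 2, 3, matching steps 1) to 9). The cost is cubic in the pattern length in the worst case, which is why practical code builds the array incrementally instead.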
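#include <stdio.h>

// Strings are length-prefixed: T[0] holds the length and the characters
// occupy T[1..T[0]]. Where char is signed, this limits lengths to 127.
typedef char* String;

// Fill next[1..T[0]] for the pattern T.
void get_next( String T, int *next )
{
    int j = 0;          // end of the candidate prefix (0 means "no prefix")
    int i = 1;          // end of the candidate suffix
    next[1] = 0;
    while( i < T[0] )
    {
        if( 0 == j || T[i] == T[j] )
        {
            // T[1..j] equals the j characters ending at T[i],
            // so a mismatch at i+1 resumes the comparison at j+1
            i++;
            j++;
            next[i] = j;
        }
        else
        {
            j = next[j];    // mismatch: fall back to a shorter prefix
        }
    }
}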
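// Return the position of the first occurrence of substring T in main string S,
// searching from the pos-th character onward.
// Return 0 if there is no such occurrence.
int Index_KMP( String S, String T, int pos )
{
    int i = pos;        // current position in S; it never moves backward
    int j = 1;          // current position in T
    int next[256];      // next[1..T[0]]; 256 covers any length a char can hold
    get_next( T, next );
    while( i <= S[0] && j <= T[0] )
    {
        if( 0 == j || S[i] == T[j] )
        {
            i++;
            j++;
        }
        else
        {
            j = next[j];    // mismatch: slide T right, keep i where it is
        }
    }
    if( j > T[0] ) return i - T[0];    // all of T matched
    else return 0;
}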
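For ordinary NUL-terminated C strings, the same idea is often written with 0-based indices and a failure table in place of the 1-based next array. The sketch below is such a variant, offered for comparison rather than as a replacement for the code above; the names kmp_search and fail are assumptions for this example, and the result is a 0-based index with -1 meaning "not found".

```c
#include <stdlib.h>
#include <string.h>

/* KMP search on NUL-terminated strings. Returns the 0-based index of the
   first occurrence of pat in text at or after start, or -1 if there is none
   (also -1 if start is out of range or memory cannot be allocated). */
int kmp_search(const char *text, const char *pat, int start)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pat);

    if (start < 0 || start > n)
        return -1;
    if (m == 0)
        return start;                      /* empty pattern matches immediately */

    int *fail = malloc((size_t)m * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[k]: length of the longest proper prefix of pat[0..k] that is also
       a suffix of pat[0..k] */
    fail[0] = 0;
    for (int k = 1, len = 0; k < m; k++) {
        while (len > 0 && pat[k] != pat[len])
            len = fail[len - 1];
        if (pat[k] == pat[len])
            len++;
        fail[k] = len;
    }

    int result = -1;
    for (int i = start, j = 0; i < n; i++) {   /* i never moves backward */
        while (j > 0 && text[i] != pat[j])
            j = fail[j - 1];                   /* slide the pattern right */
        if (text[i] == pat[j])
            j++;
        if (j == m) {                          /* whole pattern matched */
            result = i - m + 1;
            break;
        }
    }
    free(fail);
    return result;
}
```

Allocating the table from the pattern length avoids the fixed-size array, so the pattern is not limited to what fits in a one-character length prefix.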