KMP Algorithm Templates and a Summary of Problem Types

For background on how KMP works, see: https://blog.csdn.net/v_july_v/article/details/7041827


Template for the next array:

void GetNext(char* p,int next[])  
{  
    int pLen = strlen(p);  
    next[0] = -1;  
    int k = -1;  
    int j = 0;  
    while (j < pLen - 1)  
    {  
        //p[k] is the prefix character, p[j] is the suffix character  
        if (k == -1 || p[j] == p[k])   
        {  
            ++k;  
            ++j;  
            next[j] = k;  
        }  
        else   
        {  
            k = next[k];  
        }  
    }  
}  
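
To make the template concrete, here is a minimal sketch of the same loop wrapped in a function that returns the table by value. The name buildNext and the use of std::string and std::vector are assumptions made for illustration; they are not part of the original template.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the same loop as GetNext, returning next[0..n-1].
// next[j] is the length of the longest proper prefix of p[0..j-1]
// that is also a suffix of it; next[0] is the sentinel -1.
std::vector<int> buildNext(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n > 0 ? n : 1, -1);
    int k = -1;  // length of the border currently being extended
    int j = 0;   // position whose successor is being filled in
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];  // fall back to the next shorter border
        }
    }
    return next;
}
```

For the pattern "abab" this yields {-1, 0, 0, 1}, and for "aaaa" it yields {-1, 0, 1, 2}.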

Template for the optimized next array:

void GetNextval(char* p, int next[])  
{  
    int pLen = strlen(p);  
    next[0] = -1;  
    int k = -1;  
    int j = 0;  
    while (j < pLen - 1)  
    {  
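        //p[k] is the prefix character, p[j] is the suffix character    
        if (k == -1 || p[j] == p[k])  
        {  
            ++j;  
            ++k;  
            //compared with the plain next array, only the 4 lines below change  
            if (p[j] != p[k])  
                next[j] = k;   //the plain version has only this line  
            else  
                //p[j] == p[next[j]] must not happen: falling back to k would fail on the same character, so skip ahead to next[k]  
                next[j] = next[k];  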
        }  
        else  
        {  
            k = next[k];  
        }  
    }  
} 
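
The effect of the optimization is easiest to see on a small pattern. The sketch below restates the loop with std::string and std::vector; the name buildNextval is an assumption for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the optimized table from GetNextval, returned by value.
std::vector<int> buildNextval(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n > 0 ? n : 1, -1);
    int k = -1;
    int j = 0;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            if (p[j] != p[k])
                next[j] = k;        // same value as the plain table
            else
                next[j] = next[k];  // p[k] would mismatch too, so skip it
        } else {
            k = next[k];
        }
    }
    return next;
}
```

For "abab" the plain table is {-1, 0, 0, 1} while this one is {-1, 0, -1, 0}; for "aaaa" every entry becomes -1. The values are shift targets for matching and, in general, no longer the lengths of common prefixes and suffixes.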

KMP search template:


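//next[] must already have been filled for the pattern p by GetNext or GetNextval
int KmpSearch(char* s, char* p, int next[])  
{  
    int i = 0;  
    int j = 0;  
    int sLen = strlen(s);  
    int pLen = strlen(p);  
    while (i < sLen && j < pLen)  
    {  
        //(1) if j == -1, or the current characters match (s[i] == p[j]), advance both: i++, j++      
        if (j == -1 || s[i] == p[j])  
        {  
            i++;  
            j++;  
        }  
        else  
        {  
            //(2) if j != -1 and the current characters differ (s[i] != p[j]), keep i and set j = next[j]      
            //next[j] is the next value stored for position j        
            j = next[j];  
        }  
    }  
    if (j == pLen)  
        return i - j;  //0-based position of the first match
    else  
        return -1;  
}  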
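
As a usage sketch, the search can be combined with the table construction in one self-contained unit. The names buildNext and kmpSearch and the std::string interface are assumptions for illustration; the logic follows the templates above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: plain next table for the pattern.
std::vector<int> buildNext(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n > 0 ? n : 1, -1);
    int k = -1, j = 0;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Returns the 0-based index of the first occurrence of p in s, or -1.
int kmpSearch(const std::string& s, const std::string& p) {
    const std::vector<int> next = buildNext(p);
    const int sLen = static_cast<int>(s.size());
    const int pLen = static_cast<int>(p.size());
    int i = 0, j = 0;
    while (i < sLen && j < pLen) {
        if (j == -1 || s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];  // i stays put; only the pattern shifts
        }
    }
    return j == pLen ? i - j : -1;
}
```

With s = "abcabcabd" and p = "abcabd", the mismatch at s[5] sends j from 5 back to next[5] = 2 without moving i, and the call returns 3.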


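Minimal repeating unit and period with KMP:

Theorem: let S have length len. Then S has a minimal repeating unit of length L = len - next[len], and the unit itself is S[0…len-next[len]-1].

(1) If len is divisible by len - next[len], S consists entirely of whole copies of the unit, and the number of repetitions is T = len/L. (When next[len] = 0 this gives L = len and T = 1, i.e. S does not repeat at all.)

(2) Otherwise a few more characters are needed to complete the last copy. The number to append is L - len%L = L - (len-L)%L = L - next[len]%L, where L = len - next[len].


Note: for this the next-array template must loop with while (j < pLen), because next[len], the value for the whole string, is needed. It should also be the plain GetNext: the optimized GetNextval values are in general not lengths of common prefixes and suffixes.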
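
The theorem translates into a few lines of code. The sketch below is an illustration under assumed names (buildFullNext, minimalPeriod, padToWholeUnits); it uses the plain next table extended by one entry, as the note requires.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Plain next table with len + 1 entries; next[len] is the longest
// proper prefix of s that is also a suffix of s.
std::vector<int> buildFullNext(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1, -1);
    int k = -1, j = 0;
    while (j < n) {  // note: j < n, not j < n - 1
        if (k == -1 || s[j] == s[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// L = len - next[len], the length of the minimal repeating unit.
int minimalPeriod(const std::string& s) {
    assert(!s.empty());
    const int len = static_cast<int>(s.size());
    return len - buildFullNext(s)[len];
}

// Case (1): 0 when s is already whole copies of its unit.
// Case (2): L - len % L characters are missing from the last copy.
int padToWholeUnits(const std::string& s) {
    const int len = static_cast<int>(s.size());
    const int L = minimalPeriod(s);
    return len % L == 0 ? 0 : L - len % L;
}
```

For "abcabcab", next[8] = 5, so L = 3 and one character ('c') is missing. For "abcde", next[5] = 0, so L = 5 and the string is a single copy of its unit (T = 1).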


Problem types:

1. Plain KMP:

Number Sequence

Given two sequences of numbers: a[1], a[2], ......, a[N], and b[1], b[2], ......, b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which makes a[K] = b[1], a[K + 1] = b[2], ......, a[K + M - 1] = b[M]. If more than one such K exists, output the smallest one. 

Input

The first line of input is a number T which indicates the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ......, a[N]. The third line contains M integers which indicate b[1], b[2], ......, b[M]. All integers are in the range of [-1000000, 1000000]. 

Output

For each test case, you should output one line which only contains the K described above. If no such K exists, output -1 instead. 

Sample Input

2

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 1 3

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 2 1

Sample Output

6

-1

Problem: find the position of the subsequence and print it; print -1 if it does not occur.

Code:

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

int nex[10005];
int s[1000000];
int p[10005];
int m,n;

void GetNextval()
{
    int pLen = m;
    nex[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1)
    {
        if (k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            if (p[j] != p[k])
                nex[j] = k;  
            else
                nex[j] = nex[k];
        }
        else
        {
            k = nex[k];
        }
    }
}


int KmpSearch()
{
    int i = 0;
    int j = 0;
    int sLen = n;
    int pLen = m;
    while (i < sLen && j < pLen)
    {
        if (j == -1 || s[i] == p[j])
        {
            i++;
            j++;
        }
        else
        {
            j = nex[j];
        }
    }
    if (j == pLen)
        return i - j+1;
    else
        return -1;
}


int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        scanf("%d%d",&n,&m);
        for(int i = 0 ; i < n ; i ++)
            scanf("%d",&s[i]);
        for(int i = 0 ; i < m ; i ++)
            scanf("%d",&p[i]);
        GetNextval();
        printf("%d\n",KmpSearch());
    }
    return 0;
}



2: Variant: counting occurrences of a substring (overlaps allowed)

Oulipo

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book: 

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais… 

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces. 

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap. 

Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format: 

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W). 
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000. 
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T. 

Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0


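Problem: each case gives two strings; count how many times the first occurs in the second. Here the next array must include the entry for the whole pattern, i.e. the loop is while (j < pLen). Compare with the next problem to see the difference.

Code: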

#include<iostream>
#include<cstdio>
#include<cstdlib>
#include<cstring>
#include<string>
#include<queue>
#include<algorithm>
#include<map>
#include<iomanip>
using namespace std;

char text[1000010];
char partten[10010];
int nex[10010];

void GetNextval(char *p)
{
    int pLen = strlen(p);
    nex[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen)//all occurrences are counted, so after a full match the search resumes from nex[pLen]; j therefore has to run to the end of the pattern

    {
        if (k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            if (p[j] != p[k])
                nex[j] = k;
            else
                nex[j] = nex[k];
        }
        else
        {
            k = nex[k];
        }
    }
}

int KmpSearch(char *s,char *p)
{

    int i = 0;
    int j = 0;
    int flag = 0;
    int sLen = strlen(s);
    int pLen = strlen(p);
    while (i < sLen && j <=pLen)
    {
        if (j == -1 || s[i] == p[j])
        {
            i++;
            j++;
        }
        else
        {
            j = nex[j];
        }
        if(j==pLen)
            flag++;
    }
    return flag;
}


int main()
{
    int t,num;
    scanf("%d",&t);
    while(t--)
    {
        cin >> partten >> text;
        GetNextval(partten);
        num = KmpSearch(text,partten);
        printf("%d\n",num);


    }
    return 0;
}
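
The same counting idea can be written more compactly. The sketch below is an illustration, not the listing above: it uses the plain next table with an extra entry (names buildFullNext and countOverlapping are assumptions) and resumes from next[m] after each full match, which is what allows occurrences to overlap.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Plain next table with m + 1 entries for a pattern of length m.
std::vector<int> buildFullNext(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> next(m + 1, -1);
    int k = -1, j = 0;
    while (j < m) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Counts occurrences of pat in text; occurrences may overlap.
int countOverlapping(const std::string& text, const std::string& pat) {
    assert(!pat.empty());
    const std::vector<int> next = buildFullNext(pat);
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pat.size());
    int count = 0, i = 0, j = 0;
    while (i < n) {
        if (j == -1 || text[i] == pat[j]) {
            ++i;
            ++j;
            if (j == m) {
                ++count;
                j = next[m];  // keep the longest border, so overlaps are found
            }
        } else {
            j = next[j];
        }
    }
    return count;
}
```

On the sample data this gives 1 for BAPC in BAPC, 3 for AZA in AZAZAZA, and 0 for VERDI in AVERDXIVYERDIAN.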

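3: Variant: counting occurrences of a substring (always moving forward, no overlap)

Cutting Cloth Strips (剪花布条)

 

A strip of patterned cloth carries a sequence of patterns, and a small ready-to-use decorative strip carries some patterns as well. Given the cloth strip and the decorative strip, work out the largest number of decorative strips that can be cut from the cloth. 
Input
The input contains several data sets, each a pair consisting of a cloth strip and a decorative strip. The strips are written in visible ASCII characters; there are as many kinds of pattern as there are visible ASCII characters. Neither strip is longer than 1000 characters. A # character means there is no more work to do. 
Output
Print the largest number of decorative strips that can be cut from the cloth; if there is none, simply print 0. Each result goes on its own line. 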
Sample Input
abcde a3
aaaaaa  aa
#
Sample Output
0
3


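The next array for this problem does not need the last entry, so while (j < pLen - 1) is enough: matches may not overlap, so after each full match the pattern restarts from its first character and next[pLen] is never consulted. 


Code: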

#include<iostream>
#include<cstdio>
#include<cstdlib>
#include<cstring>
#include<string>
#include<queue>
#include<algorithm>
#include<map>
#include<iomanip>
using namespace std;

char text[1000010];
char partten[10010];
int nex[10010];


void GetNextval(char *p)
{
    int pLen = strlen(p);
    nex[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen -1)

    {
        if (k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            if (p[j] != p[k])
                nex[j] = k;   
            else
                nex[j] = nex[k];
        }
        else
        {
            k = nex[k];
        }
    }
}


int KmpSearch(char *s,char *p)
{

    int i = 0;
    int j = 0;
    int flag = 0;
    int sLen = strlen(s);
    int pLen = strlen(p);
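    while (i < sLen)
    {
        if (j == -1 || s[i] == p[j])
        {
            i++;
            j++;
        }
        else
        {
            j = nex[j];
        }
        if(j==pLen)
        {
            flag++;
            j = 0;//pieces may not overlap, so restart from the beginning of the pattern (nex[pLen] is never filled in here)
        }
    }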
    return flag;
}


int main()
{
    int num;
    while(cin >> text && text[0]!='#')
    {
        cin >> partten;
        GetNextval(partten);
        num = KmpSearch(text,partten);
        printf("%d\n",num);


    }
    return 0;
}
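
For comparison with the overlapping case, here is a compact sketch of non-overlapping counting. The function name countNonOverlapping and the std::string interface are assumptions for illustration. The only difference from overlapping counting is the reset to j = 0 after a full match, which is also why the table needs no entry for the whole pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts how many disjoint copies of pat can be cut from text, left to right.
int countNonOverlapping(const std::string& text, const std::string& pat) {
    assert(!pat.empty());
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pat.size());

    // Plain next table for positions 0..m-1 only.
    std::vector<int> next(m, -1);
    int k = -1, j = 0;
    while (j < m - 1) {
        if (k == -1 || pat[j] == pat[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    int count = 0, i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || text[i] == pat[j]) {
            ++i;
            ++j;
            if (j == m) {
                ++count;
                j = 0;  // the matched piece is cut off; start afresh
            }
        } else {
            j = next[j];
        }
    }
    return count;
}
```

On the sample data, "aaaaaa" with "aa" gives 3 and "abcde" with "a3" gives 0; "aaaaa" with "aa" gives 2, whereas overlapping counting would give 4.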


4: Variant: string period problems

Cyclic Nacklace

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task. 

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet's cycle is 9 and its cyclic count is 2: 

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden. 
CC is satisfied with his ideas and ask you for help.
Input
The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases. 
Each test case contains only one line describing the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).
Output
For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.
Sample Input
3
aaa
abca
abcde
Sample Output
0
2
5


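Problem: each letter stands for one kind of pearl; add as few pearls as possible so that the chain becomes a cyclic string with at least two repetitions.

Idea: reduce the problem to finding the minimal repeating unit and apply the theorem from the section "Minimal repeating unit and period with KMP" above. With L = len - next[len]: if L != len and len is divisible by L, the chain is already cyclic and the answer is 0; otherwise the answer is L - next[len]%L.

Note: as stated there, the next-array template must loop with while (j < pLen), because the next value of the whole string is needed.

Code: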

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

char partten[100000+10];
int nex[100000+10];
int pLen;

void GetNext()
{
    nex[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen )
    {
        //partten[k] is the prefix character, partten[j] is the suffix character
        if (k == -1 || partten[j] == partten[k])
        {
            ++k;
            ++j;
            nex[j] = k;
        }
        else
        {
            k = nex[k];
        }
    }
}

int main()
{
    int t;
    cin >> t;
    while(t--)
    {
        scanf("%s",partten);
        pLen = strlen(partten);
        GetNext();
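        int circle_len = pLen - nex[pLen];//length of the minimal repeating unit
        if(circle_len != pLen && pLen%circle_len==0)//the string already consists of at least two whole units
                printf("0\n");
        else
            printf("%d\n",circle_len - nex[pLen]%circle_len);//unit length minus the part of the last unit already present, e.g. abcab: drop abc, ab remains, 1 to add
    }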
    return 0;
}



                                                                              

Period

 

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A^K, that is A concatenated K times, for some string A. Of course, we also want to know the period K. 
Input
The input file consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S. The second line contains the string S. The input file ends with a line, having the number zero on it. 
Output
For each test case, output “Test case #” and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case. 
Sample Input
3
aaa
12
aabaabaabaab
0
Sample Output
Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4


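Problem: given a string, find every prefix that is made of a repeated substring, and print the prefix length together with the number of times that substring repeats in the prefix.

Idea: this amounts to checking, at every character, whether the prefix ending there has a repeating unit, and that check can be done while the next array is being computed.

Code: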

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

char p[1000000+10];
int nex[1000000+10];

void Getnext()
{
    int plen = strlen(p),circle_len;
    int k = -1 ;
    int j = 0 ;//index of the current character
    nex[0] = -1;
    while( j < plen)
    {
        if(k == -1 || p[k]==p[j])
        {
            ++j;
            ++k;
            nex[j] = k;
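            circle_len = j - nex[j];//nex[j] is the longest common prefix/suffix of the first j characters, and j is exactly the 1-based prefix length
            if(nex[j] > 0 && j % circle_len == 0)//nex[j]>0: a common prefix/suffix exists; divisible: the prefix consists of whole units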
                printf("%d %d\n",j,j/circle_len);

        }
        else
            k = nex[k];
    }
}


int main()
{
    int t,i = 1;
    while(scanf("%d",&t)!=EOF && t)
    {
        scanf("%s",p);
        printf("Test case #%d\n",i++);
        Getnext();
        printf("\n");
    }
    return 0;
}
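
To check the idea independently of the input and output format, the sketch below collects the (prefix length, repetition count) pairs in a vector instead of printing them. The function name periodicPrefixes is an assumption for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns (i, K) for every prefix length i that is K > 1 copies of some unit.
std::vector<std::pair<int, int>> periodicPrefixes(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1, -1);
    std::vector<std::pair<int, int>> result;
    int k = -1, j = 0;
    while (j < n) {
        if (k == -1 || s[j] == s[k]) {
            ++k;
            ++j;
            next[j] = k;
            const int unit = j - next[j];  // minimal period of s[0..j-1]
            if (next[j] > 0 && j % unit == 0)
                result.push_back({j, j / unit});
        } else {
            k = next[k];
        }
    }
    return result;
}
```

For "aabaabaabaab" the pairs are (2,2), (6,2), (9,3), (12,4), matching the sample output.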



                                                                   

 The Minimum Length

 There is a string A. The length of A is less than 1,000,000. I rewrite it again and again. Then I got a new string: AAAAAA...... Now I cut it from two different position and get a new string B. Then, give you the string B, can you tell me the length of the shortest possible string A. 
For example, A="abcdefg". I got abcd efgabcdefgabcdefgabcdefg.... Then I cut the red part: efgabcdefgabcde as string B. From B, you should find out the shortest A. 

Input
Multiple test cases. 
For each line there is a string B which contains only lowercase and uppercase characters. 
The length of B is no more than 1,000,000. 
Output
For each line, output an integer, as described above.
Sample Input
bcabcab
efgabcdefgabcde
Sample Output
3
7



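Problem summary:

There is a string A, say A = "abcdefg". Repeating A gives an infinitely long string AAAAAAA, i.e. "abcdefgabcdefgabcdefg.....".

Cut a piece out of it, for example "efgabcdefgabcde" out of "abcdefgabcdefgabcdefgabcdefg", and call it string B.

Given string B, find the shortest possible length of A.


Analysis:

Let C = AAAAAAAA.... Since C consists of infinitely many copies of A, starting from any position in C also gives a cycle, and that cycle has the same length as A (like a circle: starting from any point, you walk back to where you began).

So treat B as a string that starts at B[0]; the problem then becomes: find the minimal repeating unit of B.

With the minimal-repeating-unit formula, len - next[len], the answer follows directly.


Code: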

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

char s[1000000+10];
int nex[1000000+10];

void getnext()
{
    int plen = strlen(s);
    int k = -1;
    int j = 0;
    nex[0] = -1;
    while( j < plen)
    {
        if(k == -1 || s[j]==s[k])
        {
            j++;
            k++;
            nex[j] = k;
        }
        else
            k = nex[k];
    }
    printf("%d\n",plen - nex[plen]);


}


int main()
{
    while(scanf("%s",s)!=EOF)
        getnext();
    return 0;
}



                                                                               

 Power Strings

 

Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.


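Problem: this is a fairly direct minimal-repeating-unit computation, so the formula applies as is; just remember to print 1 when the string has no repetition.

Code: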

#include <iostream>
#include <cstdio>
#include <cstring>
#include<algorithm>

using namespace std;

char p[1000000+5];//the statement limits s to 1 million characters
int nex[1000000+5];

void getnext()
{
    int plen = strlen(p);
    int k = -1 ;
    int j = 0;
    nex[0] = -1;
    while( j < plen)
    {
        if(k == -1 || p[k]== p[j])
        {
            k++;
            j++;
            nex[j] = k;
        }
        else
            k = nex[k];
    }

}


int main()
{
    while(scanf("%s",p) == 1 && strcmp(p,".") != 0)//stop on the lone "." line, and also on end of input
    {
        int plen = strlen(p);
        getnext();
        int circle = plen - nex[plen];
        if(circle != plen &&plen%circle==0)
            printf("%d\n",plen/circle);
        else
            printf("1\n");
    }
    return 0;
}


5: Understanding the next array

Seek the Name, Seek the Fame

The little cat is so famous, that many couples tramp over hill and dale to Byteland, and asked the little cat to give names to their newly-born babies. They seek the name, and at the same time seek the fame. In order to escape from such boring job, the innovative little cat works out an easy but fantastic algorithm: 

Step1. Connect the father's name and the mother's name, to a new string S. 
Step2. Find a proper prefix-suffix string of S (which is not only the prefix, but also the suffix of S). 

Example: Father='ala', Mother='la', we have S = 'ala'+'la' = 'alala'. Potential prefix-suffix strings of S are {'a', 'ala', 'alala'}. Given the string S, could you help the little cat to write a program to calculate the length of possible prefix-suffix strings of S? (He might thank you by giving your baby a name:) 
Input
The input contains a number of test cases. Each test case occupies a single line that contains the string S described above. 

Restrictions: Only lowercase letters may appear in the input. 1 <= Length of S <= 400000. 
Output
For each test case, output a single line with integer numbers in increasing order, denoting the possible length of the new baby's name.
Sample Input
ababcababababcabab
aaaaa
Sample Output
2 4 9 18
1 2 3 4 5

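Problem: find the lengths of all strings that are both a prefix and a suffix of S.

Idea: the next array stores the length of the longest prefix that matches a suffix. As long as the value is nonzero such a match exists, and following next repeatedly (next[len], next[next[len]], ...) yields every shorter one. Note that the whole string also qualifies. So compute the complete next array and keep following it.

Code: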

#include <iostream>
#include <cstring>
#include <cstdio>

using namespace std;

char p[400000+10];
int nex[400000+10];
int plen ;
void getnext()
{
    plen = strlen(p);
    int k = -1;
    int j = 0;
    nex[0] = -1;
    while( j < plen )
    {
        if( k == -1 || p[j] == p[k])
        {
            j++;
            k++;
            nex[j] = k;
        }
        else
            k = nex[k];
    }

}

int main()
{
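    static int s[400000+10];//up to |S| lengths may be stored (e.g. "aaaa..."); static keeps the array off the stack
    int i = 0;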
    while(scanf("%s",p)!=EOF)
    {
        i = 0;
        getnext();
        int k = plen;//length of the whole string
        while(nex[k] != -1)
        {
            s[i++] = k;//follow the next links to collect every prefix-suffix length
            k = nex[k];
        }
        for(int j = --i; j >= 0; j--)//print in reverse to get increasing order
            printf("%d ",s[j]);
        printf("\n");
    }
    return 0;
}
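
The chain of next values can be isolated in a small function. This is a sketch under assumed names (borderLengths is not from the original); it returns the lengths in increasing order, including the full length of the string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// All lengths k such that the first k characters of s equal the last k.
std::vector<int> borderLengths(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1, -1);
    int k = -1, j = 0;
    while (j < n) {
        if (k == -1 || s[j] == s[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    std::vector<int> lengths;
    // n, next[n], next[next[n]], ... until the value drops to 0.
    for (int len = n; len > 0; len = next[len])
        lengths.push_back(len);
    std::reverse(lengths.begin(), lengths.end());
    return lengths;
}
```

For "aaaaa" the chain is 5, 4, 3, 2, 1, so the result is {1, 2, 3, 4, 5}. This is also the case that needs room for as many lengths as the string has characters.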





                                                                    Count the string


It is well known that AekdyCoin is good at string problems as well as number theory problems. When given a string s, we can write down all the non-empty prefixes of this string. For example: 
s: "abab" 
The prefixes are: "a", "ab", "aba", "abab" 
For each prefix, we can count the times it matches in s. So we can see that prefix "a" matches twice, "ab" matches twice too, "aba" matches once, and "abab" matches once. Now you are asked to calculate the sum of the match times for all the prefixes. For "abab", it is 2 + 2 + 1 + 1 = 6. 
The answer may be very large, so output the answer mod 10007. 
Input
The first line is a single integer T, indicating the number of test cases. 
For each case, the first line is an integer n (1 <= n <= 200000), which is the length of string s. A line follows giving the string s. The characters in the strings are all lower-case letters. 
Output
For each case, output only one number: the sum of the match times for all the prefixes of s mod 10007.
Sample Input
1
4
abab
Sample Output
6

     

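Problem: count how many times each prefix of the string occurs in the string, and sum the counts. 

Idea: an application of the KMP next[] array. Let dp[i] be the number of prefixes that end at the i-th character, i.e. that are suffixes of the first i characters. Then dp[i] = dp[next[i]] + 1: the prefix of length i itself, plus everything already counted for next[i], because next[i] is the length of the longest prefix that matches a suffix there. The answer is the sum of dp[i].


Code: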

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

char s[200005];
int nex[200005];
int dp[200005];
void getnext()
{
    int slen = strlen(s);
    int k = -1;
    int j = 0;
    nex[0] = -1;
    while(j < slen)
    {
        if(k == -1 || s[k]==s[j])
            nex[++j] = ++k;
        else
            k=nex[k];
    }
}

int main()
{
    int t,n;
    cin >> t;
    while(t--)
    {
        cin >> n;
        scanf("%s",s);
        getnext();
        int slen = strlen(s);
        memset(dp,0,sizeof(dp));
        int sum = 0 ;
        for(int i = 1 ; i <=slen; i++)
        {
            dp[i] = dp[nex[i]]+1;
            sum =(sum+dp[i])%10007;
        }
        printf("%d\n",sum);
    }
}
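
A worked example helps to see why the recurrence counts every occurrence exactly once. The sketch below restates it as a function; the name countPrefixMatches and its mod parameter are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Sum over all prefixes of s of the number of times the prefix occurs in s.
int countPrefixMatches(const std::string& s, int mod = 10007) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1, -1);
    int k = -1, j = 0;
    while (j < n) {
        if (k == -1 || s[j] == s[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    // dp[i]: number of prefixes of s that are suffixes of s[0..i-1].
    std::vector<int> dp(n + 1, 0);
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        dp[i] = (dp[next[i]] + 1) % mod;
        sum = (sum + dp[i]) % mod;
    }
    return sum;
}
```

For "abab", next[1..4] = 0, 0, 1, 2, so dp[1..4] = 1, 1, 2, 2 and the sum is 6, as in the sample.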


6: Longest common substring of several strings


                                                                            Blue Jeans

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated. 

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers. 

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.
Input
Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.
Output
For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.
Sample Input
3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Sample Output
no significant commonalities
AGATAC
CATCATCAT

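Problem: given several DNA sequences of length 60, take the first one as the template and find the longest substring of it that also occurs in every other sequence (its length must be at least 3).

Idea: brute force. Enumerate the length and start position of candidate substrings of the first sequence and look for each candidate in the other strings. The handy part is strstr(): strstr(a,b) searches for b in a and returns a pointer to the first occurrence of b, or NULL if b does not occur.


Code: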

#include <iostream>
#include <cstring>
#include <string>

using namespace std;

char tem[65];
char str[15][65];
char fin[65];

int m;

int judge()
{
    int i;
    for(i = 1 ;i < m; i++)
    {
        if(strstr(str[i],tem)==0)//strstr returns a pointer to tem inside str[i], or NULL if it is absent
            return 0;
    }
    return 1;
}


int main()
{
    int n,i,k,j;
    cin >> n;
    while(n--)
    {
        cin >> m;
        for(i = 0; i < m; i++)
            cin >> str[i];
        k = 3;
        int flag = 0;
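        while(k <= 60)//brute force: try every substring length
        {
            for(i = 0 ; i <= 60 - k;i++)//enumerate the start position in the first sequence
            {
                memset(tem,'\0',sizeof(tem));
                for( j = 0 ; j < k ; j++)
                    tem[j] = str[0][i+j];//i is the start position of the candidate
                tem[k] = '\0';
                if(judge())
                {
                    //a longer match always wins; for equal lengths keep the alphabetically smaller one
                    if(flag == 0 || (int)strlen(fin) < k || strcmp(tem,fin) < 0)
                        strcpy(fin,tem);
                    flag = 1;
                }
            }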
            k++;
        }
        if(flag==1)
            cout<<fin<<endl;
        else
            cout<<"no significant commonalities"<<endl;

    }
    return 0;
}
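
The brute-force search, including the tie-breaking rule from the problem statement, can be sketched with std::string::find in place of strstr(). The function name longestCommon and its minLen parameter are assumptions for illustration; an empty result stands for "no significant commonalities".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Longest substring of seqs[0] (at least minLen long) found in every sequence;
// among equally long candidates the alphabetically first one is returned.
std::string longestCommon(const std::vector<std::string>& seqs,
                          std::size_t minLen) {
    assert(!seqs.empty());
    const std::string& base = seqs[0];
    std::string best;
    for (std::size_t len = minLen; len <= base.size(); ++len) {
        for (std::size_t start = 0; start + len <= base.size(); ++start) {
            const std::string cand = base.substr(start, len);
            bool inAll = true;
            for (std::size_t i = 1; i < seqs.size() && inAll; ++i)
                inAll = seqs[i].find(cand) != std::string::npos;
            // len only grows, so cand is never shorter than best.
            if (inAll && (cand.size() > best.size() || cand < best))
                best = cand;
        }
    }
    return best;
}
```

For {"ABCXAAB", "AABZABC"} with minLen = 3, both "ABC" and "AAB" are common and the function returns "AAB", the alphabetically smaller one.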

7: Longest string that is both a prefix of string a and a suffix of string b

Simpsons’ Hidden Talents

Homer: Marge, I just figured out a way to discover some of the talents we weren’t aware we had. 
Marge: Yeah, what is it? 
Homer: Take me for example. I want to find out if I have a talent in politics, OK? 
Marge: OK. 
Homer: So I take some politician’s name, say Clinton, and try to find the length of the longest prefix 
in Clinton’s name that is a suffix in my name. That’s how close I am to being a politician like Clinton 
Marge: Why on earth choose the longest prefix that is a suffix??? 
Homer: Well, our talents are deeply hidden within ourselves, Marge. 
Marge: So how close are you? 
Homer: 0! 
Marge: I’m not surprised. 
Homer: But you know, you must have some real math talent hidden deep in you. 
Marge: How come? 
Homer: Riemann and Marjorie gives 3!!! 
Marge: Who the heck is Riemann? 
Homer: Never mind. 
Write a program that, when given strings s1 and s2, finds the longest prefix of s1 that is a suffix of s2.
Input
Input consists of two lines. The first line contains s1 and the second line contains s2. You may assume all letters are in lowercase.
Output
Output consists of a single line that contains the longest string that is a prefix of s1 and a suffix of s2, followed by the length of that prefix. If the longest such string is the empty string, then the output should be 0. 
The lengths of s1 and s2 will be at most 50000.
Sample Input
clinton
homer
riemann
marjorie
Sample Output
0
rie 3




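Problem: given two strings a and b, find the longest string that is both a prefix of a and a suffix of b.

Idea: the longest suffix of b that is a prefix of a is wanted, so concatenate a and b and compute the next array; next[len] (len being the length of the concatenation) is then the longest match. The one thing to watch is that the result must not be longer than a or longer than b; while it is, step down with next again.

Code: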

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

char s1[100005];
char s2[50005];
int nex[100005];
int len1,len2,len;

void getnext()
{
    int k = -1;
    int j = 0;
    nex[0] = -1;
    while(j<len)
    {
        if(k == -1 || s1[j]==s1[k])
        {
            j++;
            k++;
            nex[j] = k;
        }
        else
            k = nex[k];
    }
}


int main()
{
    while(~scanf("%s",s1))
    {
        scanf("%s",s2);
        len1 = strlen(s1);
        len2 = strlen(s2);
        strcat(s1,s2);
        len = len1+len2;
        getnext();
        int i,j = nex[len];
        while(j>len1||j>len2)//the match must fit inside both s1 and s2
            j = nex[j];
        if(j == 0)
            printf("0\n");
        else
        {
            for(i = 0 ; i < j ; i++)
                printf("%c",s1[i]);
            printf(" %d\n",j);
        }
    }
    return 0;
}
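
The concatenation trick, together with the length cap, can be sketched as a function that returns the matched string. The name longestPrefixSuffix and the std::string interface are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Longest string that is a prefix of a and a suffix of b ("" if none).
std::string longestPrefixSuffix(const std::string& a, const std::string& b) {
    const std::string s = a + b;
    const int n = static_cast<int>(s.size());
    if (n == 0) return "";
    std::vector<int> next(n + 1, -1);
    int k = -1, j = 0;
    while (j < n) {
        if (k == -1 || s[j] == s[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    // A match in a + b may straddle the join; shorten it until it fits
    // inside both a and b.
    const int limit = static_cast<int>(std::min(a.size(), b.size()));
    int len = next[n];
    while (len > limit) len = next[len];
    return a.substr(0, len);
}
```

For a = "aa" and b = "aa" the concatenation "aaaa" has next[4] = 3, which is longer than either string, so the loop steps down to next[3] = 2 and the result is "aa".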




Reposted from blog.csdn.net/Xuedan_blog/article/details/80027077