Problem Description

[LeetCode] 28. Implement strStr()
Implement the strStr() function.

Given a haystack string and a needle string, find the first position (0-based) at which needle occurs in haystack. If needle does not occur, return -1.

First Attempt

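Approach:
This feels similar to "Longest Common Prefix".
It also feels similar to "Remove Duplicates from Sorted Array".
Traverse haystack with two pointers.
The slow pointer marks the candidate start position in haystack, where the character should equal needle's first character.
The fast pointer starts at the slow pointer and checks, one by one, whether the following characters of haystack equal the characters of needle.
Both pointers can be represented as indices.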

Edge cases:
haystack.size() == 0
needle.size() == 0
needle.size() > haystack.size()
haystack.size() == 1

Test cases (in pairs: haystack, needle):
"mississippi", "issip"
"mississippi", "issipi"
"hello", "ll"
"aaaaa", "bba"
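To sanity-check the solutions below against these pairs, one option is to use the standard library as a reference answer: std::string::find returns the index of the first occurrence, or std::string::npos when there is none, and it returns 0 for an empty needle, which agrees with the edge cases above. A minimal sketch (the helper name referenceStrStr is an illustrative choice, not part of the problem):

```cpp
#include <string>

// Reference answer for strStr() built on std::string::find.
// find() returns the first matching index, or npos if needle does not occur;
// an empty needle matches at index 0.
int referenceStrStr(const std::string& haystack, const std::string& needle) {
    std::string::size_type pos = haystack.find(needle);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

For the four pairs above this gives 4, -1, 2 and -1.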


class Solution {
public:
    int strStr(string haystack, string needle) {
        if(needle.size() == 0 )
            return 0;
        if(needle.size() > haystack.size() || haystack.size() == 0)
            return -1;

        int index_slow = 0;
        for (int index_fast = 0; index_fast < haystack.size(); )
        {
            if(haystack[index_fast] != needle[index_fast-index_slow])
            {
                ++index_slow;
                index_fast = index_slow;
                // If index_slow exceeds this bound, haystack no longer has enough characters left to match needle
                if(index_slow > haystack.size()-needle.size())
                    return -1;
            }
            else
            {
                // If all of needle has been traversed and every character matched, the match is found
                if(index_fast-index_slow+1 >= needle.size())
                    return index_slow;
                ++index_fast;
            }
        }
        
        // In theory this line is never reached; if it is, no match was found either
        return -1;

    }
};

Result: (submission screenshot)

Second Attempt

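My first attempt was a brain misfire: to get by with a single for loop, the code kept twisting around and was hard to follow. A nested for loop is much clearer.
Approach:
Nested for loops.
Outer loop: find an index i in haystack whose character equals needle[0].
Inner loop: starting from i, walk forward; if every character equals the corresponding character of needle, the match is found.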

Edge cases:
haystack.size() == 0
needle.size() == 0
needle.size() > haystack.size()
haystack.size() == 1
once i > haystack.size()-needle.size(), the remaining characters of haystack are too few to hold needle
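The bound haystack.size()-needle.size() in the loop condition below is only safe because of the early return for needle.size() > haystack.size(): size() returns an unsigned type, so the subtraction cannot go negative and wraps around to a huge value instead. A small illustration, with hypothetical helper names:

```cpp
#include <cstddef>
#include <string>

// Illustration only: std::string::size() returns std::string::size_type
// (std::size_t for std::string), so when needle is longer than haystack
// this subtraction wraps around instead of going negative.
std::size_t lastStartUnchecked(const std::string& haystack, const std::string& needle) {
    return haystack.size() - needle.size();
}

// Hypothetical helper: the last valid start index i, or -1 when needle cannot fit.
// Comparing the sizes first is what makes the subtraction safe.
long long lastStart(const std::string& haystack, const std::string& needle) {
    if (needle.size() > haystack.size()) return -1;
    return static_cast<long long>(haystack.size() - needle.size());
}
```

Without the early return, i <= haystack.size()-needle.size() would hold for every reachable i when needle is longer than haystack, and haystack[i+j] would read out of bounds.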

class Solution {
public:
    int strStr(string haystack, string needle) {
        if(needle.size() == 0 )
            return 0;
        if(needle.size() > haystack.size() || haystack.size() == 0)
            return -1;

        for(int i=0; i<=haystack.size()-needle.size(); ++i)
        {
            if(haystack[i] != needle[0])
                continue;
            int j=1;
            for(; j<needle.size(); ++j)
            {
                if(haystack[i+j] != needle[j])
                {
                    break;
                }
            }
            // j == needle.size() means every character matched
            if(j >= needle.size())
                return i;
        }
        
        return -1;

    }
};

Result: (submission screenshot)

Third Attempt

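The two solutions above are brute force: they repeatedly take a substring of haystack with the same length as needle and compare the two; on a mismatch, they take a new substring starting one position further along and compare again. The question is: after a mismatch, is moving forward by only one position the right choice? Could we move 2 positions, or even 3? Can the failed comparison tell us how far to step before taking the next substring?
Yes, and there are many ways to do it. I first tried the BM (Boyer-Moore) algorithm, invented by Boyer and Moore. For a step-by-step description, see Algorithms (4th Edition) in the Turing Books series; its explanation is quite good.
BM paper: Boyer R.S., Moore J.S., 1977, A fast string searching algorithm. Communications of the ACM, 20: 762-772.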

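The algorithm first builds a skip table that records, for every character, the index of its rightmost occurrence in needle (-1 if it does not occur). Let the length of haystack be N and the length of needle be M. In the worst case the algorithm performs about NM character comparisons; in the best case (the mismatched characters mostly do not occur in needle) it performs about N/M.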
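To make the skip table and the cost estimates concrete, here is a small sketch, assuming each byte indexes a 256-entry table. buildSkipTable stores the rightmost index of each character in needle, and the hypothetical countCompares helper runs the same right-to-left, bad-character-only search as the implementation below while counting character comparisons.

```cpp
#include <string>
#include <vector>

// Skip table: rightmost index of each byte value in needle, -1 if absent.
std::vector<int> buildSkipTable(const std::string& needle) {
    std::vector<int> right(256, -1);
    for (int j = 0; j < (int)needle.size(); ++j)
        right[(unsigned char)needle[j]] = j;  // later occurrences overwrite earlier ones
    return right;
}

// Hypothetical instrumentation: right-to-left bad-character search that returns
// the number of character comparisons performed (stops at the first full match).
long long countCompares(const std::string& haystack, const std::string& needle) {
    int N = haystack.size(), M = needle.size();
    std::vector<int> right = buildSkipTable(needle);
    long long compares = 0;
    for (int i = 0; i <= N - M; ) {
        int step = 0;
        for (int j = M - 1; j >= 0; --j) {
            ++compares;
            if (haystack[i + j] != needle[j]) {
                step = j - right[(unsigned char)haystack[i + j]];
                if (step < 1) step = 1;  // never move the window backwards
                break;
            }
        }
        if (step == 0) return compares;  // every character matched
        i += step;
    }
    return compares;
}
```

For needle = "issip" the table holds 3 for 'i', 2 for 's', 4 for 'p' and -1 elsewhere. With haystack = 100 copies of 'x' and needle = "abcd" (N = 100, M = 4), every window costs one comparison and shifts by 4, so the total is 25 = N/M. With haystack = 10 copies of 'a' and needle = "baaa", each of the 7 windows costs 4 comparisons and shifts by only 1, giving 28, close to the NM worst case.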
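Two questions here: why does BM compare from right to left instead of left to right? And why, on a mismatch, does it align the mismatched haystack character with that character's rightmost occurrence in needle?
Answer: take the second question first. Aligning with the rightmost occurrence gives the smallest shift that could still produce a match; any smaller shift would put that haystack character against a different needle character, and aligning with an occurrence further left would shift farther and could skip over a valid match. As for the direction: after a mismatch at needle index j, the shift is j minus the rightmost index of the mismatched character, so mismatches found near the right end allow the largest shifts (up to M when the character does not occur in needle). Comparing from left to right tends to find mismatches at small j, where the computed shift is small or even negative, i.e. the window would move backwards; comparing from right to left avoids most of this, and any remaining negative shift is replaced by a step of 1.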
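The "rightmost occurrence" rule can be checked on a tiny case. The sketch below (badCharSearch is a hypothetical name) runs the same right-to-left bad-character search, once with a table of rightmost occurrences and once, for contrast only, with leftmost occurrences.

```cpp
#include <string>
#include <vector>

// Right-to-left bad-character search. If rightmost is true the table stores the
// rightmost index of each byte in pat (the BM rule); otherwise it stores the
// leftmost index, which is shown here only to demonstrate why that is unsafe.
int badCharSearch(const std::string& text, const std::string& pat, bool rightmost) {
    int N = text.size(), M = pat.size();
    if (M == 0) return 0;
    std::vector<int> table(256, -1);
    for (int k = 0; k < M; ++k) {
        int& slot = table[(unsigned char)pat[k]];
        if (rightmost || slot == -1) slot = k;  // rightmost: keep overwriting; leftmost: keep the first
    }
    for (int i = 0; i <= N - M; ) {
        int step = 0;
        for (int j = M - 1; j >= 0; --j) {
            if (text[i + j] != pat[j]) {
                step = j - table[(unsigned char)text[i + j]];
                break;
            }
        }
        if (step == 0) return i;  // every character matched
        if (step < 0) step = 1;   // never move the window backwards
        i += step;
    }
    return -1;
}
```

With text = "cabab" and pat = "abab", the first window "caba" mismatches at j = 3 on 'a'. The rightmost 'a' in pat is at index 2, so the window moves by 1 and finds the match at index 1; using the leftmost 'a' (index 0) moves it by 3, past the only match, and the search returns -1.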

Here is the implementation:

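#define STR_NUM 256  // one slot for every possible byte value (ASCII alone needs 128)
class Solution {
public:
    int strStr(string haystack, string needle) {
        if(needle.size() == 0)
            return 0;
        if(needle.size() > haystack.size() || haystack.size() == 0)
            return -1;

        int N = haystack.size(); // length of the text
        int M = needle.size();   // length of the pattern

        // Build the skip table: rightmost index of each character in needle, -1 if absent.
        // Entries and loop counters are int (not char) so needles longer than 127 characters do not overflow,
        // and characters are cast to unsigned char so non-ASCII bytes never produce a negative index.
        int beyond_table[STR_NUM];
        for(int c = 0; c < STR_NUM; ++c){
            beyond_table[c] = -1;
        }
        for(int i = 0; i < M; ++i){
            beyond_table[(unsigned char)needle[i]] = i;
        }

        int step = 0;
        for(int i = 0; i <= N - M; i += step)
        {
            step = 0;
            // Compare from right to left
            for(int j = M - 1; j >= 0; --j){
                if(haystack[i+j] != needle[j]){
                    // Align the mismatched haystack character with its rightmost occurrence in needle
                    step = j - beyond_table[(unsigned char)haystack[i+j]];
                    break;
                }
            }
            // If the skip table would move the window backwards, advance by 1 instead
            if(step < 0){
                step = 1;
            }
            // step stays 0 only when every character matched: substring found
            if(0 == step) return i;
        }

        return -1;
    }
};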

Result: (submission screenshot)

Fourth Attempt

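The Sunday algorithm is a refinement of the BM algorithm above. On a mismatch it looks at the haystack character just past the current window (haystack[i+M]) instead of the mismatched character, and shifts the window so that this character lines up with its rightmost occurrence in needle, i.e. by M minus that index, or by M+1 if the character does not occur in needle at all.
Paper: D.M. Sunday: A Very Fast Substring Search Algorithm. Communications of the ACM, 33, 8, 132-142 (1990).
For an introduction to the Sunday algorithm, see this blog post.

The fourth solution modifies the BM code slightly to implement the Sunday algorithm.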
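The shift rule can also be sketched on its own. In this sketch (buildSundayShift and sundaySearch are hypothetical names), the table stores the shift directly, M - j for the rightmost index j of each character in needle and M + 1 for characters that do not occur, so each failed window costs one table lookup.

```cpp
#include <string>
#include <vector>

// Sunday shift per byte value: M - (rightmost index in needle), or M + 1 if absent.
std::vector<int> buildSundayShift(const std::string& needle) {
    int M = needle.size();
    std::vector<int> shift(256, M + 1);
    for (int j = 0; j < M; ++j)
        shift[(unsigned char)needle[j]] = M - j;  // later (further right) occurrences overwrite earlier ones
    return shift;
}

// Minimal Sunday search built on the table above.
int sundaySearch(const std::string& haystack, const std::string& needle) {
    int N = haystack.size(), M = needle.size();
    if (M == 0) return 0;
    std::vector<int> shift = buildSundayShift(needle);
    for (int i = 0; i <= N - M; ) {
        if (haystack.compare(i, M, needle) == 0) return i;  // window equals needle
        if (i + M >= N) break;  // no character after the window: no window left to try
        i += shift[(unsigned char)haystack[i + M]];
    }
    return -1;
}
```

For needle = "issip" (M = 5) the shifts are 2 for 'i', 3 for 's', 1 for 'p' and 6 for any other character. Searching "mississippi" jumps from i = 0 to 3 (next character 's') and then to 4 (next character 'p'), where the match is found.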

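#define STR_NUM 256  // one slot for every possible byte value (ASCII alone needs 128)
class Solution {
public:
    int strStr(string haystack, string needle) {
        if(needle.size() == 0)
            return 0;
        if(needle.size() > haystack.size() || haystack.size() == 0)
            return -1;

        int N = haystack.size(); // length of the text
        int M = needle.size();   // length of the pattern
        // Build the skip table: rightmost index of each character in needle, -1 if absent.
        // int entries and unsigned char indexing avoid overflow for long needles and negative indices for non-ASCII bytes.
        int beyond_table[STR_NUM];
        for(int c = 0; c < STR_NUM; ++c){
            beyond_table[c] = -1;
        }
        for(int i = 0; i < M; ++i){
            beyond_table[(unsigned char)needle[i]] = i;
        }

        int step = 0;
        for(int i = 0; i <= N - M; i += step)
        {
            step = 0;
            // Sunday allows any comparison order; left to right is used here
            for(int j = 0; j < M; ++j){
                if(haystack[i+j] != needle[j]){
                    // Shift based on the character just past the window; if there is none, no window is left
                    if(i + M < N)
                        step = M - beyond_table[(unsigned char)haystack[i+M]];
                    else
                        return -1;
                    break;
                }
            }
            // step stays 0 only when every character matched: substring found
            if(0 == step) return i;
        }

        return -1;
    }
};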

Result: (submission screenshot)

TODO:

There is also the KMP algorithm; I will try that one as well:
https://blog.csdn.net/starstar1992/article/details/54913261
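As a possible starting point for this TODO, here is a minimal KMP sketch written independently of the linked post (buildLps and kmpSearch are illustrative names). KMP precomputes, for each prefix of needle, the length of its longest proper prefix that is also a suffix; on a mismatch the needle index falls back to that length while the haystack index never moves backwards, which gives O(N + M) time.

```cpp
#include <string>
#include <vector>

// lps[k] = length of the longest proper prefix of needle[0..k] that is also a suffix of it.
std::vector<int> buildLps(const std::string& needle) {
    int M = needle.size();
    std::vector<int> lps(M, 0);
    for (int k = 1, len = 0; k < M; ) {
        if (needle[k] == needle[len]) {
            ++len;
            lps[k] = len;
            ++k;
        } else if (len > 0) {
            len = lps[len - 1];  // fall back to the next shorter border
        } else {
            lps[k] = 0;
            ++k;
        }
    }
    return lps;
}

int kmpSearch(const std::string& haystack, const std::string& needle) {
    int N = haystack.size(), M = needle.size();
    if (M == 0) return 0;
    std::vector<int> lps = buildLps(needle);
    for (int i = 0, j = 0; i < N; ) {  // i scans haystack and never moves backwards
        if (haystack[i] == needle[j]) {
            ++i;
            ++j;
            if (j == M) return i - M;  // whole needle matched
        } else if (j > 0) {
            j = lps[j - 1];  // reuse the matched prefix instead of restarting
        } else {
            ++i;
        }
    }
    return -1;
}
```

For needle = "abab" the table is {0, 0, 1, 2}; for "issip" it is {0, 0, 0, 1, 0}, so after "issi" matches and the next character fails, the search resumes with j = 1 instead of 0.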
