Day 31: An interview string-matching algorithm that is easier than KMP!



01. Implement strStr()


String-matching problems are a major category of string problems.


Problem: Implement strStr()


Implement the strStr() function. Given a haystack string and a needle string, find the first position (0-based) where needle appears in haystack. If needle does not occur, return -1.


Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Note:

What should we return when needle is an empty string? This is a good question to raise in an interview.

For this problem, return 0 when needle is an empty string. This matches the behavior of strstr() in C and indexOf() in Java.
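As a baseline before the Sunday algorithm, here is a minimal brute-force sketch that tries every alignment and follows the empty-needle convention above. The class name `BruteForceStrStr` is hypothetical; this is not the author's solution, just an O(n·m) reference to compare against.

```java
public class BruteForceStrStr {
    // Try every start position in haystack and compare needle character by character.
    // Returns the first match index, or -1; an empty needle returns 0.
    static int strStr(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();
        for (int start = 0; start + m <= n; start++) {
            int k = 0;
            while (k < m && haystack.charAt(start + k) == needle.charAt(k)) {
                k++;
            }
            if (k == m) {
                return start; // all m characters matched at this alignment
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strStr("hello", "ll"));  // 2
        System.out.println(strStr("aaaaa", "bba")); // -1
        System.out.println(strStr("abc", ""));      // 0
        System.out.println("abc".indexOf(""));      // 0, same convention as Java indexOf()
    }
}
```

Every mismatch here advances the start by exactly one position; the Sunday algorithm below improves on this by shifting further.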

02. Sunday matching


The Sunday algorithm is a string pattern-matching algorithm proposed by Daniel M. Sunday in 1990. Its core idea: when a mismatch is found during matching, skip as many characters as possible before the next matching attempt, which improves matching efficiency.

Because this is the first lesson in the string-matching series, let's first introduce a few concepts:

  • String: a sequence of characters (short for "character string")
  • Empty string: a string of length zero
  • Main string: the string that contains a substring is called that substring's main string
  • Substring: any sequence of consecutive characters in a string is called a substring of that string
  • Pattern string: locating a substring inside a main string is called pattern matching; it returns the index of the substring's first character in the main string. The main string being searched is called the target string, and the substring being searched for is called the pattern string.

With these concepts in hand, let's return to the algorithm. "Sunday" does not mean the algorithm was discovered on a weekend; it is named after its inventor, whose surname is Sunday (perhaps his parents always worked overtime). It may sound intimidating, but here is how it works.

Suppose the target string is: Here is a little Hao

and the pattern string is: little

Generally, the first step of any string-matching algorithm is to align the pattern string with the start of the target string. This is the same for KMP, BM, and Sunday.

In the Sunday algorithm, we compare from the head. As soon as we hit a mismatch, we look directly at the character in the main string that immediately follows the pattern window, which here is "s". (Why that character? After aligning the pattern with the target string, a mismatch means the pattern has to move, and the question is by how much. This is exactly where string-matching algorithms differ: KMP builds a partial-match table to compute the shift, and BM computes it by comparing from back to front. Sunday looks at the first character after the pattern window, because whenever the pattern moves by anything from 1 up to its own length, that character takes part in the next comparison.)

Having found "s", the first character after the pattern window, what do we do next? We check whether this character occurs in the pattern string. If it does not, no alignment that covers it can match, so we can skip a large chunk and restart the comparison from the character right after it.

There is still a mismatch (a space versus "l"), so we repeat the process and find the character right after the pattern window: "t".

Now it gets interesting: "t" does occur in the pattern string, and its rightmost occurrence is third from the end of the pattern (0-based index 3 in "little"). So we shift the pattern right by 6 - 3 = 3 positions.

And now the match succeeds. Pretty neat, isn't it? We won't go through the proof today (I will write a separate article later proving some of the algorithms covered so far; for now, just make sure you master the process above!).
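To check the walkthrough, here is a small sketch that records the start index of each alignment tried on the example. The class `SundayTrace` and method `alignments` are hypothetical names, and for brevity each window is compared with `String.startsWith(prefix, offset)` instead of a character-by-character loop, which gives the same match/mismatch result.

```java
import java.util.ArrayList;
import java.util.List;

public class SundayTrace {
    // Returns the start index of every alignment tried, ending with the match (if any).
    // Illustrative helper that mirrors the steps described above.
    static List<Integer> alignments(String text, String pattern) {
        List<Integer> starts = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        int start = 0;
        while (start + m <= n) {
            starts.add(start);
            if (text.startsWith(pattern, start)) {
                return starts; // full match at this alignment
            }
            int next = start + m; // character just after the window
            if (next >= n) {
                break; // nothing after the window: no further alignment fits
            }
            int last = pattern.lastIndexOf(text.charAt(next));
            // Absent: jump past it (m + 1); present: shift by m - last
            start += (last == -1) ? m + 1 : m - last;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(alignments("Here is a little Hao", "little")); // [0, 7, 10]
    }
}
```

The starts 0, 7 and 10 correspond to the three alignments above: the mismatch at "H", the mismatch at the space, and the final match.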

To summarize the key points, here is what we did:

  • Align the target string and the pattern string, and compare from front to back
  • Focus on the character in the main string right after the pattern window (this is the core)
  • If that character does not appear in the pattern, skip past it entirely
  • Otherwise, shift the pattern by: pattern length - rightmost index of that character in the pattern (0-based)
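The shift rule in the last bullet can be precomputed once per pattern. Below is a minimal sketch, assuming a `HashMap` keyed by character (a common textbook variant, not the author's code, which calls `lastIndexOf` at every mismatch); the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SundayTable {
    // Shift for each pattern character: m - (rightmost 0-based index).
    // Later positions overwrite earlier ones, so the rightmost occurrence wins.
    static Map<Character, Integer> buildShiftTable(String pattern) {
        int m = pattern.length();
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m; i++) {
            shift.put(pattern.charAt(i), m - i);
        }
        return shift;
    }

    static int search(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        Map<Character, Integer> shift = buildShiftTable(pattern);
        int start = 0;
        while (start + m <= n) {
            // Compare the current window from front to back
            int k = 0;
            while (k < m && text.charAt(start + k) == pattern.charAt(k)) {
                k++;
            }
            if (k == m) {
                return start;
            }
            if (start + m >= n) {
                return -1; // no character after the window: nothing left to try
            }
            // Characters not in the pattern shift by m + 1 (skip past them entirely)
            start += shift.getOrDefault(text.charAt(start + m), m + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(buildShiftTable("little").get('t'));      // 3
        System.out.println(search("Here is a little Hao", "little")); // 10
        System.out.println(search("hello", "ll"));                    // 2
        System.out.println(search("aaaaa", "bba"));                   // -1
    }
}
```

Using a map keeps memory proportional to the pattern; a fixed array indexed by `char` is a common alternative when speed matters more.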

03. Applying the algorithm


Now let's apply this algorithm to our problem.

Following the analysis above, here is the code (a Java version that should be easy to follow):

//JAVA 
class Solution {
    public int strStr(String origin, String aim) {
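        if (origin == null || aim == null) {
            return 0;
        }
        // An empty aim never enters the loop below and returns 0, matching indexOf("")
        if (origin.length() < aim.length()) {
            return -1;
        }
        // Current index in the target string
        int originIndex = 0;
        // Current index in the pattern string
        int aimIndex = 0;
        // The loop ends successfully once every character of aim has been matched
        while (aimIndex < aim.length()) {
            // Target string exhausted before aim is fully matched, e.g. "mississippi" vs "sippia"
            if (originIndex > origin.length() - 1) {
                return -1;
            }
            if (origin.charAt(originIndex) == aim.charAt(aimIndex)) {
                // Characters match: advance both indexes
                originIndex++;
                aimIndex++;
            } else {
                // Index of the character just after the current window;
                // in the example above it is 6 the first time and 13 the second time
                int nextCharIndex = originIndex - aimIndex + aim.length();
                // Check that this next character exists in the target string
                if (nextCharIndex < origin.length()) {
                    // Rightmost index of that character in the pattern, or -1 if absent
                    int step = aim.lastIndexOf(origin.charAt(nextCharIndex));
                    if (step == -1) {
                        // Not in the pattern: restart just past that character
                        originIndex = nextCharIndex + 1;
                    } else {
                        // In the pattern: align its rightmost occurrence with it,
                        // i.e. shift the window by aim.length() - step (the formula above)
                        originIndex = nextCharIndex - step;
                    }
                    // The pattern is always compared from its first character
                    aimIndex = 0;
                } else {
                    // No room left for another window
                    return -1;
                }
            }
        }
        return originIndex - aimIndex;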
    }
}


Source: blog.51cto.com/15076236/2608556