剑指 Offer 19. Regular expression matching

1. Problem description

Implement a function that matches regular expressions containing '.' and '*'. In a pattern, the character '.' matches any single character, and '*' means that the character before it may appear any number of times (including 0 times). In this problem, matching means that the entire string matches the entire pattern. For example, the string "aaa" matches the patterns "a.a" and "ab*ac*a", but matches neither "aa.a" nor "ab*a".

Example 1

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" cannot match the entire string "aa".

Example 2

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' matches zero or more of the preceding element, which here is 'a'. Therefore, the string "aa" can be regarded as 'a' repeated once.

Example 3

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" matches zero or more ('*') of any character ('.').

Example 4

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: '*' means zero or more; here 'c' appears 0 times and 'a' is repeated once. Therefore the pattern matches the string "aab".

Example 5

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false
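
As a reference point for the examples above, here is a minimal recursive sketch of the matching rules. It is not the solution presented in this article, and the names `matchFrom` and `isMatchRecursive` are hypothetical. It assumes every '*' in the pattern has a preceding character, and it can take exponential time in the worst case.

```cpp
#include <string>

// Hypothetical helper (not from the original article): tries to match the
// suffix of s starting at index i against the suffix of p starting at index j.
bool matchFrom(const std::string& s, const std::string& p,
               std::size_t i, std::size_t j) {
    if (j == p.size()) {
        return i == s.size();  // pattern used up: s must be used up as well
    }
    // Does the current pattern character match the current string character?
    const bool first = i < s.size() && (p[j] == '.' || p[j] == s[i]);
    if (j + 1 < p.size() && p[j + 1] == '*') {
        // Either skip the "x*" combination entirely, or consume one
        // character of s and stay on the same combination.
        return matchFrom(s, p, i, j + 2) ||
               (first && matchFrom(s, p, i + 1, j));
    }
    return first && matchFrom(s, p, i + 1, j + 1);
}

// Entry point: the whole string must match the whole pattern.
bool isMatchRecursive(const std::string& s, const std::string& p) {
    return matchFrom(s, p, 0, 0);
}
```

Calling `isMatchRecursive("aab", "c*a*b")` follows Example 4: the `c*` combination is skipped, `a*` consumes two characters, and `b` matches the last one.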

2. Solution: dynamic programming

Ideas and Algorithms

The matching in the problem is a step-by-step process: each time, we take from the pattern p either a single character or a "character + asterisk" combination and match it against s. A single character in p can match only one character in s, so the way it matches is unique. A "character + asterisk" combination in p can match any natural number of characters in s, so the way it matches is not unique. Therefore, we can use dynamic programming to enumerate the matching schemes.

We use f[i][j] to denote whether the first i characters of s and the first j characters of p can be matched. For the state transition, we consider how the j-th character of p is matched:

  • If the j-th character of p is a lowercase letter, then it must match an identical lowercase letter in s, i.e.

$$
f[i][j] = \begin{cases} f[i-1][j-1], & s[i] = p[j] \\ \text{false}, & s[i] \neq p[j] \end{cases}
$$
That is, if the i-th character of s and the j-th character of p are different, no match is possible; otherwise the last characters of the two strings match, and the overall result depends on the preceding parts of the two strings.

  • If the j-th character of p is '*', then the (j−1)-th character of p can be matched any natural number of times. In the case of 0 matches, we have
    $$f[i][j] = f[i][j-2]$$
    That is, we "waste" one character + asterisk combination, which matches no character in s.
    In the case of matching 1, 2, 3, ⋯ times, similarly we have
    $$
    \begin{aligned}
    f[i][j] &= f[i-1][j-2], && \text{if } s[i] = p[j-1] \\
    f[i][j] &= f[i-2][j-2], && \text{if } s[i-1] = s[i] = p[j-1] \\
    f[i][j] &= f[i-3][j-2], && \text{if } s[i-2] = s[i-1] = s[i] = p[j-1] \\
    &\cdots
    \end{aligned}
    $$
    If we transition this way, we have to enumerate how many characters of s the combination matches, which increases the time complexity and makes the code cumbersome to write. We can instead look at the problem from another angle: while matching a letter + asterisk combination, there are essentially only two situations:
    • The combination matches one character at the end of s; that character is discarded, and the combination can continue to match.
    • The combination matches no character; the combination is discarded and takes no further part in matching.
      From this perspective, we can write a much neater state transition equation:
      $$
      f[i][j] = \begin{cases} f[i-1][j] \lor f[i][j-2], & s[i] = p[j-1] \\ f[i][j-2], & s[i] \neq p[j-1] \end{cases}
      $$
  • In any case, as long as p[j] is '.', p[j] is certain to match any lowercase letter in s.
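
To see why the two-case view of '*' gives the same result as enumerating 0, 1, 2, ⋯ matches, here is a short derivation sketch. The shorthand $m_k$ is introduced only for this illustration and is not part of the original text; it denotes whether s[k] matches p[j-1].

```latex
% Enumerating how many characters the combination p[j-1]p[j] consumes:
\begin{aligned}
f[i][j]   &= f[i][j-2] \;\lor\; \bigl(m_i \land f[i-1][j-2]\bigr)
             \;\lor\; \bigl(m_i \land m_{i-1} \land f[i-2][j-2]\bigr) \;\lor\; \cdots \\
% The same expansion written for row i-1:
f[i-1][j] &= f[i-1][j-2] \;\lor\; \bigl(m_{i-1} \land f[i-2][j-2]\bigr) \;\lor\; \cdots \\
% Factoring m_i out of every term after the first and substituting:
f[i][j]   &= f[i][j-2] \;\lor\; \bigl(m_i \land f[i-1][j]\bigr)
\end{aligned}
```

The last line is the compact transition: either the combination is discarded, or it consumes s[i] and stays available for the first i−1 characters.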

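The final state transition equation is as follows:
$$
f[i][j] = \begin{cases}
\text{if } p[j] \neq \text{'*'}: & \begin{cases} f[i-1][j-1], & matches(s[i], p[j]) \\ \text{false}, & \text{otherwise} \end{cases} \\
\text{otherwise}: & \begin{cases} f[i-1][j] \lor f[i][j-2], & matches(s[i], p[j-1]) \\ f[i][j-2], & \text{otherwise} \end{cases}
\end{cases}
$$
where matches(x, y) is an auxiliary function that judges whether two characters match: they match only when y is '.' or when x and y are the same character.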
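
The transition can also be read directly as a recursive definition. The following is a minimal top-down sketch with memoization, assuming 1-indexed states as in the text and assuming every '*' has a preceding character. The class name `MemoMatcher` and its members are hypothetical and only serve this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical top-down reading of f[i][j]; the names are illustrative
// and not taken from the original article.
class MemoMatcher {
public:
    MemoMatcher(const std::string& s, const std::string& p)
        : s_(s), p_(p),
          memo_(s.size() + 1, std::vector<int>(p.size() + 1, -1)) {}

    bool isMatch() { return f(s_.size(), p_.size()); }

private:
    // matches(x, y): y is '.' or x and y are the same character.
    static bool matches(char x, char y) { return y == '.' || x == y; }

    // f(i, j): do the first i characters of s match the first j of p?
    bool f(std::size_t i, std::size_t j) {
        if (j == 0) {
            return i == 0;  // empty pattern matches only the empty string
        }
        int& cached = memo_[i][j];  // -1 = unknown, 0 = false, 1 = true
        if (cached != -1) {
            return cached == 1;
        }
        bool result = false;
        if (p_[j - 1] != '*') {
            result = i > 0 && matches(s_[i - 1], p_[j - 1]) && f(i - 1, j - 1);
        } else {
            // Assumption: '*' is never the first pattern character (j >= 2).
            result = f(i, j - 2);
            if (!result && i > 0 && matches(s_[i - 1], p_[j - 2])) {
                result = f(i - 1, j);
            }
        }
        cached = result ? 1 : 0;
        return result;
    }

    std::string s_;
    std::string p_;
    std::vector<std::vector<int>> memo_;
};
```

For example, `MemoMatcher("aab", "c*a*b").isMatch()` evaluates f(3, 5) and only visits the states that this query actually depends on.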

Details

The boundary condition of the dynamic programming is f[0][0] = true, that is, two empty strings match. The final answer is f[m][n], where m and n are the lengths of the strings s and p respectively. Because string indices start from 0 in most languages, when implementing the state transition equation above, pay attention to the correspondence between the index in each dimension of the state and the actual character index.
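
To make the boundary condition and the index offset concrete, here is a small sketch that computes only row i = 0 of the table, that is, which prefixes of p match the empty string. The function name `emptyStringRow` is hypothetical, and the sketch assumes '*' is never the first character of p.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns f[0][j] for j = 0..n, where f[0][j] says
// whether the first j characters of p match the empty string.
// State index j corresponds to the character p[j - 1].
std::vector<bool> emptyStringRow(const std::string& p) {
    std::vector<bool> row(p.size() + 1, false);
    row[0] = true;  // boundary condition: f[0][0] = true
    // j = 1 stays false: a single letter or '.' cannot match the empty
    // string, and '*' is assumed never to be the first character.
    for (std::size_t j = 2; j <= p.size(); ++j) {
        if (p[j - 1] == '*') {
            // Discard the "character + asterisk" combination.
            row[j] = row[j - 2];
        }
    }
    return row;
}
```

For p = "c*a*b" the row is true at j = 0, 2 and 4, which is why the empty string matches "c*" and "c*a*" but not "c*a*b".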
In the state transition equation above, if the string p contains a "character + asterisk" combination (such as a*), the transition first matches a on its own (when p[j] is a) and then matches a* as a whole (when p[j] is *). However, the problem statement requires a* to be treated as a whole, so matching a on its own does not follow the problem statement. This extra transition does not change the final answer: when p[j] is *, the transition reads only f[i][j-2] and f[i-1][j], so the states computed for the lone a in column j−1 are never used.

3. Code snippet

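#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.size();
        int n = p.size();

        // matches(i, j): does the i-th character of s match the j-th
        // character of p (both 1-indexed)? i == 0 means "no character".
        auto matches = [&](int i, int j) {
            if (i == 0) {
                return false;
            }
            if (p[j - 1] == '.') {
                return true;
            }
            return s[i - 1] == p[j - 1];
        };

        // f[i][j]: the first i characters of s match the first j of p.
        vector<vector<int>> f(m + 1, vector<int>(n + 1));
        f[0][0] = true;
        for (int i = 0; i <= m; ++i) {
            for (int j = 1; j <= n; ++j) {
                if (p[j - 1] == '*') {
                    // Discard the "character + asterisk" combination.
                    // Relies on '*' always having a preceding character.
                    f[i][j] |= f[i][j - 2];
                    if (matches(i, j - 1)) {
                        // Consume s[i - 1] and keep the combination.
                        f[i][j] |= f[i - 1][j];
                    }
                }
                else {
                    if (matches(i, j)) {
                        f[i][j] |= f[i - 1][j - 1];
                    }
                }
            }
        }
        return f[m][n];
    }
};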

4. Complexity analysis

  • Time complexity: O(mn), where m and n are the lengths of the strings s and p respectively. We need to compute all states, and each state transition takes O(1) time.
  • Space complexity: O(mn), which is the space used to store all states.
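
Since f[i][j] only reads row i and row i−1, the table can in principle be reduced to two rows. The following is a sketch of that variant under the same assumptions as the solution above. It is not part of the original article, and the name `isMatchTwoRows` is hypothetical. It brings the extra space down to O(n) while the time stays O(mn).

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical space-reduced variant (not from the original article):
// keeps only the previous row and the current row of the table.
bool isMatchTwoRows(const std::string& s, const std::string& p) {
    const std::size_t m = s.size();
    const std::size_t n = p.size();

    // matches(i, j): the i-th character of s matches the j-th of p
    // (1-indexed); i == 0 means there is no character to match.
    auto matches = [&](std::size_t i, std::size_t j) {
        return i > 0 && (p[j - 1] == '.' || s[i - 1] == p[j - 1]);
    };

    std::vector<char> prev(n + 1, 0);  // row i - 1
    std::vector<char> cur(n + 1, 0);   // row i
    for (std::size_t i = 0; i <= m; ++i) {
        cur[0] = (i == 0);  // only the empty string matches the empty pattern
        for (std::size_t j = 1; j <= n; ++j) {
            cur[j] = 0;
            if (p[j - 1] == '*') {
                // Assumption: '*' always has a preceding character (j >= 2).
                cur[j] = cur[j - 2];
                if (matches(i, j - 1)) {
                    cur[j] = cur[j] || prev[j];
                }
            } else if (matches(i, j)) {
                cur[j] = prev[j - 1];
            }
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[n] != 0;  // after the last swap, prev holds row m
}
```

The trade-off is that the full table is no longer available afterwards, which only matters if one wants to inspect intermediate states.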
Reference link: https://leetcode.cn/problems/regular-expression-matching/solution/zheng-ze-biao-da-shi-pi-pei-by-leetcode-solution/

Origin blog.csdn.net/qq_43679351/article/details/124927301