"Edit Distance" of the Dynamic Programming Series

72. Edit distance

Given two words word1 and word2, calculate the minimum number of operations required to convert word1 to word2.

You can perform the following three operations on a word:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (delete 'r')
rose -> ros (delete 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (delete 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

The edit distance algorithm is widely used by data scientists, and it serves as a basic evaluation metric for machine translation and speech recognition.

The most intuitive method is brute force: enumerate every possible edit sequence and pick the shortest one. The number of possible edit sequences is exponential, but we don't need to do that much work, because we only need the shortest sequence rather than all possible sequences.

We can perform three operations on any word:

Insert a character;

Delete a character;

Replace one character.

The problem gives us two words, call them A and B, so in principle there are six operations (three on each word).

However, given word A and word B, some of these operations are equivalent:

  • Deleting a character from word A is equivalent to inserting a character into word B. For example, when word A is doge and word B is dog, we can either delete the last character e of word A so that both are dog, or append the character e to word B so that both are doge;

  • Similarly, deleting a character from word B is equivalent to inserting a character into word A;

  • Replacing a character in word A is equivalent to replacing a character in word B. For example, when word A is bat and word B is cat, changing the first letter of word A (b -> c) is equivalent to changing the first letter of word B (c -> b).

In this way, we can keep one string unchanged and only manipulate the other. For example, keep B fixed and operate only on A:

  • Insert a character into A

  • Delete a character from A

  • Replace a character in A

As shown in the figure below, here is one of the editing methods with the shortest editing distance:

The shortest edit distance in the figure above is 5.

Notice that there are not only three operations: there is actually a fourth one, which is to do nothing (skip). For example, when $s1[i] == s2[j]$, simply execute i-- and j-- to move both pointers one step toward the front.
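The pointer-based view described above can be sketched as a plain recursion. This is only an illustrative sketch, not the final solution: the class name `NaiveEditDistance` and the helper `distance` are hypothetical names, and the running time is exponential because overlapping subproblems are recomputed.

```java
// Hypothetical illustration: recursion over two pointers that start at the
// ends of the strings and move toward the front.
class NaiveEditDistance {

    // Minimum edits to turn s1[0..i] into s2[0..j] (inclusive indices).
    static int distance(String s1, int i, String s2, int j) {
        // s1 is exhausted: insert the remaining j + 1 characters of s2.
        if (i < 0) return j + 1;
        // s2 is exhausted: delete the remaining i + 1 characters of s1.
        if (j < 0) return i + 1;

        if (s1.charAt(i) == s2.charAt(j)) {
            // Skip: the characters match, so move both pointers (i--, j--).
            return distance(s1, i - 1, s2, j - 1);
        }
        int insert = distance(s1, i, s2, j - 1) + 1;
        int delete = distance(s1, i - 1, s2, j) + 1;
        int replace = distance(s1, i - 1, s2, j - 1) + 1;
        return Math.min(insert, Math.min(delete, replace));
    }

    // Convenience overload that starts both pointers at the last character.
    static int distance(String s1, String s2) {
        return distance(s1, s1.length() - 1, s2, s2.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(distance("horse", "ros")); // prints 3
    }
}
```

The three recursive calls in the mismatch branch correspond to insert, delete, and replace on the first string; the matching branch is the "skip" operation and costs nothing.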

With the above process understood, we can solve the problem directly with dynamic programming. First, define the meaning of the dynamic programming array:

dp[i][j] represents the minimum edit distance between the first i characters of s1 and the first j characters of s2, that is, between s1[0...i-1] and s2[0...j-1].

The base cases are dp[..][0] and dp[0][..], where one of the strings has length 0. The minimum edit distance is then the length of the other string. For example, if A has length 0 and B does not, there are two equivalent options:

  • Either delete the characters in B one by one until its length is 0
  • Or add characters to A one by one to make it the same as B
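The base case can be sketched as filling the first row and first column of the table. This is a minimal sketch, assuming a table of size (m + 1) x (n + 1) in which index 0 stands for the empty prefix; the class name `BaseCaseSketch` and the method `initTable` are hypothetical.

```java
// Hypothetical illustration of the base cases only.
class BaseCaseSketch {

    // Builds an (m + 1) x (n + 1) table with only the base cases filled in.
    static int[][] initTable(int m, int n) {
        int[][] dp = new int[m + 1][n + 1];
        // Second string is empty: delete all i characters of the first.
        for (int i = 0; i <= m; i++) dp[i][0] = i;
        // First string is empty: insert all j characters of the second.
        for (int j = 0; j <= n; j++) dp[0][j] = j;
        return dp;
    }

    public static void main(String[] args) {
        int[][] dp = initTable("horse".length(), "ros".length());
        System.out.println(java.util.Arrays.toString(dp[0])); // [0, 1, 2, 3]
        System.out.println(dp[5][0]);                         // 5
    }
}
```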

When $s1[i] != s2[j]$, there are three options:

  • Insert a character into A

  • Delete a character from A

  • Replace a character in A

Which of the three to choose depends on which one gives the smallest dp[i][j], so we take the minimum of the three.
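Putting the cases together, the recurrence can be written as follows. This assumes that dp[i][j] covers the first i characters of s1 and the first j characters of s2 (the indexing used by the Java code), so the characters being compared are s1[i-1] and s2[j-1].

```latex
\[
dp[i][0] = i, \qquad dp[0][j] = j
\]
\[
dp[i][j] =
\begin{cases}
dp[i-1][j-1] & \text{if } s1[i-1] = s2[j-1] \quad \text{(skip)} \\[4pt]
1 + \min\bigl(\, dp[i][j-1],\; dp[i-1][j],\; dp[i-1][j-1] \,\bigr) & \text{otherwise}
\end{cases}
\]
```

Inside the minimum, the three terms correspond to insert, delete, and replace respectively.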

The Java implementation is as follows:

class Solution {
    public int minDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();
        // dp[i][j]: minimum edit distance between the first i characters
        // of word1 and the first j characters of word2
        int[][] dp = new int[m + 1][n + 1];

        // base case: one of the two prefixes is empty
        for (int i = 0; i <= m; i++) dp[i][0] = i;
        for (int j = 0; j <= n; j++) dp[0][j] = j;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    // skip: the characters match, no edit needed
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = min(
                        dp[i][j - 1] + 1,     // insert
                        dp[i - 1][j] + 1,     // delete
                        dp[i - 1][j - 1] + 1  // replace
                    );
                }
            }
        }
        return dp[m][n];
    }

    public static int min(int a, int b, int c) {
        return Math.min(a, Math.min(b, c));
    }
}
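The table holds more than the final number: walking back from dp[m][n] to dp[0][0] recovers one shortest sequence of operations, like the ones listed in the examples. The following is a sketch under that idea, not part of the original solution; the class name `EditPathSketch`, the method `editPath`, and the wording of the operation strings are hypothetical. When several shortest sequences exist, it returns just one of them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration: rebuild one shortest edit sequence from the table.
class EditPathSketch {

    static List<String> editPath(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) dp[i][0] = i;
        for (int j = 0; j <= n; j++) dp[0][j] = j;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + Math.min(dp[i][j - 1],
                            Math.min(dp[i - 1][j], dp[i - 1][j - 1]));
                }
            }
        }

        // Walk back from dp[m][n] toward dp[0][0], recording each edit.
        // push() adds to the front, so the result is ordered left to right.
        Deque<String> ops = new ArrayDeque<>();
        int i = m, j = n;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1)) {
                i--; j--; // skip: characters match
            } else if (i > 0 && j > 0 && dp[i][j] == dp[i - 1][j - 1] + 1) {
                ops.push("replace '" + a.charAt(i - 1) + "' with '" + b.charAt(j - 1) + "'");
                i--; j--;
            } else if (i > 0 && dp[i][j] == dp[i - 1][j] + 1) {
                ops.push("delete '" + a.charAt(i - 1) + "'");
                i--;
            } else {
                ops.push("insert '" + b.charAt(j - 1) + "'");
                j--;
            }
        }
        return new ArrayList<>(ops);
    }

    public static void main(String[] args) {
        for (String op : editPath("horse", "ros")) {
            System.out.println(op);
        }
    }
}
```

The number of recorded operations always equals dp[m][n], since every non-skip step moves to a cell whose value is exactly one smaller.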

Time complexity: O(m * n), where m and n are the lengths of word1 and word2

Space complexity: O(m * n)
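Each row of the table depends only on the row directly above it, so the extra space can be reduced. The following is a sketch of a different variant (a two-row rolling array), not the implementation above; the class name `TwoRowEditDistance` is hypothetical. It keeps the same running time but uses only O(n) extra space, where n is the length of word2.

```java
// Hypothetical illustration: same recurrence, but only two rows are stored.
class TwoRowEditDistance {

    static int minDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();
        int[] prev = new int[n + 1]; // row i - 1 of the full table
        int[] curr = new int[n + 1]; // row i of the full table

        // Base case: empty word1 against each prefix of word2.
        for (int j = 0; j <= n; j++) prev[j] = j;

        for (int i = 1; i <= m; i++) {
            curr[0] = i; // base case: prefix of word1 against empty word2
            for (int j = 1; j <= n; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    curr[j] = prev[j - 1];
                } else {
                    curr[j] = 1 + Math.min(curr[j - 1],
                            Math.min(prev[j], prev[j - 1]));
                }
            }
            // Swap so the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(minDistance("horse", "ros"));           // 3
        System.out.println(minDistance("intention", "execution")); // 5
    }
}
```

The trade-off is that the full table is no longer available afterwards, so this variant returns only the distance and cannot be used to recover the sequence of operations.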

Origin blog.csdn.net/weixin_44471490/article/details/109267942