Dynamic Programming: Levenshtein Distance (Edit Distance)

(1) Sample problem

Given two words, word1 and word2, compute the minimum number of operations required to convert word1 into word2.
You can perform the following three operations on a word:
• Insert a character
• Delete a character
• Replace a character

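[Example]

Input: word1 = "benyam", word2 = "ephrem"
Output: 5

(2) Solving steps
1. Define the state
- Find the last step
- Reduce it to sub-problems
- Draw the dynamic programming table
		- The last cell of the table represents the original problem
		- Any cell of the table represents a sub-problem
- Fill in the dynamic programming table
        - Understand the whole dynamic programming process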

[Example]
The last step: find the shortest edit distance from benyam to ephrem
Reduce to sub-problems:
Analysis:
1) The current problem is obtained from a sub-problem by one insert, delete, or replace operation.
2) Deleting a character from word1 is equivalent to inserting a character into word2.
Similarly, deleting a character from word2 is equivalent to inserting a character into word1.
3) Replacing a character in word1 is equivalent to replacing a character in word2.
So there are really only three essentially different operations, all expressed as edits to word1:
• Insert: insert a character into word1 (equivalent to deleting one from word2);
• Delete: delete a character from word1 (equivalent to inserting one into word2);
• Replace: modify a character of word1.
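The analysis above can be sketched as a plain recursion over prefixes, before any table is built. This is only an illustration, not code from the original article: the class name EditDistanceRecursive and the method distance are made up for this example, and the recursion takes exponential time, so it is suitable for short words only.

```java
// Hypothetical sketch: edit distance written directly as a recursion on prefixes.
class EditDistanceRecursive {

    // Distance between the first i characters of a and the first j characters of b.
    static int distance(String a, int i, String b, int j) {
        if (i == 0) return j; // "" -> prefix of b: j insertions
        if (j == 0) return i; // prefix of a -> "": i deletions
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            // Last characters match, so no operation is spent on them.
            return distance(a, i - 1, b, j - 1);
        }
        int insert = distance(a, i, b, j - 1);      // solve the sub-problem, then insert b[j-1]
        int delete = distance(a, i - 1, b, j);      // delete a[i-1], then solve the sub-problem
        int replace = distance(a, i - 1, b, j - 1); // replace a[i-1] with b[j-1]
        return 1 + Math.min(replace, Math.min(insert, delete));
    }

    public static void main(String[] args) {
        System.out.println(distance("benyam", 6, "ephrem", 6)); // prints 5
    }
}
```

The same sub-problems are recomputed many times here, which is what the dynamic programming table avoids.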
Draw the dynamic programming table:

① marks the original problem: the shortest edit distance of benyam → ephrem
② marks a sub-problem: the shortest edit distance of b → ep
Fill in the dynamic programming table:
Fill in the initial conditions:
(1) The distances from "", e, ep, eph, ephr, ephre, ephrem to "" are 0, 1, 2, 3, 4, 5, 6, respectively.
(2) The distances from "" to "", b, be, ben, beny, benya, benyam are 0, 1, 2, 3, 4, 5, 6, respectively.
(3) For ③, the current problem "the shortest edit distance of b → e" can be derived from one of its sub-problems through Insert, Delete, or Replace:

  • With Insert, the sub-problem is b → "", whose distance is 1; one more insertion (of e) is needed, so the distance is 1+1=2
  • With Delete, the sub-problem is "" → e, whose distance is 1; one more deletion (of b) is needed, so the distance is 1+1=2
  • With Replace, the sub-problem is "" → "", whose distance is 0; replacing b with e costs one operation, so the distance is 0+1=1

The minimum of the three is 1, so cell ③ is 1.

(4) For ④, the current problem is "the shortest edit distance of be → e":

  • With Insert, the sub-problem is be → "", whose distance is 2; one more insertion (of e) is needed, so the distance is 2+1=3
  • With Delete, the sub-problem is b → e, whose distance is 1; one more deletion (of the trailing e in be) is needed, so the distance is 1+1=2
  • With Replace, the sub-problem is b → "", whose distance is 1; the last characters of be and e are both e, so no replacement is needed and the distance is 1+0=1

The minimum of the three is 1, so cell ④ is 1.
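The two worked cells can be checked mechanically. In the sketch below, the class CellCheck and the helper cell are names invented for this illustration; the helper takes the three neighbouring sub-problem distances and whether the last characters match, and returns the value of the current cell.

```java
// Hypothetical helper that recomputes a single DP cell from its three neighbours.
class CellCheck {

    // insertSub = D[i][j-1], deleteSub = D[i-1][j], replaceSub = D[i-1][j-1]
    static int cell(int insertSub, int deleteSub, int replaceSub, boolean lastCharsMatch) {
        int insert = insertSub + 1;
        int delete = deleteSub + 1;
        // A replacement costs nothing when the last characters are already equal.
        int replace = replaceSub + (lastCharsMatch ? 0 : 1);
        return Math.min(replace, Math.min(insert, delete));
    }

    public static void main(String[] args) {
        // Cell ③, b -> e: neighbours are b -> "" (1), "" -> e (1), "" -> "" (0); 'b' != 'e'
        System.out.println(cell(1, 1, 0, false)); // prints 1
        // Cell ④, be -> e: neighbours are be -> "" (2), b -> e (1), b -> "" (1); 'e' == 'e'
        System.out.println(cell(2, 1, 1, true));  // prints 1
    }
}
```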

2. Transition equation
• Create an array to store the dynamic programming table

    // DP array
    int n = word1.length();
    int m = word2.length();
    int[][] D = new int[n + 1][m + 1];
  • The array is one larger than the input in each dimension, so that row 0 and column 0 can represent the empty prefix
  • Be clear about what each element of the array means
    e.g. D[2][1] is the shortest edit distance of be → e
  • Derive the transition equation from the process used to fill in the DP table
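// i and j index the table; characters of the words are 0-based, hence the -1
if (word1.charAt(i - 1) == word2.charAt(j - 1))
	D[i][j] = D[i - 1][j - 1];
else
	D[i][j] = 1 + Math.min(D[i - 1][j - 1], Math.min(D[i - 1][j], D[i][j - 1]));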

3. Initial conditions and boundary cases

    // Initialize the boundary states
    for (int i = 0; i < n + 1; i++) {
      D[i][0] = i;
    }
    for (int j = 0; j < m + 1; j++) {
      D[0][j] = j;
    }

4. Calculation order

    // Compute all DP values
    for (int i = 1; i < n + 1; i++) 
    {
      for (int j = 1; j < m + 1; j++) 
      {
        ....
      }
    }
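Each cell depends only on the cell above it, the cell to its left, and the cell diagonally up and to the left, so filling rows from top to bottom and each row from left to right guarantees that all three values are ready when needed. The sketch below fills the table in that order and prints it; the class DpTablePrinter and the method buildTable are assumed names for this illustration, not part of the original article.

```java
// Hypothetical sketch: fill the DP table row by row and print it.
class DpTablePrinter {

    static int[][] buildTable(String word1, String word2) {
        int n = word1.length();
        int m = word2.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // prefix of word1 -> ""
        for (int j = 0; j <= m; j++) d[0][j] = j; // "" -> prefix of word2
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                // 0 when the last characters match, 1 when a replacement is needed
                int cost = word1.charAt(i - 1) == word2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j], d[i][j - 1]) + 1);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // One printed line per row of the table (rows follow word1, columns follow word2).
        for (int[] row : buildTable("benyam", "ephrem")) {
            StringBuilder line = new StringBuilder();
            for (int v : row) line.append(String.format("%3d", v));
            System.out.println(line);
        }
    }
}
```

The bottom-right value of the printed table is the answer to the original problem.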

[Example] Complete code

public static int calStringDistance(String charA, String charB)
{
    char[] A = charA.toCharArray();
    char[] B = charB.toCharArray();
    int n = charA.length();
    int m = charB.length();

    // Create the DP table
    int[][] DP = new int[n+1][m+1];

    // Initialize the boundary states
    for (int i = 0; i < n + 1; i++) DP[i][0] = i;
    for (int j = 0; j < m + 1; j++) DP[0][j] = j;

    // Fill in the DP table
    for (int i = 1; i < n + 1; i++)
        for (int j = 1; j < m + 1; j++)
        {
            if(A[i-1]==B[j-1]) DP[i][j] = DP[i-1][j-1];
            else DP[i][j] = Math.min(Math.min(DP[i-1][j],DP[i][j-1]),DP[i-1][j-1])+1;
        }

    return DP[n][m];
}
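Since each row of the table only reads the row directly above it, the full (n+1) × (m+1) array is not strictly required. The variant below is a common space optimization rather than something taken from the original article; it keeps two rows at a time, and the class name EditDistanceTwoRows is an assumption made for this sketch.

```java
// Hypothetical variant: same recurrence, but only two rows are kept in memory.
class EditDistanceTwoRows {

    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1]; // row i-1 of the table
        int[] curr = new int[m + 1]; // row i of the table
        for (int j = 0; j <= m; j++) prev[j] = j; // row 0: "" -> prefix of b
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // column 0: prefix of a -> ""
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1];
                } else {
                    curr[j] = 1 + Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
                }
            }
            // Swap the rows; the old prev array is reused as scratch space.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m]; // after the final swap, prev holds the last computed row
    }

    public static void main(String[] args) {
        System.out.println(distance("benyam", "ephrem")); // prints 5
    }
}
```

This reduces the extra memory from O(n·m) to O(m) while the running time stays O(n·m). The trade-off is that the full table is no longer available for reconstructing the actual sequence of edits.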

