Dynamic Programming: The Edit Distance Problem

Website link: https://algorithms.tutorialhorizon.com/dynamic-programming-edit-distance-problem/

1. Objective: Given two strings s1 and s2, write an algorithm to find the minimum number of operations (the edit distance) required to convert s1 into s2.

The allowed operations are:

  1. Insert - insert a new character

  2. Delete - delete a character

  3. Replace - replace one character with another

Example:

s1 = "horizon"
s2 = "horzon"
Output: 1  {remove 'i' from string s1}

s1 = "horizon"
s2 = "horizontal"
Output: 3  {insert 't', 'a', 'l' characters in string s1}
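
The two examples above can be replayed with explicit operations. This is only a sketch: the helper names insert_char, delete_char and replace_char are hypothetical and are not part of the original article.

```python
def insert_char(s, i, ch):
    """Return s with ch inserted before position i."""
    return s[:i] + ch + s[i:]


def delete_char(s, i):
    """Return s with the character at position i removed."""
    return s[:i] + s[i + 1:]


def replace_char(s, i, ch):
    """Return s with the character at position i replaced by ch."""
    return s[:i] + ch + s[i + 1:]


# Example 1: one deletion turns "horizon" into "horzon".
print(delete_char("horizon", 3))  # horzon

# Example 2: three insertions turn "horizon" into "horizontal".
s = "horizon"
for ch in "tal":
    s = insert_char(s, len(s), ch)
print(s)  # horizontal
```

Each call counts as one operation, so the first example costs 1 and the second costs 3.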

2. Method:

Compare the two strings one character at a time. Here we compare from right to left (from the last character towards the first).

For each pair of characters there are two cases:

  1. If the last characters of the two strings are the same, ignore them and compare the second-to-last characters, and so on, until two characters differ, at which point the second case applies (in other words, solve the remaining characters recursively).

  2. If the last characters of the two strings differ, try each of the three operations (insert, delete, replace) to make the last characters match. Solve the remaining strings recursively for each possibility and take the minimum.

Suppose the given strings are s1 and s2, of lengths m and n respectively:

Case 1: the last characters are the same. Recursively solve for the remaining m-1 and n-1 characters.

Case 2: the last characters differ. Recursively try all three operations:

  a. Insert a character at the end of s1 (the same as the last character of s2, so that the last characters now match): s1 now has length m+1 and s2 has length n. Ignore the matching last characters and recursively solve for the remaining m and n-1 characters.

  b. Delete the last character of s1: s1 now has length m-1 and s2 has length n. Recursively solve for m-1 and n.

  c. Replace the last character of s1 (with the last character of s2, so that the last characters now match): s1 has length m and s2 has length n. Ignore the matching last characters and recursively solve for m-1 and n-1.

Take the minimum of (a, b, c) and add 1 for the operation just performed. First we look at the recursive solution, then improve it with dynamic programming to reduce the complexity.
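
The two cases can also be written as a recurrence. The notation is an assumption introduced here, not taken from the original article: D(i, j) denotes the edit distance between the first i characters of s1 and the first j characters of s2, with 1-based character indices.

```latex
D(i, 0) = i, \qquad D(0, j) = j,
\qquad
D(i, j) =
\begin{cases}
D(i-1,\, j-1) & \text{if } s_1[i] = s_2[j], \\[4pt]
1 + \min\bigl\{\, D(i,\, j-1),\; D(i-1,\, j),\; D(i-1,\, j-1) \,\bigr\} & \text{otherwise.}
\end{cases}
```

The three terms inside the minimum correspond to insert, delete and replace, in that order, and the answer to the whole problem is D(m, n).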

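# Recursive solution
def edit_dist(str1, str2, m, n):
    # If str1 is exhausted, insert the remaining n characters of str2
    if m == 0:
        return n
    # If str2 is exhausted, delete the remaining m characters of str1
    if n == 0:
        return m

    # Last characters match: no cost, move on to the remaining characters
    if str1[m-1] == str2[n-1]:
        return edit_dist(str1, str2, m-1, n-1)

    # Last characters differ: try all three operations and take the cheapest
    return 1 + min(edit_dist(str1, str2, m, n-1),    # Insert
                   edit_dist(str1, str2, m-1, n),    # Delete
                   edit_dist(str1, str2, m-1, n-1))  # Replace

s1 = "horizon"
s2 = "horizontal"
print(edit_dist(s1, s2, len(s1), len(s2)))
Output:
3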

 

Let us analyze this solution. In the worst case no characters match, and every call makes three recursive calls, one per operation, so the time complexity is exponential. It is often quoted as O(3^n); since the recursion depth can reach m + n, O(3^(m+n)) is a safe upper bound.
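
To get a feel for this growth, here is a small sketch that counts how many calls the plain recursion makes on two strings with no characters in common. The function count_calls and the test strings are hypothetical, chosen only for illustration.

```python
def count_calls(s1, s2, m, n):
    """Count the calls the plain recursion makes for s1[:m] and s2[:n]."""
    if m == 0 or n == 0:
        return 1  # a base case is a single call
    if s1[m - 1] == s2[n - 1]:
        return 1 + count_calls(s1, s2, m - 1, n - 1)
    # Mismatch: this call plus the calls made by its three branches.
    return 1 + (count_calls(s1, s2, m, n - 1)
                + count_calls(s1, s2, m - 1, n)
                + count_calls(s1, s2, m - 1, n - 1))


for k in range(1, 7):
    a, b = "a" * k, "b" * k  # no common characters, so every call branches
    calls = count_calls(a, b, k, k)
    table_cells = (k + 1) * (k + 1)  # number of distinct (m, n) pairs
    print(k, calls, table_cells)
```

For lengths 1 to 4 the call counts are 4, 19, 94 and 481, while the number of distinct (m, n) pairs only grows from 4 to 25.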

Now let us check whether the problem has overlapping sub-problems. Example: s1 = "CAT", s2 = "DOG".

In the recursion tree for this example, many sub-problems are solved more than once, so there are many overlapping sub-problems. We can solve the problem bottom-up with dynamic programming: solve each sub-problem, store its result in an array, and look the result up whenever it is needed, so that each sub-problem is solved only once.
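
The overlap for "CAT" and "DOG" can be measured directly. The sketch below is the same recursion with one extra, hypothetical parameter (seen) that records how often each (m, n) pair is visited.

```python
from collections import Counter


def edit_dist_traced(s1, s2, m, n, seen):
    """Plain recursive edit distance that also tallies every (m, n) visited."""
    seen[(m, n)] += 1
    if m == 0:
        return n
    if n == 0:
        return m
    if s1[m - 1] == s2[n - 1]:
        return edit_dist_traced(s1, s2, m - 1, n - 1, seen)
    return 1 + min(edit_dist_traced(s1, s2, m, n - 1, seen),      # insert
                   edit_dist_traced(s1, s2, m - 1, n, seen),      # delete
                   edit_dist_traced(s1, s2, m - 1, n - 1, seen))  # replace


seen = Counter()
dist = edit_dist_traced("CAT", "DOG", 3, 3, seen)
total_calls = sum(seen.values())
distinct = len(seen)
print(dist, total_calls, distinct)  # 3 94 16
```

The recursion makes 94 calls but only ever sees 16 distinct sub-problems, which is why storing each answer once pays off.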

# Dynamic programming solution
def dp_edit_dist(str1, str2):
    
    # m and n are the lengths of str1 and str2
    m, n = len(str1), len(str2)
    
    # Build a 2D table that stores the answers to the sub-problems
    dp = [[0 for _ in range(n+1)] for _ in range(m+1)]
      
    # Fill the table bottom-up
    for i in range(m+1):
        for j in range(n+1):

            # If the first string is empty, the cost is j (j insertions)
            if i == 0:
                dp[i][j] = j

            # Likewise, if the second string is empty, the cost is i (i deletions)
            elif j == 0:
                dp[i][j] = i

            # If the last characters are equal, no extra cost is incurred
            elif str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]

            # If the last characters differ, consider all three operations and take the cheapest
            else:
                dp[i][j] = 1 + min(dp[i][j-1],        # Insert
                                   dp[i-1][j],        # Delete
                                   dp[i-1][j-1])      # Replace
  
    return dp[m][n] 
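
To make the table concrete, the sketch below fills the same kind of table for the pair "cat" and "cut" and prints it row by row. The helper name build_table and this input pair are assumptions made for illustration; they do not appear in the original article.

```python
def build_table(str1, str2):
    """Return the full (m+1) x (n+1) edit-distance table for str1 and str2."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j                      # j insertions
            elif j == 0:
                dp[i][j] = i                      # i deletions
            elif str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]       # characters match, no cost
            else:
                dp[i][j] = 1 + min(dp[i][j - 1],      # insert
                                   dp[i - 1][j],      # delete
                                   dp[i - 1][j - 1])  # replace
    return dp


table = build_table("cat", "cut")
for row in table:
    print(row)
# [0, 1, 2, 3]
# [1, 0, 1, 2]
# [2, 1, 1, 2]
# [3, 2, 2, 1]
```

The bottom-right cell holds the answer: one replacement ('a' to 'u') is enough. Filling the table takes O(m*n) time and O(m*n) space.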

  

Remarks:

In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g. words) are by counting the minimum number of operations required to transform one string into the other. Edit distance has applications in natural language processing, where automatic spelling correction can find candidate corrections for a misspelled word by selecting dictionary words that have a small distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
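
As a rough sketch of the spelling-correction idea, the code below picks the dictionary word with the smallest edit distance to a misspelled word. The function closest_word and the tiny word list are hypothetical, and edit_distance is a two-row variant of the table-filling approach rather than the exact code shown earlier.

```python
def edit_distance(a, b):
    """Edit distance between a and b, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])
            else:
                curr.append(1 + min(curr[j - 1],   # insert
                                    prev[j],       # delete
                                    prev[j - 1]))  # replace
        prev = curr
    return prev[-1]


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda candidate: edit_distance(word, candidate))


words = ["horizon", "horse", "zone"]  # made-up dictionary for illustration
print(closest_word("horizn", words))  # horizon
```

A real spelling corrector would also need to handle ties and large dictionaries, which this sketch ignores.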

 

Origin www.cnblogs.com/carlber/p/12142283.html