Design and Analysis of Algorithms - edit distance problem

First, the practice problem

Let A and B be two strings. We want to convert string A into string B using as few character operations as possible. The allowed character operations are: (1) delete a character; (2) insert a character; (3) change one character into another. The minimum number of character operations needed to convert string A into string B is called the edit distance from A to B, denoted d(A, B). Given a string A and a string B, compute the edit distance d(A, B).

Input format:

The first line of the input is string A, and the second line is string B.

Tip: string length does not exceed 2000 characters.

Output format:

Output the edit distance d(A, B).

Sample input:

A sample set of inputs is given here. For example:

fxpimu
xwrs 

Sample output:

The corresponding output is given here. For example:

5
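To see why 5 operations are enough for the sample, here is a minimal sketch that applies one possible five-step edit script to "fxpimu". The `Op`, `Edit`, `apply_edits`, and `sample_script` names are hypothetical helpers introduced only for this illustration; they are not part of the original solution. The sketch only shows that 5 operations suffice; that no shorter script exists is what the dynamic programming solution establishes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper types for illustration only.
enum class Op { Delete, Insert, Substitute };

struct Edit {
    Op op;            // which of the three allowed operations
    std::size_t pos;  // 0-based position in the current string
    char ch;          // character to insert or substitute (unused for Delete)
};

// Apply the edits one after another and return the resulting string.
std::string apply_edits(std::string s, const std::vector<Edit>& edits) {
    for (const Edit& e : edits) {
        switch (e.op) {
            case Op::Delete:     s.erase(e.pos, 1);        break;
            case Op::Insert:     s.insert(e.pos, 1, e.ch); break;
            case Op::Substitute: s[e.pos] = e.ch;          break;
        }
    }
    return s;
}

// One possible five-step script turning "fxpimu" into "xwrs".
std::vector<Edit> sample_script() {
    return {
        {Op::Delete,     0, '\0'},  // fxpimu -> xpimu
        {Op::Substitute, 1, 'w'},   // xpimu  -> xwimu
        {Op::Substitute, 2, 'r'},   // xwimu  -> xwrmu
        {Op::Substitute, 3, 's'},   // xwrmu  -> xwrsu
        {Op::Delete,     4, '\0'},  // xwrsu  -> xwrs
    };
}
```

Calling `apply_edits("fxpimu", sample_script())` walks through the intermediate strings listed in the comments.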

Second, problem description

Finding the edit distance between two strings means finding the minimum number of edits needed to turn one string into the other, where each edit substitutes, inserts, or deletes a single character.

Third, the algorithm description

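Idea: given a string $A$ of length $m$, $A_1 A_2 \dots A_m$ (where $A_1$ is the first character), and a string $B$ of length $n$, $B_1 B_2 \dots B_n$, start comparing from the last character. When $A_m \ne B_n$, find the edit distances of three pairs of prefixes: $A_1 \dots A_{m-1}$ with $B_1 \dots B_n$, $A_1 \dots A_m$ with $B_1 \dots B_{n-1}$, and $A_1 \dots A_{m-1}$ with $B_1 \dots B_{n-1}$. The minimum of the three plus 1 is the edit distance between $A_1 \dots A_m$ and $B_1 \dots B_n$. When $A_m = B_n$, take the edit distance of $A_1 \dots A_{m-1}$ with $B_1 \dots B_n$ plus 1, the edit distance of $A_1 \dots A_m$ with $B_1 \dots B_{n-1}$ plus 1, and the edit distance of $A_1 \dots A_{m-1}$ with $B_1 \dots B_{n-1}$ with nothing added, and take the minimum of these three values.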
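The idea of comparing from the last character can be written down almost literally as a recursion on prefix lengths. The following is a minimal top-down sketch with memoization, assuming that a prefix of length 0 costs as many operations as the other prefix is long. The names `solve` and `edit_distance_top_down` are hypothetical, and this is not the implementation used later in this post.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: edit distance between the first i characters of a
// and the first j characters of b, cached in memo (-1 means "not computed").
static int solve(const std::string& a, const std::string& b, int i, int j,
                 std::vector<std::vector<int>>& memo) {
    if (i == 0) return j;  // only insertions are left
    if (j == 0) return i;  // only deletions are left
    int& cached = memo[i][j];
    if (cached >= 0) return cached;

    int del  = solve(a, b, i - 1, j, memo) + 1;  // drop the last character of a's prefix
    int ins  = solve(a, b, i, j - 1, memo) + 1;  // insert the last character of b's prefix
    int diag = solve(a, b, i - 1, j - 1, memo)   // match (free) or substitute (cost 1)
             + (a[i - 1] == b[j - 1] ? 0 : 1);

    cached = std::min({del, ins, diag});
    return cached;
}

int edit_distance_top_down(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> memo(a.size() + 1,
                                       std::vector<int>(b.size() + 1, -1));
    return solve(a, b, static_cast<int>(a.size()), static_cast<int>(b.size()), memo);
}
```

For the sample input, `edit_distance_top_down("fxpimu", "xwrs")` returns 5. The recursion depth is at most m + n, which stays small for strings of up to 2000 characters.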

Implementation: let $c[i][j]$ record the edit distance between the first $i$ characters of $A$ and the first $j$ characters of $B$. The recurrence is:

(1) When $A_i \ne B_j$: $c[i][j] = \min(c[i][j-1] + 1,\ c[i-1][j] + 1,\ c[i-1][j-1] + 1)$;

(2) When $A_i = B_j$: $c[i][j] = \min(c[i][j-1] + 1,\ c[i-1][j] + 1,\ c[i-1][j-1])$;

$c[m][n]$ is then the edit distance between $A$ and $B$.

As the recurrence shows, the two-dimensional array should be filled from left to right and from top to bottom, which ensures that every subproblem an entry depends on has already been computed before that entry is filled in.

The array is first initialized with $c[i][0] = i$ and $c[0][j] = j$, because converting between a prefix and the empty string takes one deletion or insertion per character.

Code:

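#include <iostream>
#include <cstring>
using namespace std;

// Strings hold up to 2000 characters: one extra slot is needed for the
// terminating '\0' and for row/column 0 of the table.
#define MAX 2001

// c[i][j]: edit distance between the first i chars of a and the first j chars of b
int c[MAX][MAX];

// Minimum of three integers.
int min3(int x, int y, int z) {
    int t = x < y ? x : y;
    return t < z ? t : z;
}

// Fill the table and return the edit distance between a (length m) and b (length n).
int length(char *a, char *b, int m, int n) {
    // Converting a prefix to or from the empty string costs its length.
    for (int i = 0; i <= m; i++) {
        c[i][0] = i;
    }
    for (int j = 0; j <= n; j++) {
        c[0][j] = j;
    }
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (a[i - 1] == b[j - 1])  // characters match: the diagonal step is free
                c[i][j] = min3(c[i][j - 1] + 1, c[i - 1][j] + 1, c[i - 1][j - 1]);
            else                       // otherwise insert, delete, or substitute at cost 1
                c[i][j] = min3(c[i][j - 1], c[i - 1][j], c[i - 1][j - 1]) + 1;
        }
    }
    return c[m][n];
}

int main() {
    char a[MAX], b[MAX];
    cin >> a;
    cin >> b;
    int m = strlen(a);
    int n = strlen(b);
    cout << length(a, b, m, n);
}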

 

Fourth, analysis of the algorithm's time and space complexity

The number of subproblems is m * n, and each subproblem is solved in constant time, so the total time complexity is O(mn);

Because the auxiliary two-dimensional array c[][] is used, the space complexity is also O(mn).
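As a side note on the space bound: each row of the table depends only on the row directly above it, so the full O(mn) array is not strictly required when only the distance itself is needed. The following is a minimal sketch of that two-row variant, not the solution used in this post; the function name `edit_distance_two_rows` is hypothetical. It swaps the arguments so the rows are as long as the shorter string, which is valid because the three operations make the distance symmetric.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Edit distance using two rows of the table: O(mn) time, O(min(m, n)) extra space.
int edit_distance_two_rows(std::string a, std::string b) {
    if (a.size() < b.size()) std::swap(a, b);  // make b the shorter string
    const std::size_t m = a.size(), n = b.size();

    std::vector<int> prev(n + 1), cur(n + 1);
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);  // row 0

    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);  // column 0 of row i
        for (std::size_t j = 1; j <= n; ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({cur[j - 1] + 1, prev[j] + 1, diag});
        }
        std::swap(prev, cur);  // row i becomes the "previous" row
    }
    return prev[n];
}
```

The trade-off is that the two-row version can no longer be traced back to recover which operations were used, since the earlier rows are overwritten.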

Fifth, reflections

When I first saw this problem, I felt it was related to the longest common subsequence problem, and I thought that subtracting the length of the longest common subsequence from the length of the longer of the two strings would give the answer. It sounded promising, and the few sets of data I worked out by hand showed no problem, so I typed up the code this way in class. One of the three test cases came back as a wrong answer, which left me baffled. Later I wrote several more sets of data and found that some inputs cannot be handled by this method, so in the end I used dynamic programming. It is in fact similar to the longest common subsequence problem and follows the same idea; there is just one more subproblem to consider. This experiment gave me a better understanding of dynamic programming. I should verify my ideas against more sets of varied data and not rush into writing code.
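To make the failure of the first idea concrete, the sketch below compares the guess "length of the longer string minus the length of the longest common subsequence" against the true edit distance. The pair "ab" / "ba" is a counterexample chosen for this illustration, not one of the test cases from the original assignment, and the function names `lcs_length`, `lcs_guess`, and `edit_distance` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of a and b.
int lcs_length(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> l(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            l[i][j] = (a[i - 1] == b[j - 1]) ? l[i - 1][j - 1] + 1
                                             : std::max(l[i - 1][j], l[i][j - 1]);
    return l[a.size()][b.size()];
}

// The first idea described above: longer length minus LCS length.
int lcs_guess(const std::string& a, const std::string& b) {
    return static_cast<int>(std::max(a.size(), b.size())) - lcs_length(a, b);
}

// Edit distance by the table-filling recurrence.
int edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> c(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 0; i <= a.size(); ++i) c[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= b.size(); ++j) c[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = c[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            c[i][j] = std::min({c[i][j - 1] + 1, c[i - 1][j] + 1, diag});
        }
    return c[a.size()][b.size()];
}
```

On the sample pair "fxpimu" / "xwrs" both functions give 5, which is how a handful of hand-made cases can appear to confirm the guess. On "ab" / "ba" the guess gives 1 while the edit distance is 2, because the two characters cannot both be kept in place.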

Origin www.cnblogs.com/Jettle/p/11708138.html