First, the practice problem
Let A and B be two strings. We want to convert string A into string B using the fewest character operations. The character operations are: (1) delete a character; (2) insert a character; (3) change one character into another. The minimum number of character operations needed to convert string A into string B is called the edit distance from A to B, denoted d(A, B). Given strings A and B, compute the edit distance d(A, B).
Input format:
The first line contains string A; the second line contains string B.
Tip: the string length does not exceed 2000 characters.
Output format:
Output the edit distance d(A, B).
Sample input:
Here is one set of inputs. For example:
fxpimu
xwrs
Sample output:
Here is the corresponding output. For example:
5
Second, the problem description
Finding the edit distance between two strings means finding the minimum number of edits needed to turn one string into the other, where an edit is a substitution, an insertion, or a deletion of a single character.
Third, the algorithm description
Idea: for string A = a1 a2 ... am (a1 is the first character) and string B = b1 b2 ... bn, write A_i for the first i characters of A and B_j for the first j characters of B. Start comparing from the last characters. When am ≠ bn, find the edit distances of A_(m-1) and B_n, of A_m and B_(n-1), and of A_(m-1) and B_(n-1); the minimum of the three plus 1 is the edit distance of A_m and B_n. When am = bn, add 1 to the edit distance of A_(m-1) and B_n and to the edit distance of A_m and B_(n-1), compare these with the edit distance of A_(m-1) and B_(n-1) (without adding 1), and take the minimum.
Implementation: let c[i][j] record the edit distance between A_i (the first i characters of A) and B_j (the first j characters of B), and let a_i and b_j be the i-th character of A and the j-th character of B. The recurrence is:
(1) When a_i ≠ b_j: c[i][j] = min(c[i][j-1] + 1, c[i-1][j] + 1, c[i-1][j-1] + 1);
(2) When a_i = b_j: c[i][j] = min(c[i][j-1] + 1, c[i-1][j] + 1, c[i-1][j-1]);
c[m][n] is then the edit distance between A and B.
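As the recurrence shows, the two-dimensional array should be filled from left to right and from top to bottom, so that every subproblem an entry depends on has already been computed before that entry is calculated.
The array is initialized with c[i][0] = i and c[0][j] = j: turning the first i characters of A into the empty string takes i deletions, and building the first j characters of B from the empty string takes j insertions.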
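To make the fill order concrete, the following is a minimal sketch that builds the whole table for two strings and prints it. The helper names `build_table` and `print_table` are made up for this illustration and are not part of the submitted solution; the sketch uses `std::vector` instead of a global array so that it works for any input size.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: returns the full table c, where c[i][j] is the edit
// distance between the first i characters of a and the first j characters of b.
std::vector<std::vector<int>> build_table(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));

    // Boundary: i deletions to reach the empty string, j insertions to leave it.
    for (std::size_t i = 0; i <= m; ++i) c[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= n; ++j) c[0][j] = static_cast<int>(j);

    // Fill row by row, left to right, so c[i-1][j], c[i][j-1] and c[i-1][j-1]
    // are always available when c[i][j] is computed.
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            // The diagonal step is free when the two characters match.
            const int diag = c[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            c[i][j] = std::min({c[i][j - 1] + 1, c[i - 1][j] + 1, diag});
        }
    }
    return c;
}

// Hypothetical helper: prints the table one row per line.
void print_table(const std::vector<std::vector<int>>& c) {
    for (const auto& row : c) {
        for (int v : row) std::cout << v << ' ';
        std::cout << '\n';
    }
}
```

Calling `print_table(build_table("ab", "ba"))` prints the rows `0 1 2`, `1 1 1` and `2 1 2`; the bottom-right entry, 2, is d("ab", "ba"). For the sample input, `build_table("fxpimu", "xwrs")[6][4]` is 5.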
Code:
#include <cstring>
#include <iostream>
using namespace std;

// A string has at most 2000 characters, so the table needs indices 0..2000
// and each buffer needs one extra slot for the terminating '\0'.
#define MAX 2001

int c[MAX][MAX];  // c[i][j]: edit distance between the first i chars of a and the first j chars of b

// Minimum of three integers.
int min3(int x, int y, int z) {
    int t = x < y ? x : y;
    return t < z ? t : z;
}

int length(const char *a, const char *b, int m, int n) {
    // Boundary: i deletions to reach the empty string, j insertions to leave it.
    for (int i = 0; i <= m; i++) {
        c[i][0] = i;
    }
    for (int j = 0; j <= n; j++) {
        c[0][j] = j;
    }
    // Fill the table top to bottom, left to right.
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (a[i - 1] == b[j - 1])
                c[i][j] = min3(c[i][j - 1] + 1, c[i - 1][j] + 1, c[i - 1][j - 1]);
            else
                c[i][j] = min3(c[i][j - 1], c[i - 1][j], c[i - 1][j - 1]) + 1;
        }
    }
    return c[m][n];
}

int main() {
    char a[MAX], b[MAX];
    cin >> a;
    cin >> b;
    int m = (int)strlen(a);
    int n = (int)strlen(b);
    cout << length(a, b, m, n);
    return 0;
}
Fourth, analysis of the algorithm's time and space complexity
There are m * n subproblems, and each subproblem takes constant time, so the total time complexity is O(mn).
Because the algorithm uses the auxiliary two-dimensional array c[][], the space complexity is O(mn).
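Each row of the table depends only on the row directly above it, so the O(mn) space can be reduced if only the final distance is needed. The following is a minimal sketch of that rolling-array variant, not the code submitted for this experiment; the function name `edit_distance_two_rows` is made up for this illustration. It keeps two rows of length n + 1, so the extra space is O(n) while the time stays O(mn).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant: same recurrence as the full table, but only the
// previous row (prev) and the row being filled (cur) are kept in memory.
int edit_distance_two_rows(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> prev(n + 1), cur(n + 1);

    // Row 0: building the first j characters of b from the empty string.
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);  // column 0: i deletions
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({cur[j - 1] + 1, prev[j] + 1, diag});
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[n];  // after the last swap, prev holds row m
}
```

For the sample input, `edit_distance_two_rows("fxpimu", "xwrs")` returns 5. The trade-off is that the full table is no longer available afterwards, so the sequence of edits cannot be reconstructed from it.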
Fifth, reflections and takeaways
When I first saw this problem, I felt it was somewhat related to the longest common subsequence problem, and I thought the answer could be obtained by taking the length of the longer of the two strings and subtracting the length of the longest common subsequence. At first glance this seemed fine: I listed several sets of data myself and found no problem, so in class I wrote the code this way directly. The result was that one of the three test cases got a wrong answer, which baffled me. Later I wrote several more sets of data and found that some data cannot be handled by this method, so in the end I solved it with dynamic programming. The idea is in fact similar to the longest common subsequence problem; there is just one more subproblem to consider. From this experiment I gained a better understanding of dynamic programming, and learned that I should verify my ideas with more sets of different data rather than rush to write code.
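One pair of strings on which the longest-common-subsequence shortcut fails is "ab" and "ba". The sketch below compares the two approaches; it is written for this note and is not the code originally submitted in class, and the names `lcs_length`, `lcs_guess` and `edit_distance` are made up for the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common subsequence of a and b (two-row DP).
int lcs_length(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> prev(n + 1, 0), cur(n + 1, 0);
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1
                                            : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);
    }
    return prev[n];
}

// The shortcut described above: longer length minus LCS length.
int lcs_guess(const std::string& a, const std::string& b) {
    return static_cast<int>(std::max(a.size(), b.size())) - lcs_length(a, b);
}

// Edit distance by the dynamic-programming recurrence (two-row DP).
int edit_distance(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> prev(n + 1), cur(n + 1);
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({cur[j - 1] + 1, prev[j] + 1, diag});
        }
        std::swap(prev, cur);
    }
    return prev[n];
}
```

On the sample input both functions agree: `lcs_guess("fxpimu", "xwrs")` and `edit_distance("fxpimu", "xwrs")` are both 5, which is why the shortcut can pass a few hand-made cases. On "ab" and "ba", however, `lcs_guess` gives 1 while `edit_distance` gives 2, since no single operation turns "ab" into "ba".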