Computing the Edit Distance Between Two Strings in Python

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA agreement. If you repost it, please include a link to the original source and this statement.
Link to this article: https://blog.csdn.net/john_bian/article/details/90238355

Given two strings str1 and str2, the minimum number of basic operations (insertions, deletions and substitutions of single characters) needed to turn str1 into str2 is called the edit distance between str1 and str2 (also known as the Levenshtein distance). For example, to turn abandon into aband, the best solution is to delete the last two letters, "on", so the edit distance between them is 2. Likewise, to turn abandon into abanded, the optimal solution is to replace the final "on" with "ed"; two letters have to be replaced, so the edit distance between them is also 2.
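To make the definition concrete, here is a small sketch that replays the two examples as explicit operations. The helper apply_ops and its tuple format for operations are assumptions made for illustration, not part of the original program. It only shows that two operations are enough in each case; proving that no shorter sequence exists is what the dynamic programming method is for.

```python
def apply_ops(s, ops):
    """Apply single-character edit operations to s (hypothetical helper).

    Each op is ("delete", index), ("insert", index, char) or
    ("replace", index, char); an index refers to the string as it
    looks at the moment that op is applied.
    """
    chars = list(s)
    for op in ops:
        kind, index = op[0], op[1]
        if kind == "delete":
            del chars[index]
        elif kind == "insert":
            chars.insert(index, op[2])
        elif kind == "replace":
            chars[index] = op[2]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return "".join(chars)


# abandon -> aband: delete "o" at index 5, then "n", which has shifted to index 5
delete_ops = [("delete", 5), ("delete", 5)]
print(apply_ops("abandon", delete_ops), len(delete_ops))    # aband 2

# abandon -> abanded: replace "o" with "e" and "n" with "d"
replace_ops = [("replace", 5, "e"), ("replace", 6, "d")]
print(apply_ops("abandon", replace_ops), len(replace_ops))  # abanded 2
```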

The edit distance between two strings can be computed with dynamic programming. First define a function f(i, j) as the edit distance between the substring of str1 of length i and the substring of str2 of length j. Here "substring" always means a prefix: the first i characters of str1 and the first j characters of str2. The dynamic programming recurrence is as follows:

  1. If i = 0 and j = 0, then f(i, j) = 0
  2. If i = 0 and j > 0, then f(i, j) = j
  3. If i > 0 and j = 0, then f(i, j) = i
  4. If i > 0 and j > 0, then f(i, j) = min{ f(i-1, j) + 1, f(i, j-1) + 1, f(i-1, j-1) + cost(i, j) }
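The recurrence can also be evaluated top-down, almost exactly as written. The sketch below uses the hypothetical name edit_distance_recursive and caches f with functools.lru_cache so each (i, j) pair is computed only once. Recursion depth grows with len(str1) + len(str2), so for long strings the iterative table in the program further down is the safer choice.

```python
from functools import lru_cache


def edit_distance_recursive(str1, str2):
    """Top-down evaluation of the recurrence for f(i, j) (hypothetical name)."""

    @lru_cache(maxsize=None)
    def f(i, j):
        # Cases 1-3: if one prefix is empty, the distance is the other prefix's length
        if i == 0:
            return j
        if j == 0:
            return i
        # Case 4: compare the i-th character of str1 with the j-th character of str2
        cost = 0 if str1[i - 1] == str2[j - 1] else 1
        return min(f(i - 1, j) + 1,         # delete str1[i-1]
                   f(i, j - 1) + 1,         # insert str2[j-1]
                   f(i - 1, j - 1) + cost)  # substitute (free if the characters match)

    return f(len(str1), len(str2))


print(edit_distance_recursive("abandon", "aband"))    # 2
print(edit_distance_recursive("abandon", "abanded"))  # 2
```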

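In the fourth case, the three terms inside min correspond to a deletion, an insertion and a substitution applied to str1, respectively. Deleting the i-th character of str1 leaves the length i-1 prefix to be turned into the length j prefix of str2; inserting the j-th character of str2 leaves the length i prefix of str1 to be turned into the length j-1 prefix of str2; substituting handles the last characters of both prefixes at once. cost(i, j) compares the i-th character of str1 with the j-th character of str2: it is 0 if they are the same and 1 if they differ.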
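To see which of the three terms was actually chosen, one can fill the same table and then walk back from f(m, n), at each step following a term that reproduces the stored value. A sketch of that idea follows; edit_operations is a hypothetical helper, not part of the original program. When several terms tie, it prefers the diagonal (match or substitution), then deletion, then insertion, so it returns one optimal sequence among possibly several.

```python
def edit_operations(str1, str2):
    """Return (distance, ops) where ops is one optimal list of edits (hypothetical helper)."""
    m, n = len(str1), len(str2)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        f[i][0] = i
    for j in range(n + 1):
        f[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if str1[i - 1] == str2[j - 1] else 1
            f[i][j] = min(f[i - 1][j] + 1, f[i][j - 1] + 1, f[i - 1][j - 1] + cost)

    # Walk back from (m, n), choosing a term of the min that produced f[i][j]
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if str1[i - 1] == str2[j - 1] else 1
            if f[i][j] == f[i - 1][j - 1] + cost:
                if cost:
                    ops.append(f"replace {str1[i - 1]!r} with {str2[j - 1]!r}")
                i, j = i - 1, j - 1
                continue
        if i > 0 and f[i][j] == f[i - 1][j] + 1:
            ops.append(f"delete {str1[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {str2[j - 1]!r}")
            j -= 1
    ops.reverse()  # operations were collected from the end of the strings
    return f[m][n], ops


print(edit_operations("abandon", "aband"))    # (2, ["delete 'o'", "delete 'n'"])
print(edit_operations("abandon", "abanded"))  # (2, ["replace 'o' with 'e'", "replace 'n' with 'd'"])
```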

A Python implementation of this method is given below.

import numpy as np


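def string_distance(str1, str2):
    """
    Compute the edit distance between two strings
    @author: 仰起脸笑的像满月
    @date: 2019/05/15
    :param str1: the source string
    :param str2: the target string
    :return: the minimum number of insertions, deletions and substitutions that turn str1 into str2
    """
    m = len(str1)
    n = len(str2)
    # distance[i, j] holds f(i, j); integer dtype so the result prints as 2 rather than 2.0
    distance = np.zeros((m+1, n+1), dtype=int)

    # Base cases: f(i, 0) = i and f(0, j) = j
    for i in range(0, m+1):
        distance[i, 0] = i
    for j in range(0, n+1):
        distance[0, j] = j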

    for i in range(1, m+1):
        for j in range(1, n+1):
            if str1[i-1] == str2[j-1]:
                cost = 0
            else:
                cost = 1
            distance[i, j] = min(distance[i-1, j]+1, distance[i, j-1]+1, distance[i-1, j-1]+cost)  # deletion, insertion and substitution, respectively

    return distance[m, n]


if __name__ == '__main__':
    a = 'abandon'
    b = 'abanded'
    result = string_distance(a, b)
    print(result)
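If only the distance is needed, the full (m+1) x (n+1) table is not required: each row i depends only on row i-1. A minimal two-row variant is sketched below, assuming the hypothetical name string_distance_two_rows and plain Python lists instead of NumPy; it returns a plain int.

```python
def string_distance_two_rows(str1, str2):
    """Same recurrence as string_distance, keeping only two rows of the table
    (hypothetical variant; extra memory proportional to len(str2), no NumPy)."""
    previous = list(range(len(str2) + 1))  # row 0: f(0, j) = j
    for i in range(1, len(str1) + 1):
        current = [i] + [0] * len(str2)    # f(i, 0) = i
        for j in range(1, len(str2) + 1):
            cost = 0 if str1[i - 1] == str2[j - 1] else 1
            current[j] = min(previous[j] + 1,         # deletion
                             current[j - 1] + 1,      # insertion
                             previous[j - 1] + cost)  # substitution
        previous = current
    return previous[-1]


if __name__ == '__main__':
    print(string_distance_two_rows('abandon', 'abanded'))  # 2
```

The trade-off is that the operation sequence can no longer be recovered, since that requires walking back through the full table.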

