Edit Distance (ED)

The Edit Distance (ED) problem, also known as the Levenshtein distance, measures the minimum number of operations required to transform one string into another. These operations typically include insertion, deletion, or substitution of a single character.

From CS341

  1. Types of Mispelling:
    • Character typed wrong: Requires a replacement operation.
    • Character missed: Requires an addition operation.
    • Extra character typed: Requires a deletion operation.
  2. Alignment:
    • The first step is to align the two strings. This involves inserting gaps to match the lengths of the strings.
  3. Calculating Edit Distance:
    • Three cases may occur:
      1. Aligning the last character of with the last character of .
        • If , no additional cost is incurred.
        • If , an additional cost of is incurred.
        • Recursively calculate the edit distance between and .
      2. is not aligned with .
        • Incur an additional cost of (deletion or addition).
        • Recursively calculate the edit distance between and .
      3. is not aligned with at the end.
        • Incur an additional cost of (deletion or addition).
        • Recursively calculate the edit distance between and .
  4. Dynamic Programming:
    • Define an array of size to store the edit distances between substrings of and .
    • Use dynamic programming to fill in the array based on the recursive relations described above.
    • Base cases: The edit distance between an empty string and a non-empty string is the length of the non-empty string, and vice versa.
    • The solution lies in the bottom-right cell of the array .

The recurrence relation provided defines the edit distance between two strings and in terms of smaller subproblems. Here’s a breakdown of the recurrence:

  • Let be the edit distance between the prefixes and .
  • Base cases:
    • : The edit distance between an empty string and is because it requires adding characters to transform the empty string into .
    • : The edit distance between and an empty string is because it requires deleting characters from to make it empty.
  • Recurrence relation:
    • is the minimum of three values:
      1. if : This corresponds to a substitution operation where is replaced by .
      2. : This corresponds to a deletion operation where is deleted, and we find the edit distance between and .
      3. : This corresponds to an insertion operation where is added to , and we find the edit distance between and .
  • The algorithm computes all values using two nested loops, iterating over all possible prefixes of the strings and , resulting in a runtime of , where and are the lengths of the strings and respectively.