Suggesting a matching string using the Levenshtein distance algorithm is too slow

Cortex :

I have a function that returns an sf::Event::EventType based on a string provided by a user. If there is no match, the function returns std::nullopt. But I would like to print a suggested, valid sf::Event::EventType that is the closest to what the user provided, to help with misspellings etc.

There are 'only' 13 valid sf::Event::EventType values that have to be checked for the closest match, and I'm assuming that a user won't enter some ridiculously long string.

On my laptop's Intel m3-7Y30 processor I have tested the function's speed in both debug and release mode:

~45 seconds in debug, ~3 seconds in release

That is a huge difference, but I still feel that 3 seconds is a bit much, given that the user might provide anywhere from 5 to 100 event types.

Given these results, I doubt that this approach to suggesting a valid sf::Event::EventType can be optimized enough to make it viable, but if it can, I would like to know how. If not, I would like a suggestion for an alternative that would still print a suggestion, no matter how far off the provided string is.

The relevant code looks like this:

convertToSfEvent

std::optional<sf::Event::EventType> EventFileReader::convertToSfEvent(std::string_view event)
    {
        if      (event == "Closed")              return sf::Event::EventType::Closed;
        else if (event == "Resized")             return sf::Event::EventType::Resized;
        else if (event == "LostFocus")           return sf::Event::EventType::LostFocus;
        else if (event == "GainedFocus")         return sf::Event::EventType::GainedFocus;
        else if (event == "TextEntered")         return sf::Event::EventType::TextEntered;
        else if (event == "KeyPressed")          return sf::Event::EventType::KeyPressed;
        else if (event == "KeyReleased")         return sf::Event::EventType::KeyReleased;
        else if (event == "MouseWheelScrolled")  return sf::Event::EventType::MouseWheelScrolled;
        else if (event == "MouseButtonPressed")  return sf::Event::EventType::MouseButtonPressed;
        else if (event == "MouseButtonReleased") return sf::Event::EventType::MouseButtonReleased;
        else if (event == "MouseMoved")          return sf::Event::EventType::MouseMoved;
        else if (event == "MouseEntered")        return sf::Event::EventType::MouseEntered;
        else if (event == "MouseLeft")           return sf::Event::EventType::MouseLeft;
        else
        {
            // Here is where I search for a match, and the recursion madness starts
            auto smallest_required_change{ INT_MAX };
            auto closest_string{ std::string() };
            for (auto event_type : this->event_types)
            {
                auto result{ levensteinDistance(event, event_type, event.length(), event_type.length()) };

                if (result < smallest_required_change)
                {
                    smallest_required_change = result;
                    closest_string = event_type;
                }
            }

            std::cerr << "Could not recognize event_type token: '" << event << "' did you mean: '" << closest_string << "'?" << "\n";

            return std::nullopt;
        }
    }

levensteinDistance

std::size_t EventFileReader::levensteinDistance(std::string_view first, std::string_view second, std::size_t first_pos, std::size_t second_pos)
    {
        static auto one{ std::size_t(1) };

        // If one string is exhausted, the distance is the remaining length of the other.
        if (!first_pos)
            return second_pos;

        if (!second_pos)
            return first_pos;

        if (first[first_pos - one] == second[second_pos - one])
            return levensteinDistance(first, second, first_pos - one, second_pos - one);

        return 1 + std::min({ levensteinDistance(first, second, first_pos,       second_pos - one),
                              levensteinDistance(first, second, first_pos - one, second_pos),
                              levensteinDistance(first, second, first_pos - one, second_pos - one)
                           });
    }

lenik :

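Your Levenshtein implementation is recursive and slow: it recomputes the same subproblems over and over, so its running time grows exponentially with the length of the strings. You may want to change it to a faster, iterative one, for example (source: https://rosettacode.org/wiki/Levenshtein_distance):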

// Compute Levenshtein Distance
// Martin Ettl, 2012-10-05

size_t uiLevenshteinDistance(const std::string &s1, const std::string &s2)
{
  const size_t m(s1.size());
  const size_t n(s2.size());

  if( m==0 ) return n;
  if( n==0 ) return m;

  size_t *costs = new size_t[n + 1];

  for( size_t k=0; k<=n; k++ ) costs[k] = k;

  size_t i = 0;
  for ( std::string::const_iterator it1 = s1.begin(); it1 != s1.end(); ++it1, ++i )
  {
    costs[0] = i+1;
    size_t corner = i;

    size_t j = 0;
    for ( std::string::const_iterator it2 = s2.begin(); it2 != s2.end(); ++it2, ++j )
    {
      size_t upper = costs[j+1];
      if( *it1 == *it2 )
      {
          costs[j+1] = corner;
      }
      else
      {
        size_t t(upper<corner?upper:corner);
        costs[j+1] = (costs[j]<t?costs[j]:t)+1;
      }

      corner = upper;
    }
  }

  size_t result = costs[n];
  delete [] costs;

  return result;
}

Or you may check this page for inspiration: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
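
To tie this back to the suggestion loop in the question, here is a rough sketch of the same single-row technique written against std::string_view, plus a helper that picks the nearest candidate. The names levenshteinDistance (as a free function) and closestMatch are made up for illustration and are not part of SFML or of the code in the question; the candidate list is assumed to be something like the event_types container used there.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <string_view>
#include <vector>

// Iterative Levenshtein distance that keeps only one row of the cost table:
// O(m * n) time and O(n) memory, with no recursion.
std::size_t levenshteinDistance(std::string_view a, std::string_view b)
{
    // costs[j] starts as the distance from the empty string to the first j chars of b.
    std::vector<std::size_t> costs(b.size() + 1);
    std::iota(costs.begin(), costs.end(), std::size_t{0});

    for (std::size_t i = 0; i < a.size(); ++i)
    {
        std::size_t corner = costs[0]; // diagonal neighbour from the previous row
        costs[0] = i + 1;              // distance from the first i + 1 chars of a to ""

        for (std::size_t j = 0; j < b.size(); ++j)
        {
            const std::size_t upper = costs[j + 1]; // previous row, same column
            costs[j + 1] = (a[i] == b[j])
                ? corner
                : 1 + std::min({ upper, corner, costs[j] });
            corner = upper;
        }
    }
    return costs.back();
}

// Hypothetical helper: returns the candidate with the smallest edit distance
// to `input`, or an empty view if `candidates` is empty.
std::string_view closestMatch(std::string_view input,
                              const std::vector<std::string_view>& candidates)
{
    std::string_view best;
    std::size_t best_distance = std::numeric_limits<std::size_t>::max();

    for (std::string_view candidate : candidates)
    {
        const std::size_t distance = levenshteinDistance(input, candidate);
        if (distance < best_distance)
        {
            best_distance = distance;
            best = candidate;
        }
    }
    return best;
}
```

With 13 short candidates, each lookup fills only a few hundred table cells, so the suggestion should be effectively instant even in a debug build. Keeping best_distance as a std::size_t also avoids the signed/unsigned comparison against INT_MAX in the original loop.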
