Selected Algorithm Questions (4) - String Comparison

Author: Zhai Tianbao Steven
Copyright statement: The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Problem description:

It has been a long time since I last worked on an algorithm problem. I recently came across this one while browsing Douyin. At first glance it is not hard, but I was out of practice, so I wrote a solution for fun.

Compare two strings, for example abcdefg and 25abdfxx, and report: position 0 has extra: 25; position 2 is missing: c; position 4 is missing: e; position 6 is wrong, should be: g.

Problem-solving ideas:

I saw many people say that this can be solved with two pointers or with dynamic programming. I did not think that far for the time being, and used a simple double loop plus a sliding window.

  1. Use idx1 and idx2 as the starting points of the sliding windows in string 1 and string 2. With string 1 as the reference, take each of its characters (index i) and scan string 2 from idx2 onward (index j) for the same character.
  2. When the two characters match, compare j with idx2. If j is greater than idx2 and i equals idx1, string 2 has extra characters in front of the matched character. Then add 1 to idx1 and set idx2 to j+1, so that both windows start just after the matched character.
  3. If i is greater than idx1 and j equals idx2, the characters of string 1 in front of the matched character are missing from string 2. Then add 1 to idx2 and set idx1 to i+1.
  4. If i is greater than idx1 and j is also greater than idx2, both strings have different characters in front of the matched character, that is, a mismatch. Then set idx1 to i+1 and idx2 to j+1.
  5. If none of the three conditions above holds, the match directly follows the previous one (consecutive identical characters). Add 1 to both idx1 and idx2 and move on.
  6. If the last character of string 1 has been processed and idx1 has still not reached the end, all characters from idx1 to the end of string 1 are reported as mismatched.
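
The four cases in steps 2 to 5 can be isolated into a small helper, which makes them easier to check by hand. This is only a sketch: `MatchKind` and `classifyMatch` are hypothetical names that do not appear in the program below, and the function assumes it is called only after a match `str1[i] == str2[j]` has been found.

```cpp
#include <cassert>

// Hypothetical names, used for illustration only.
enum class MatchKind { Extra, Missing, Wrong, Consecutive };

// i, j:       indices at which str1[i] == str2[j] was found
// idx1, idx2: current window starts in string 1 and string 2
MatchKind classifyMatch(int i, int idx1, int j, int idx2) {
    // A match is never located before the start of either window.
    assert(i >= idx1 && j >= idx2);
    if (j > idx2 && i == idx1) return MatchKind::Extra;    // step 2
    if (i > idx1 && j == idx2) return MatchKind::Missing;  // step 3
    if (i > idx1 && j > idx2)  return MatchKind::Wrong;    // step 4
    return MatchKind::Consecutive;                         // step 5
}
```

For the example strings, the first match is `a` at i = 0 and j = 2 while both windows still start at 0, so the call `classifyMatch(0, 0, 2, 0)` falls into the "extra" case: the characters 25 sit in front of the match in string 2.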

Test code:

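#include <iostream>
#include <string>
using namespace std;

// Compare str2 against the reference string str1 and print the differences.
void compareStrings(const std::string& str1, const std::string& str2) {
	int len1 = static_cast<int>(str1.length());
	int len2 = static_cast<int>(str2.length());

	// idx1 / idx2: window starts (first unprocessed position) in str1 / str2
	int idx1 = 0;
	int idx2 = 0;
	for (int i = 0; i < len1; ++i) {
		for (int j = idx2; j < len2; ++j) {
			if (str1[i] == str2[j]) {
				if (j > idx2 && i == idx1) {
					// str2 has extra characters in front of the match
					cout << "Position " << i << " extra: " << str2.substr(idx2, j - idx2) << endl;
					idx1++;
					idx2 = j + 1;
				}
				else if (i > idx1 && j == idx2) {
					// characters of str1 in front of the match are missing from str2
					cout << "Position " << idx1 << " missing: " << str1.substr(idx1, i - idx1) << endl;
					idx1 = i + 1;
					idx2++;
				}
				else if (i > idx1 && j > idx2) {
					// both strings have different characters in front of the match
					cout << "Position " << idx1 << " wrong, should be: " << str1.substr(idx1, i - idx1) << endl;
					idx1 = i + 1;
					idx2 = j + 1;
				}
				else {
					// consecutive matching characters: just advance both windows
					idx1++;
					idx2++;
				}
				break;
			}
		}
		// End of str1 reached with unmatched characters left over
		if (i == (len1 - 1) && (idx1 < len1)) {
			cout << "Position " << idx1 << " wrong, should be: " << str1.substr(idx1, i - idx1 + 1) << endl;
		}
	}
}

int main() {
	std::string str1 = "abcdefg";
	std::string str2 = "25abdfxx";
	cout << "String 1: " << str1 << endl;
	cout << "String 2: " << str2 << endl;
	compareStrings(str1, str2);
	return 0;
}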

Test Results:

With the strings from the problem statement, the output matches what the problem asks for.

With a few more characters added, it still works.

With the content in the middle scrambled a little, there is no problem for now.

With a few more repeated characters, the output is correct according to my logic, but I am not sure whether this is what the problem intends.

If the code can be improved or has any bugs, please leave a comment, and I will correct it promptly to avoid misleading others~

If the article helped you, give it a like to let me know; I will be very happy~ Come on!


A friend asked a question, so I improved the algorithm.

  1. Use a dynamic programming table to obtain the longest common subsequence of the two strings (the common characters appear in the same order in both strings but need not be contiguous).
  2. Traverse that subsequence character by character. For each common character, take the part of each string that lies in front of it (after the previous common character), compare the two parts, and report extra, missing, or wrong characters depending on which parts are empty; see the code and comments for details.
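
As a side note on step 1, the subsequence can also be recovered without storing a string in every table cell: fill a table of lengths only, then walk back from the bottom-right corner. The sketch below is one possible variant, not the program that follows; `longestCommonSubsequence` is a hypothetical name, and ties are broken toward `len[i][j - 1]`, which is an assumption chosen to mirror the tie-breaking in the listing below.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns one longest common subsequence of a and b.
std::string longestCommonSubsequence(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    // len[i][j] = LCS length of the prefixes a[0..i) and b[0..j)
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                len[i][j] = len[i - 1][j - 1] + 1;
            } else {
                len[i][j] = std::max(len[i - 1][j], len[i][j - 1]);
            }
        }
    }
    // Walk back from the bottom-right corner, collecting matched characters.
    std::string out;
    std::size_t i = n;
    std::size_t j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) {
            out.push_back(a[i - 1]);
            --i;
            --j;
        } else if (len[i - 1][j] > len[i][j - 1]) {
            --i;
        } else {
            --j;  // ties go left
        }
    }
    std::reverse(out.begin(), out.end());  // characters were collected back to front
    assert(out.size() == static_cast<std::size_t>(len[n][m]));
    return out;
}
```

For the strings 1abcdefui and abxcegfui used in the test program, the only common subsequence of maximal length is abcefui, so this helper returns that string. The length-only table needs memory proportional to len1 × len2 integers, whereas keeping a string per cell also stores a partial subsequence in every cell.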

Test code:

#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <algorithm>

using namespace std;

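// Get the longest common subsequence (characters need not be contiguous)
string getSubString(std::string str1, std::string str2) {
	int len1 = int(str1.length());
	int len2 = int(str2.length());

	// Dynamic programming: m holds the lengths, s the subsequences themselves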
	vector<vector<int>> m(len1 + 1, vector<int>(len2 + 1, 0));
	vector<vector<string>> s(len1 + 1, vector<string>(len2 + 1, ""));
	for (int i = 1; i <= len1; ++i){
		for (int j = 1; j <= len2; ++j){
			if (str1[i - 1] == str2[j - 1]){
				m[i][j] = m[i - 1][j - 1] + 1;
				s[i][j] = s[i - 1][j - 1] + str1[i - 1];
			}
			else{
				if (m[i - 1][j] > m[i][j - 1]){
					m[i][j] = m[i - 1][j];
					s[i][j] = s[i - 1][j];
				}
				else{
					m[i][j] = m[i][j - 1];
					s[i][j] = s[i][j - 1];
				}
			}
		}
	}
	return s[len1][len2];
}

// Compare the strings
void compareString(string str1, string str2, string substr) {
	int len1 = int(str1.length());
	int len2 = int(str2.length());
	int len3 = int(substr.length());

	// Iterate over the characters of the common subsequence.
	// Up to the current common character: t1 empty and t2 non-empty means extra;
	// t1 non-empty and t2 empty means missing.
	// Both empty means consecutive common characters; both non-empty means wrong.
	// Missing and wrong characters are reported one position at a time.
	int idx1 = 0;
	int idx2 = 0;
	for (int i = 0; i < len3; ++i) {
		string t1, t2;
		int idx3 = idx1;
		int idx4 = idx2;
		// t1 is the part of string 1 in front of the current common character
		for (int m = idx1; m < len1; ++m) {
			if (str1[m] == substr[i]) {
				t1 = str1.substr(idx1, m - idx1);
				idx1 = m + 1;
				break;
			}
		}
		// t2 is the part of string 2 in front of the current common character
		for (int n = idx2; n < len2; ++n) {
			if (str2[n] == substr[i]) {
				t2 = str2.substr(idx2, n - idx2);
				idx2 = n + 1;
				break;
			}
		}
		// Handle each case
		if (t1 == "" && t2 != "") {
			cout << "Position " << idx3 << " extra: " << t2 << endl;
		}
		else if (t1 != "" && t2 == "") {
			while (t1 != "") {
				cout << "Position " << idx3 << " missing: " << t1[0] << endl;
				idx3++;
				t1 = t1.substr(1, t1.size() - 1);
			}
		}
		else if (t1 == "" && t2 == "") {
			// consecutive common characters: nothing to report
		}
		else {
			while (t1 != "") {
				cout << "Position " << idx3 << " wrong, should be: " << t1[0] << endl;
				idx3++;
				t1 = t1.substr(1, t1.size() - 1);
			}
		}
	}
	// Handle the tail of string 1 after the last common character
	if (idx1 < len1) {
		string tail = str1.substr(idx1, len1 - idx1);
		while (tail != "") {
			cout << "Position " << idx1 << " wrong, should be: " << tail[0] << endl;
			idx1++;
			tail = tail.substr(1);  // drop the reported character, keep the rest
		}
	}
}

int main() {
	std::string str1 = "1abcdefui";
	std::string str2 = "abxcegfui";
	cout << "String 1: " << str1 << endl;
	cout << "String 2: " << str2 << endl;

	// Get the longest common subsequence (characters need not be contiguous)
	string substr = getSubString(str1, str2);
	cout << "substring:" << substr << endl;

	// Compare the strings
	compareString(str1, str2, substr);

	system("pause");  // Windows-only: keeps the console window open
	return 0;
}

Origin blog.csdn.net/zhaitianbao/article/details/132040229