LeetCode 30. Substring with Concatenation of All Words

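Given a string s and a list of words, words, that all have the same length, find every starting index in s of a substring formed by concatenating all the words in words exactly once each.
Note that the substring must match the words exactly, with no other characters in between, but the order in which the words are concatenated does not matter.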

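Example 1:
Input:
s = "barfoothefoobarman",
words = ["foo","bar"]
Output: [0,9]
Explanation:
The substrings starting at indices 0 and 9 are "barfoo" and "foobar" respectively.
The order of the output does not matter; [9,0] is also a valid answer.
Example 2:
Input:
s = "wordgoodgoodgoodbestword",
words = ["word","good","best","word"]
Output: []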
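
To make the matching rule concrete, here is a minimal sketch that checks a single candidate window. The helper name `is_concatenation` is hypothetical and not part of the solutions below; it simply compares word counts using `collections.Counter`.

```python
from collections import Counter


def is_concatenation(window, words):
    """Return True if window is some ordering of all words joined together."""
    if not words:
        return False
    word_len = len(words[0])
    if len(window) != word_len * len(words):
        return False
    # Cut the window into fixed-size chunks and compare the multisets of words.
    chunks = [window[k:k + word_len] for k in range(0, len(window), word_len)]
    return Counter(chunks) == Counter(words)


s = "barfoothefoobarman"
print(is_concatenation(s[0:6], ["foo", "bar"]))  # True: the window is "barfoo"
print(is_concatenation(s[3:9], ["foo", "bar"]))  # False: the window is "foothe"
```

Every solution in this post is a faster way of running this check for each possible starting index.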

Brute-force matching took a little over 1700 ms:

class Solution:
    def findSubstring(self, s, words):
        """
        :type s: str
        :type words: List[str]
        :rtype: List[int]
        """
        if not words:
            return []
        len_word, length = len(words[0]), len(words)
        len_words, word_dict, res = len_word * length, {}, []
        for word in words:
            word_dict[word] = word_dict.get(word, 0) + 1
        for i in range(len(s)-len_words+1):
            dict_s = {}  # word counts for the window starting at i
            for j in range(length):
                word = s[i+j*len_word:i+(j+1)*len_word]
                dict_s[word] = dict_s.get(word, 0) + 1
            if dict_s == word_dict:
                res.append(i)
        return res

A slight optimization, which stops scanning a window as soon as it hits an unknown or surplus word:

class Solution:
    def findSubstring(self, s, words):
        """
        :type s: str
        :type words: List[str]
        :rtype: List[int]
        """
        if not words:
            return []
        len_word, length = len(words[0]), len(words)
        len_words, word_dict, res = len_word * length, {}, []
        for word in words:
            word_dict[word] = word_dict.get(word, 0) + 1
        for i in range(len(s)-len_words+1):
            dict_s = {}  # word counts for the window starting at i
            for j in range(length):
                word = s[i+j*len_word:i+(j+1)*len_word]
                if word not in word_dict:
                    break
                dict_s[word] = dict_s.get(word, 0) + 1
                if dict_s[word] > word_dict[word]:
                    break
            if dict_s == word_dict:
                res.append(i)
        return res

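The next version goes further and cuts out repeated work. First, split s into words according to the starting offset: for example, if s has length 18 and words holds 2 words of length 3 each, the word start positions fall into three groups, 0 3 6 9 12 15, 1 4 7 10 13, and 2 5 8 11 14, and each group is scanned separately. Then use two pointers to track the start positions of the first and last words of the current window, and move them by comparing the window's counts against words_dict. Whenever the window matches, record the left pointer.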
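
The grouping of start positions described above can be sketched as follows. The function name `start_positions_by_offset` is an illustrative assumption and does not appear in the solution itself.

```python
def start_positions_by_offset(s_length, word_length):
    """Group the start indices of whole words by their offset modulo word_length."""
    return [
        list(range(offset, s_length - word_length + 1, word_length))
        for offset in range(word_length)
    ]


for group in start_positions_by_offset(18, 3):
    print(group)
# [0, 3, 6, 9, 12, 15]
# [1, 4, 7, 10, 13]
# [2, 5, 8, 11, 14]
```

Each group is then scanned once with a sliding window, so every word-sized slice of s is examined only a constant number of times per offset.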

class Solution:
    def findSubstring(self, s, words):
        """
        :type s: str
        :type words: List[str]
        :rtype: List[int]
        """
        if not words:
            return []
        words_count, word_length = len(words), len(words[0])  # number of words and length of each word
        words_dict, res, s_length = {}, [], len(s)
        for word in words:
            words_dict[word] = words_dict.get(word, 0) + 1
        for i in range(word_length):  # one pass per starting offset
            left, right, count, now_dict = i, i, 0, {}  # window start, window end, words in window, their counts
            while right <= s_length-word_length:
                right_str = s[right:right+word_length]
                if right_str not in words_dict:  # word not in words: restart just past it
                    count, now_dict, right = 0, {}, right+word_length
                    left = right
                else:
                    now_dict[right_str] = now_dict.get(right_str, 0) + 1
                    right += word_length
                    count += 1
                    if now_dict[right_str] > words_dict[right_str]:
                        # this word now occurs too often: drop words from the left until the surplus is gone
                        while now_dict[right_str] > words_dict[right_str]:
                            left_str = s[left:left+word_length]
                            now_dict[left_str] -= 1
                            count -= 1
                            left += word_length
                    if count == words_count:  # counts and window size both check out: a match
                        res.append(left)  # then slide left forward by one word
                        now_dict[s[left:left+word_length]] -= 1
                        count -= 1
                        left += word_length
        return res

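The count variable can be dropped, because the window always holds whole words and its word count is (right - left) / word_length: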

class Solution:
    def findSubstring(self, s, words):
        """
        :type s: str
        :type words: List[str]
        :rtype: List[int]
        """
        if not words:
            return []
        words_count, word_length = len(words), len(words[0])  # number of words and length of each word
        words_dict, res, s_length = {}, [], len(s)
        for word in words:
            words_dict[word] = words_dict.get(word, 0) + 1
        for i in range(word_length):  # one pass per starting offset
            left, right, now_dict = i, i, {}  # window start, window end, word counts in window
            while right <= s_length-word_length:
                right_str = s[right:right+word_length]
                if right_str not in words_dict:  # word not in words: restart just past it
                    now_dict, right = {}, right+word_length
                    left = right
                else:
                    now_dict[right_str] = now_dict.get(right_str, 0) + 1
                    right += word_length
                    if now_dict[right_str] > words_dict[right_str]:
                        # this word now occurs too often: drop words from the left until the surplus is gone
                        while now_dict[right_str] > words_dict[right_str]:
                            left_str = s[left:left+word_length]
                            now_dict[left_str] -= 1
                            left += word_length
                    if (right - left) // word_length == words_count:  # window holds exactly words_count words: a match
                        res.append(left)  # then slide left forward by one word
                        now_dict[s[left:left+word_length]] -= 1
                        left += word_length
        return res
