A C++ Implementation of the Trie (Prefix Tree)

1 The Trie

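The trie, also known as a dictionary tree, word-lookup tree, or prefix tree, is often described as a variant of the hash tree. It is used for counting and sorting strings, and search engines commonly use it for word-frequency statistics over text. Its advantages are fast lookup, shared storage for common prefixes, and the avoidance of needless string comparisons. Looking up a key of length m takes O(m) time in the worst case, whereas a balanced BST holding n keys needs O(m log n), because each of the O(log n) comparisons can itself cost up to O(m).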

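This article implements part of a trie in C++: inserting a word, searching for a word, and searching for a prefix. The rest will be added when time permits. Notes: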

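- The root node holds no character; every other node corresponds to exactly one character.
- Stored characters are assumed to lie in 'a'-'z', so a node does not store the character itself but its relative position (c - 'a'). A null pointer in that slot means the letter is absent; a non-null pointer means it is present.
- terminalSize stores the number of words that end at the node, so that deletion can tell whether the same string was inserted more than once.
- To destroy the tree it is enough to delete the root; the node destructors free everything else recursively.
- The structure can be trimmed to fit your needs.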
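
The notes above can be sketched in a few lines. This is only an illustration under stated assumptions: DemoNode, liveNodes, and insertWord are hypothetical names that do not appear in the implementation below, and the live-node counter exists purely to make the recursive destruction observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical demo node: the layout described in the notes, plus a
// live-node counter (added for illustration only).
struct DemoNode {
    static int liveNodes;            // number of nodes currently allocated
    int terminalSize = 0;            // how many stored words end at this node
    DemoNode* next[26] = {nullptr};  // next[i] != nullptr means letter 'a' + i is present

    DemoNode() { ++liveNodes; }
    ~DemoNode() {
        // Visit all 26 slots; deleting a null pointer is a no-op.
        for (DemoNode* child : next) delete child;
        --liveNodes;
    }
};
int DemoNode::liveNodes = 0;

// Inserts a lowercase word below root and returns the node where it ends.
DemoNode* insertWord(DemoNode* root, const std::string& word) {
    DemoNode* node = root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');  // assumption: lowercase letters only
        int idx = c - 'a';             // relative position, not the character itself
        if (node->next[idx] == nullptr) node->next[idx] = new DemoNode();
        node = node->next[idx];
    }
    ++node->terminalSize;              // one more word ends here
    return node;
}
```

Inserting "cat" twice and "car" once into a fresh root leaves five live nodes (root, c, a, t, r) and a terminalSize of 2 on the node for "cat"; deleting the root brings the counter back to zero.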

2 Implementing the Trie

LeetCode 208. Implement Trie (Prefix Tree)

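// LeetCode supplies these; a standalone build needs them.
#include <string>
using namespace std;

// Node type
struct trieNode{
    int terminalSize = 0;          // number of stored words that end at this node
    trieNode* next[26] = {NULL};   // pointers to the children, indexed by c - 'a'
    trieNode(){}
    // Check all 26 slots: stopping at the first null slot would leak every later child.
    ~trieNode(){ for(int i=0; i<26; ++i){ delete next[i]; } }
};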

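// Trie type
class Trie {
    trieNode* root;
public:
    Trie():root(new trieNode()){}
    ~Trie(){ delete root; }
    // The tree owns its nodes, so copying is disabled to avoid a double delete.
    Trie(const Trie&) = delete;
    Trie& operator=(const Trie&) = delete;

    // Three operations follow.
    // Insert a word
    void insert(const string& word) {
        if(word.size() == 0) return; // nothing to do for an empty word
        trieNode* T = root;
        for(char c : word){
            int idx = c - 'a'; // the relative position selects the child
            if(T->next[idx] == NULL){ // create the child if it does not exist yet
                T->next[idx] = new trieNode();
            }
            T = T->next[idx];
        }
        T->terminalSize += 1; // one more occurrence of this word
    }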
    
    // Search for a whole word
    bool search(const string& word) {
        trieNode* T = root;
        for(char c : word){
            if(T->next[c-'a'] == NULL) return false;
            T = T->next[c-'a'];
        }
        return T->terminalSize > 0; // the path exists; a word must also end here
    }
    
    // Search for a prefix: same walk as search, but terminalSize is not checked
    bool startsWith(const string& prefix) {
        trieNode* T = root;
        for(char c : prefix){
            if(T->next[c-'a'] == NULL) return false;
            T = T->next[c-'a'];
        }
        return true;
    }
};
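
As a possible variation, here is a minimal sketch of the same structure with children owned by std::unique_ptr, so no destructor has to be written by hand. It assumes C++14 (for std::make_unique) and lowercase input. SmartTrie, walk, count, and erase are hypothetical names introduced for illustration; they are not part of the LeetCode interface. The erase shown here only decrements terminalSize and leaves the nodes in place, which is one simple way to use the counter for deletion.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>

// Sketch only: same idea as the Trie above, with smart-pointer ownership.
class SmartTrie {
    struct Node {
        int terminalSize = 0;                        // words ending at this node
        std::array<std::unique_ptr<Node>, 26> next;  // all null initially
    };
    Node root_;

    // Walks along s; returns the last node, or nullptr if the path breaks.
    const Node* walk(const std::string& s) const {
        const Node* node = &root_;
        for (char c : s) {
            assert(c >= 'a' && c <= 'z');  // assumption: lowercase letters only
            node = node->next[c - 'a'].get();
            if (node == nullptr) return nullptr;
        }
        return node;
    }

public:
    void insert(const std::string& word) {
        if (word.empty()) return;
        Node* node = &root_;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            auto& child = node->next[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        ++node->terminalSize;
    }

    // Number of times word was inserted (and not yet erased).
    int count(const std::string& word) const {
        const Node* node = walk(word);
        return node ? node->terminalSize : 0;
    }

    bool search(const std::string& word) const { return count(word) > 0; }

    bool startsWith(const std::string& prefix) const {
        return walk(prefix) != nullptr;
    }

    // Removes one occurrence of word; returns false if it was not stored.
    bool erase(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            node = node->next[c - 'a'].get();
            if (node == nullptr) return false;
        }
        if (node->terminalSize == 0) return false;
        --node->terminalSize;
        return true;
    }
};
```

After inserting "apple" twice, the first erase("apple") leaves search("apple") true and the second makes it false, while startsWith("apple") stays true because the nodes are not pruned.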

3 Applications of the Trie

LeetCode 820. Short Encoding of Words

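// LeetCode supplies these; a standalone build needs them.
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Trie node type
struct trieNode{
    trieNode* next[26] = {NULL};   // pointers to the children, indexed by c - 'a'
    trieNode(){}
    // Check all 26 slots: stopping at the first null slot would leak every later child.
    ~trieNode(){ for(int i=0; i<26; ++i){ delete next[i]; } }
};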
// Trie type
class Trie {
    trieNode* root; // root of the trie
public:
    Trie():root(new trieNode()){}
    ~Trie(){ delete root; }
    // Inserts a word. Returns false if no new node was created, i.e. the word
    // is a prefix of (or equal to) a word that is already stored.
    bool insert(const string& word) {
        if(word.size() == 0) return false; // nothing to insert for an empty word
        trieNode* T = root;
        bool success = false; // becomes true once a new node is created
        for(char c : word){
            if(T->next[c-'a'] == NULL){ // create the child if it does not exist yet
                T->next[c-'a'] = new trieNode();
                success = true;
            }
            T = T->next[c-'a']; // move down to the child
        }
        return success;
    }
};

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        if(words.size() == 0) return 0;
        // Sort by length, longest first, so that a word that is a suffix of
        // another is inserted after it. Note: >= must not be used here.
        auto cmp = [](const string& s1, const string& s2){return s1.size() > s2.size();};
        sort(words.begin(), words.end(), cmp);
        // Insert each word; when an insertion creates new nodes, add its length
        Trie tree;
        int res = 0;
        for(string& word : words){
            reverse(word.begin(), word.end()); // suffixes must match, so insert each word reversed
            if(tree.insert(word)) res += word.size() + 1;  // the word plus one '#'
        }
        return res;
    }
};
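
For cross-checking the trie solution on small inputs, a set-based approach can be sketched as follows. It is a different technique, not the trie method above: put every word in a hash set, erase every proper suffix of every word, and sum the lengths of what remains plus one '#' each. The function name minimumLengthEncodingBySet is hypothetical, and non-empty words are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Set-based sketch: a word contributes to the encoding only if it is not a
// proper suffix of some other word.
int minimumLengthEncodingBySet(const std::vector<std::string>& words) {
    // Duplicates collapse automatically in the set.
    std::unordered_set<std::string> kept(words.begin(), words.end());
    for (const std::string& word : words) {
        assert(!word.empty());  // assumption: every word has at least one letter
        // Erase every proper suffix of word (i starts at 1, so word itself stays).
        for (std::size_t i = 1; i < word.size(); ++i) {
            kept.erase(word.substr(i));
        }
    }
    int total = 0;
    for (const std::string& word : kept) {
        total += static_cast<int>(word.size()) + 1;  // the word plus one '#'
    }
    return total;
}
```

For {"time", "me", "bell"} the word "me" is erased as a suffix of "time", leaving "time#bell#" with length 10, which is the value the trie solution should also return.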

Notes:

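When defining the lambda, auto cmp = ... lets the compiler deduce its type. cmp is a closure object of a unique unnamed class type; it is not a bool and not a function pointer, although a lambda without captures can convert to one.
A custom comparator must satisfy the requirements of a strict weak ordering, which include the three rules below. That is why >= cannot be used: for strings of equal length it returns true for comp(a,a) and for both comp(a,b) and comp(b,a), violating the first two rules.
- For all a, comp(a,a) == false
- If comp(a,b) == true then comp(b,a) == false
- If comp(a,b) == true and comp(b,c) == true then comp(a,c) == true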
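
The difference between > and >= can be checked directly. The sketch below is only an illustration: byLengthDesc, byLengthDescOrEqual, looksStrict, and sortedByLengthDesc are hypothetical names, and looksStrict tests just the first two rules on a single pair of values, not the full strict weak ordering requirements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Same comparison as in the solution above: longer strings come first.
const auto byLengthDesc = [](const std::string& a, const std::string& b) {
    return a.size() > b.size();
};

// Broken variant, shown for comparison only; never pass it to std::sort.
const auto byLengthDescOrEqual = [](const std::string& a, const std::string& b) {
    return a.size() >= b.size();
};

// Checks irreflexivity and asymmetry of cmp on one pair of values.
template <class Cmp>
bool looksStrict(Cmp cmp, const std::string& a, const std::string& b) {
    bool irreflexive = !cmp(a, a) && !cmp(b, b);
    bool asymmetric = !(cmp(a, b) && cmp(b, a));
    return irreflexive && asymmetric;
}

// Returns a copy of words sorted longest first, using the valid comparator.
std::vector<std::string> sortedByLengthDesc(std::vector<std::string> words) {
    std::sort(words.begin(), words.end(), byLengthDesc);
    return words;
}
```

With the equal-length pair "time" and "bell", looksStrict accepts byLengthDesc and rejects byLengthDescOrEqual.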
