652. Find Duplicate Subtrees
Given the root of a binary tree, return all duplicate subtrees.
For each kind of duplicate subtrees, you only need to return the root node of any one of them.
Two trees are duplicate if they have the same structure with the same node values.
Example 1:
Input: root = [1,2,3,4,null,2,4,null,null,4]
Output: [[2,4],[4]]
Example 2:
Input: root = [2,1,1]
Output: [[1]]
Example 3:
Input: root = [2,2,2,3,null,3,null]
Output: [[2,3],[3]]
Constraints:
- The number of the nodes in the tree will be in the range [1, 5000]
- -200 <= Node.val <= 200
From: LeetCode
Link: 652. Find Duplicate Subtrees
Solution:
Ideas:
1. Serializing Subtrees:
- Serialize each subtree into a string in the format rootVal,leftSubtree,rightSubtree.
- # marks a NULL child, so the string preserves structure as well as values.
- For example, “2,4,#,#,#” is a node 2 whose left child is the leaf 4 (“4,#,#”) and whose right child is NULL (“#”).
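The serialization format above can be sketched on its own, separate from the duplicate detection. This is a minimal illustration, not the full solution: the function name serialize is an assumption made for this sketch, and allocation failure is handled with assert only to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct TreeNode {
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

/* Returns a heap-allocated "rootVal,leftSubtree,rightSubtree" string.
 * NULL children are written as "#". The caller frees the result.
 * (serialize is a hypothetical name used only in this sketch.) */
char *serialize(const struct TreeNode *root) {
    if (!root) {
        char *marker = malloc(2);
        assert(marker != NULL);   /* sketch: abort on allocation failure */
        strcpy(marker, "#");
        return marker;
    }
    char *left = serialize(root->left);
    char *right = serialize(root->right);
    /* First snprintf call only measures the required length. */
    int len = snprintf(NULL, 0, "%d,%s,%s", root->val, left, right);
    char *out = malloc((size_t)len + 1);
    assert(out != NULL);
    snprintf(out, (size_t)len + 1, "%d,%s,%s", root->val, left, right);
    free(left);
    free(right);
    return out;
}
```

With this format, a node 1 with only a left child 2 serializes to "1,2,#,#,#", while a node 1 with only a right child 2 serializes to "1,#,2,#,#". Without the # markers both would collapse to the same string.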
2. Tracking Subtree Occurrences:
- Use a hash map to store serialized strings as keys.
- Each entry keeps:
- The serialization string.
- A count of how many times it has occurred.
- The corresponding TreeNode pointer (subtree root).
- Hash collisions are handled using chaining (linked list at each bucket).
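As a rough sketch of the bookkeeping described above, the following chained hash table counts how often each string key has been recorded. The names Entry, BUCKETS, bucketOf, recordKey, and freeTable are assumptions made for this illustration, and the TreeNode pointer is left out to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 101   /* hypothetical small table size for the sketch */

typedef struct Entry {
    char *key;            /* owned copy of the serialization */
    int count;            /* how many times this key has been recorded */
    struct Entry *next;   /* next entry in the same bucket (chaining) */
} Entry;

/* djb2 string hash reduced to a bucket index. */
static unsigned int bucketOf(const char *s) {
    unsigned long h = 5381;
    int c;
    while ((c = (unsigned char)*s++))
        h = ((h << 5) + h) + (unsigned long)c;
    return (unsigned int)(h % BUCKETS);
}

/* Records one occurrence of key and returns its updated count. */
int recordKey(Entry *table[BUCKETS], const char *key) {
    unsigned int b = bucketOf(key);
    for (Entry *e = table[b]; e; e = e->next) {
        if (strcmp(e->key, key) == 0)
            return ++e->count;
    }
    /* Key not present: prepend a new entry to the bucket's chain. */
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);    /* sketch: abort on allocation failure */
    e->key = malloc(strlen(key) + 1);
    assert(e->key != NULL);
    strcpy(e->key, key);
    e->count = 1;
    e->next = table[b];
    table[b] = e;
    return 1;
}

/* Frees every entry and its key, leaving all buckets empty. */
void freeTable(Entry *table[BUCKETS]) {
    for (int i = 0; i < BUCKETS; i++) {
        Entry *e = table[i];
        while (e) {
            Entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
        table[i] = NULL;
    }
}
```

Prepending to the chain keeps insertion constant-time; a lookup only walks the entries that share a bucket.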
3. Detecting Duplicates:
- While traversing the tree via DFS, serialize the current subtree.
- Look up the serialization in the hash map.
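- If it has been seen exactly once before (stored count == 1), the current subtree is its first duplicate: the count becomes 2 and the stored TreeNode is saved.
- Later occurrences only increment the count, so each duplicated shape is reported once.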
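The "report on the second sighting only" rule can be checked in isolation. The sketch below is an illustration under simplifying assumptions: it takes serializations already listed in visit order and uses a linear scan in place of the hash map. The function name reportDuplicates is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Scans keys in visit order. For every key that occurs at least twice,
 * writes the index of its first occurrence to firstSeen[], exactly once.
 * Returns how many indices were written.
 * A quadratic scan stands in for the hash map to keep the sketch short. */
int reportDuplicates(const char *keys[], int n, int firstSeen[]) {
    int reported = 0;
    for (int i = 0; i < n; i++) {
        int earlier = 0;    /* occurrences of keys[i] before position i */
        int first = -1;     /* index of the earliest occurrence */
        for (int j = 0; j < i; j++) {
            if (strcmp(keys[j], keys[i]) == 0) {
                if (first < 0)
                    first = j;
                earlier++;
            }
        }
        if (earlier == 1)   /* second sighting: report once */
            firstSeen[reported++] = first;
    }
    return reported;
}
```

For Example 1, a post-order visit produces "4,#,#", "2,4,#,#,#", "4,#,#", "2,4,#,#,#", "4,#,#", then the serializations of node 3 and the root. The third "4,#,#" does not trigger a second report, so the sketch yields two entries, matching the two subtrees in the expected output.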
4. Avoiding Use-After-Free Errors:
- The hash map takes ownership of every serialization it stores, so that string must not be freed while the map is still in use.
- dfs() therefore returns a str_dup() copy, which each recursion level can free independently of the stored string.
- When a serialization is already in the map, the freshly built string is freed and a copy of the stored one is returned instead.
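The ownership split can be illustrated with a deliberately tiny stand-in for the map. In this sketch, Slot, copyString, and storeAndCopy are hypothetical names: the slot keeps the original buffer, and the caller receives a separate copy that it may free at any time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap-allocated copy of s; the caller frees it. */
char *copyString(const char *s) {
    char *dup = malloc(strlen(s) + 1);
    assert(dup != NULL);   /* sketch: abort on allocation failure */
    strcpy(dup, s);
    return dup;
}

/* Hypothetical one-slot "map" that owns the string it stores. */
typedef struct {
    char *stored;
} Slot;

/* Hands serial over to the slot and returns a separate copy. */
char *storeAndCopy(Slot *slot, char *serial) {
    slot->stored = serial;        /* the slot now owns this buffer */
    return copyString(serial);    /* the caller owns this one */
}
```

Because the two buffers are distinct, freeing the returned copy leaves slot->stored valid. Returning serial itself would make the caller's free() invalidate the stored pointer, which is the use-after-free this step avoids.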
5. Building the Result:
- Each time a duplicate subtree is found (when count reaches 2), its TreeNode is added to a dynamic result array.
- The result is returned with its size through *returnSize.
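A growable pointer array of the kind described above might look like the following sketch. The PtrArray type and pushPtr function are assumptions made for this example; the starting capacity of 4 and the doubling step mirror the growth policy used in the solution.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers. */
typedef struct {
    void **items;
    int size;   /* number of stored pointers */
    int cap;    /* allocated capacity */
} PtrArray;

/* Appends p, doubling the capacity (starting at 4) when the array is full. */
void pushPtr(PtrArray *a, void *p) {
    if (a->size == a->cap) {
        int newCap = a->cap == 0 ? 4 : a->cap * 2;
        void **grown = realloc(a->items, sizeof *grown * (size_t)newCap);
        assert(grown != NULL);   /* sketch: abort on allocation failure */
        a->items = grown;
        a->cap = newCap;
    }
    a->items[a->size++] = p;
}
```

Starting from a zero-initialized PtrArray works because realloc with a NULL pointer behaves like malloc. Assigning the realloc result to a temporary first avoids losing the old buffer if the call fails.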
6. Key Memory Practices:
- Free temporary strings used during DFS after they are no longer needed.
- Do not free serialization strings that the hash map still references.
- Each recursive dfs() call returns a heap-allocated copy, which the parent call frees once it has built its own serialization.
Code:
/**
 * Definition for a binary tree node.
 * struct TreeNode {
 *     int val;
 *     struct TreeNode *left;
 *     struct TreeNode *right;
 * };
 */
/**
 * Note: The returned array must be malloced, assume caller calls free().
 */
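#include <stdio.h>   // snprintf
#include <stdlib.h>  // malloc, realloc, free
#include <string.h>  // strlen, strcpy, strcmp, memset

// Hash map node
typedef struct TreeNodeEntry {
    char *serial;               // Serialized string (owned by the map)
    int count;                  // Number of times seen
    struct TreeNode *node;      // Root of the first subtree with this serialization
    struct TreeNodeEntry *next; // Next entry in the same bucket
} TreeNodeEntry;

#define HASH_SIZE 10007
TreeNodeEntry* hashmap[HASH_SIZE];
struct TreeNode** result;
int resultCap = 0;
int resultSize = 0;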
// djb2 hash function
unsigned int hash(const char* str) {
    unsigned long h = 5381;
    int c;
    while ((c = *str++))
        h = ((h << 5) + h) + c;
    return h % HASH_SIZE;
}
// Append a node to the result array, doubling its capacity when full
void addResult(struct TreeNode* node) {
    if (resultSize == resultCap) {
        resultCap = resultCap == 0 ? 4 : resultCap * 2;
        result = realloc(result, sizeof(struct TreeNode*) * resultCap);
    }
    result[resultSize++] = node;
}
// Return a heap-allocated copy of s; the caller frees it
char* str_dup(const char* s) {
    char* dup = malloc(strlen(s) + 1);
    strcpy(dup, s);
    return dup;
}
// Serialize and process
char* dfs(struct TreeNode* root) {
    if (!root) {
        return str_dup("#"); // Marker for a NULL child
    }
    char* left = dfs(root->left);
    char* right = dfs(root->right);
    // The first snprintf call only measures the length; the second writes the string
    int len = snprintf(NULL, 0, "%d,%s,%s", root->val, left, right);
    char* serial = malloc(len + 1);
    snprintf(serial, len + 1, "%d,%s,%s", root->val, left, right);
    free(left);
    free(right);
    unsigned int h = hash(serial);
    TreeNodeEntry* entry = hashmap[h];
    while (entry) {
        if (strcmp(entry->serial, serial) == 0) {
            entry->count++;
            if (entry->count == 2) {
                addResult(entry->node); // Report each duplicated shape once
            }
            free(serial); // Don't need this copy, use stored one
            return str_dup(entry->serial); // Return safe copy
        }
        entry = entry->next;
    }
    // Not found: insert a new entry, which takes ownership of serial
    TreeNodeEntry* newEntry = malloc(sizeof(TreeNodeEntry));
    newEntry->serial = serial;
    newEntry->count = 1;
    newEntry->node = root;
    newEntry->next = hashmap[h];
    hashmap[h] = newEntry;
    return str_dup(serial); // Return a safe copy
}
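struct TreeNode** findDuplicateSubtrees(struct TreeNode* root, int* returnSize) {
    memset(hashmap, 0, sizeof(hashmap));
    result = NULL;
    resultSize = 0;
    resultCap = 0;
    char* temp = dfs(root);
    free(temp); // root's copy not needed
    // Traversal is finished: release the map entries and the strings they own
    for (int i = 0; i < HASH_SIZE; i++) {
        TreeNodeEntry* entry = hashmap[i];
        while (entry) {
            TreeNodeEntry* next = entry->next;
            free(entry->serial);
            free(entry);
            entry = next;
        }
        hashmap[i] = NULL;
    }
    *returnSize = resultSize;
    return result;
}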