A Trie data structure acts as a container for a dynamic array. In this article, we shall look at how we can implement a Trie in C/C++.
This is based on the tree data structure but does not necessarily store keys. Here, each node only has a value, which is defined based on the position.
So, the value of each node denotes the prefix, because it is the point from the root node after a series of matches.
We call such matches as prefix matches. Therefore, we can use Tries to store words of a dictionary!
For example, in the above figure, the root node is empty, so every string matches the root node. Node 7 matches a prefix string of to, while node 3 matches a prefix string of tea.
Here, the keys are only stored in the leaf node positions, since prefix matching stops at any leaf node. So any non-leaf node does NOT store the whole string, but only the prefix match character.
Tries are called prefix trees for all these reasons. Now let’s understand how we can implement this data structure, from a programmer’s point of view.
Let’s first write down the Trie structure. A Trie Node has notably two components:
But, since we’ll be printing the Trie too, it will be easier if we can store one more attribute in the data part.
So let’s define the TrieNode structure. Here, since I will be storing only the lower case alphabets, I will implement this as a 26-ary-tree. So, each node will have pointers to 26 children.
Now that we’ve defined our structure, let’s write functions to initialize a TrieNode in memory, and also to free it’s pointer.
We’ll now write the insert_trie()
function, that takes a pointer to the root node (topmost node) and inserts a word to the Trie.
The insertion procedure is simple. It iterates through the word character by character and evaluates the relative position.
For example, a character of b
will have a position of 1, so will be the second child.
We will match the prefix character by character, and simply initialize a node if it doesn’t exist.
Otherwise, we simply keep moving down the chain, until we have matched all the characters.
Finally, we will have inserted all unmatched characters, and we will return back the updated Trie.
Now that we’ve implemented insertion onto a Trie, let’s look at searching for a given pattern.
We’ll try to match the string character by character, using the same prefix matching strategy as above.
If we have reached the end of the chain and haven’t yet found a match, that means that the string does not exist, as a complete prefix match is not done.
For this to return correctly, our pattern must exactly match, and after we finish matching, we must reach a leaf node.
Okay, so we’ve completed the insert
and search
procedures.
To test it, we’ll first write a print_tree()
function, that prints the data of every node.
Now that we have done that, let’s just run the complete program until now, to check that it’s working correctly.
After compiling with your gcc
compiler, you’ll get the following output.
While it may be obvious how it’s being printed, the clue is to look at the print_tree()
method. Since that prints the current node, and then all of its children, the order does indicate that.
So, now let’s implement the delete
method!
This one is actually a bit more tricky than the other two methods.
There doesn’t exist any format algorithm of sorts since there is no explicit way in which we store the keys and values.
But, for the purpose of this article, I’ll handle deletions only if the word to be deleted is a leaf node. That is, it must be prefix matched all the way, until a leaf node.
So let’s start. Our signature for the delete function will be:
As mentioned earlier, this will delete only if the prefix match is done until a leaf node. So let’s write another function is_leaf_node()
, that does this for us.
Okay, so with that completed, let’s look at the draft of our delete_trie()
method.
After this check is done, we now know that our word will end in a leaf node.
But there is another situation to handle. What if there exists another word that has a partial prefix match of the current string?
For example, in the below Trie, the words tea and ted have a partial common match until te.
So, if this happens, we cannot simply delete the whole sequence from t to a, as then, both the chains will be deleted, as they are linked!
Therefore, we need to find the deepest point until which such matches occur. Only after that, can we delete the remaining chain.
This involves finding the longest prefix string, so let’s write another function.
This will return the longest match in the Trie, which is not the current word (word
). The below code explains every intermediate step in the comments.
There is another twist here! Since we try to find the longest match, the algorithm will actually greedily match the original string itself! This is not what we want.
We want the longest match that is NOT the input string. So, we must have a check with another method check_divergence()
.
This will check if there is any branching from the root to the current position, and return the length of the best match which is NOT the input.
If we are in the bad chain, that corresponds to the original string, then there will be no branching from the point! So this is a good way for us to avoid that nasty point, using check_divergence()
.
Finally, we’ve ensured that we don’t just delete the whole chain blindly. So now let’s move on with our delete
method.
The above code simply finds the deepest node for the prefix match and deletes the remaining sequence matching word from the Trie, since that is independent of any other match.
The Time Complexity of the procedures are mentioned below.
Procedure | Time Complexity of Implementation |
search_trie() |
O(n) -> n is the length of the input string |
insert_trie() |
O(n) -> n is the length of the input string |
delete_trie() |
O(C*n) -> C is the number of alphabets, |
n is the length of the input word |
For almost all cases, the number of alphabets is a constant, so the complexity of delete_trie()
is also reduced to O(n).
At long last, we’ve (hopefully), completed our delete_trie()
function. Let’s check if what we’ve written was correct, using our driver program.
Output
With that, we’ve come to the end of our Trie Data Structure implementation in C/C++. I know that this is a long read, but hopefully you’ve understood how you can apply these methods correctly!
You can find the complete code in a Github Gist that I’ve uploaded. It may not be the most efficient code, but I’ve tried my best to ensure that it’s working correctly.
If you have any questions or suggestions, do raise them in the comment section below!
If you’re interested in similar topics, you can refer to the Data Structures and Algorithms section, which includes topics like Hash Tables and Graph Theory.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.
How can i change the STRUCT wich has a “children array” with 26 spaces to a STRUCT more efficient, talking about memory use?
- Lucas