Trie (Prefix Tree)
#string #tree
- 3 minutes read - 434 words
Notebook
Trie
Trie, also called prefix tree, is a data structure that stores and retrieves overlapping information in an efficient manner. Mostly used for a set of strings, every leaf node in this tree represents a unique element and the every node in the path path between root to that leaf contains one atomic unit of that element, so in case of string a character.
Trie Node
Every node in the tree needs to hold three kinds of information.
- The value of the atomic unit of an element. In case of string a character
- List of all the children trie node
- In some cases the non-leaf node can represent the element/string as well if one element is the prefix of another. To represent that we need to store whether a non-leaf node represents an element as well.
A helper function is required as well to tell whether a unit is part of the node’s children get_child
, the routine will return the child node, if present.
class TrieNode:
def __init__(self, val):
self.val = val
self.children = []
self.isElement = False
def get_child(self, unit):
for node in self.children:
if node.val == unit:
return node
return None
def __str__(self):
return "TrieNode(val={0}, isElement={1})".format(self.val, self.isElement)
Root
The root of the trie tree represents the empty element set. Initializing means creating a node that stores nothing None
and no children.
root = TrieNode(None)
Insert
Adding an element to the tree is pretty straightforward. Starting from the root the steps are to find the prefix of the element which is already in the tree and add the rest of the element as a linked list.
def create_chain(element):
head = None
tail = None
for unit in element:
node = TrieNode(unit)
if head == None:
head = tail = node
else:
tail.children.append(node)
tail = node
tail.isElement = True
return head
def insert(element, root):
node = root
for i in range(len(element)):
childnode = node.get_child(element[i])
if childnode:
node = childnode
else:
node.children.append(create_chain(element[i:]))
return
node.isElement = True
Search
Searching an element involves going from root towards the leaf matching individual units of element. In the end if the algo returns a node then it is present, and if the isElement
flag is true then it’s not just the prefix.
def search(element, root):
node = root
for i in range(len(element)):
childnode = node.get_child(element[i])
if childnode:
node = childnode
else:
return None
return node
Run
Let’s give the code so far a test run on an example that saves a bunch of strings in the trie.
words = ["abc", "abcd", "xyz", "pqrs", "mno", "mnop"]
for word in words:
insert(word, root)
print(search("jkl", root))
print(search("ab", root))
print(search("abcd", root))