Just had a new package published on CRAN...rtrie :).
The rtrie package allows you to quickly create Tries from a list of strings. Example use cases for the package are word searching in games like Boggle or Scrabble or implementation of applications that autocomplete or autocorrect text. Incidentally, besides being great for statistics, R is an excellent language for building algorithms and data structures from a pedagogical standpoint. The language's Scheme influence makes the expression of algorithms straightforward and the plotting capabilities make it easy to visualize the data structures and processing involved. The remainder of this post is intended to illustrate that fact as well as introduce rtrie's capabilities.
To create a new trie from a vector of words:
> library(rtrie)
> trie
The new trie is a list of lists... with each node being a letter. So the list of lists above can be plotted using the data.tree package. This package has a wide range of tree related processing functions... including functionality related to all of the plots shown in this post.
> library(data.tree)
> data_trie - FromListSimple(trie)
> plot(data_trie)
The trie can also be displayed in text form which can be read left to right.
data_trie
> data_trie
levelName
1 Root
2 ¦--a
3 ¦ ¦--b
4 ¦ ¦ °--l
5 ¦ ¦ °--e
6 ¦ °--c
7 ¦ ¦--r
8 ¦ ¦ °--o
9 ¦ ¦ °--s
10 ¦ ¦ °--s
11 ¦ °--t
12 ¦ °--s
13 °--b
14 ¦--a
15 ¦ ¦--b
16 ¦ ¦ °--b
17 ¦ ¦ °--l
18 ¦ ¦ °--e
19 ¦ °--t
20 °--o
21 °--b
22 °--b
23 °--l
24 °--e
To check if a given word is in the tree, call is_word which returns a boolean indicating whether or not the word was found in the trie.
> is_word('across', trie)
[1] TRUE
The following diagram illustrates where the match was made in the trie.
To obtain a list of suggested words based on a prefix, call matching_words which returns a vector of matching words.
> matching_words('ac', trie)
r.o.s.s t t.s
"across" "act" "acts"
Three words are returned, corresponding to the section of the tree where the prefix exists. The section of the tree identified by the function is shown below. Although not clear in the diagram, a termination character exists that allowed the function to return both "act" and "acts".
The highlighted graphs above were created by using data.tree's SetGraphStyle, SetNodeStyle, and SetEdgeStyle functions.
If you are interested in more information, the code is available on Github and the vignette includes additional discussion.