Monday, 20 July 2015

TRIE datastructure explanation and simplified dictionary implementation in Java.

What is TRIE datastructure?


Trie is the data structure very similar to Binary Tree.
Trie datastructure stores the data in particular fashion, so that retrieval of data became much faster and helps in performance.
The name "TRIE" is coined from the word retrieve.

What are TRIE data structure usage or applications?


1. Dictionary Suggestions OR Autocomplete dictionary

Retrieving data stored in Trie data structure is very fast, so it is most suited for application where retrieval are more frequently performed like Phone directory where contact searching operation is used frequently.

2. Searching Contact from Mobile Contact list OR Phone Directory
Auto suggestion of words while searching for anything in dictionary is very common.
If we search for word "tiny", then it auto suggest words starting with same characters like "tine", "tin", "tinny" etc.

Auto suggestion is very useful and Trie plays a nice role there, Lets see real time use.
If say, Person doesn't know the complete spelling of the some word but know few, then rest of words starting with few characters can be auto suggested using TRIE data structure.

How TRIE datastructure works?


Each word of a English language is made using 26 alphabets. Trie data structure uses this property as a base for storing words.
 

A word can be started (first character of a word) from either a, either b, either c .... or either z.
So there are total 26 possibilities, from which first character of a word will be started.

Second character can also be started from either a, either b, either c .... or either z. 
So there are total 26 possibilities, from which second character of a word will be started.
 

Same for Third character and so on.

So for storing a word, we need to take an array(container) of size 26 and initially all the characters are blank as there are no words and it will look as shown below,


Lets see how a word "hello" and "him" is stored, 

1. "hello"
    "hello" starts from "h", So we will mark the position "h" as filled, which represents the use of
     "h". 

      After placing first character, for 2nd character again there are 26 possibilities, So from "h",
      again there is array of size 26, for storing 2nd character.



      Second character is "e", So from "h", we will move to "e" and mark "e" in 2nd array as used.
      After "e", 3rd character is "l", So mark the position "l" as used in respective array.

      Repeat the above steps for all the characters of a word and Trie will look like below.



2. "him"
    "him" starts from "h", So we will mark the position "h" as filled, but "h" is already filled by
      word "hello"which indicates that there exist some other word starting with "h".
     
      Using the same position "h" as a reference, Now, for 2nd character "i",

      We will move to "i" and mark position "i" in 2nd array as used.
      This signifies, that there exist 2 words starting with "he" and "hi".
     
       Now comes 3rd character "m" after "i", Now, we need to start new 26 array alphabet set again,
       which will hold the series starting from "hi".
      
       So mark the position "m" as used in respective array.

      After storing "hello" and "him",
Trie will look like below.

 
NOTE:
When we store the last character of a word, we need to mark that position as "END" to signify that this is the end of one word.
So that, while reading we will come to know that there exist some word which starts at "h" and ends at character "o" in case of "hello".



See figure below for more words.



Lets see java implementation of TRIE data structure.


TRIE Container class
package javabypatel.trie;

public class TrieContainer{
 public boolean isEnd;
 public TrieContainer[] series = new TrieContainer[26];
} 
TRIE Operation class

Operation it supports are,
  1.   Insert word in TRIE datastructure, 
  2.  Search word exist in TRIE datastructure, 
  3.  Print all the words present in TRIE datastructure.
package javabypatel.trie;

public class Trie {

 public static void main(String[] args) {
  TrieContainer start = new TrieContainer();
  
  storeWords(start, "hello");
  storeWords(start, "hallo");
  storeWords(start, "hell");
  storeWords(start, "teg");
  storeWords(start, "tag");
  
  printWordStrings(start,"");
  
  System.out.println("\n"+isWordPresent(start, "teg"));
 }

 private static void storeWords(TrieContainer start, String word){
  for (int j = 0; j < word.length(); j++) {
   char character = word.charAt(j);

   //In series, check the position of character, 
   //if it is already filled then check the series of filled Trie object.
   //if it is not filled then create new TrieContainer object and place it at correct position, and check 
   //if it is end of the word, then mark isEnd = true or else false;  
   if(start.series[character-97]!=null){
    if(word.length()-1 == j){ //if word is found till last character, then mark the end as true.
     start.series[character-97].isEnd = true;
    }
    start = start.series[character-97];
   }else{
    TrieContainer trie = new TrieContainer();
    trie.isEnd = (word.length()-1 == j ? true:false);
    start.series[character-97] = trie;
    start = start.series[character-97];
   }
  }  
 }
 
 private static boolean isWordPresent(TrieContainer start, String word){
  boolean isFound = true;
  for (int i = 0; i < word.length(); i++) {
   char character = word.charAt(i);

   //if at character position TrieContainer object is present then character is found and 
   //start looking for next character is word. 
   if(start.series[character-97]!=null){
    if(word.length()-1 != i){
     start = start.series[character-97];
    }else{
     if(!start.series[character-97].isEnd){
      isFound = false;
     }
    }
   }else{
    isFound = false;
    break;
   }
  }
  return isFound;
 }

 private static void printWordStrings(TrieContainer start, String toPrint) {
  if(start==null){
   return;
  }
  if(start.isEnd){
   System.out.println(toPrint);
  }

  for (int i = 0; i < start.series.length; i++) {
   TrieContainer t = start.series[i];
   if(t!=null){
    printWordStrings(t, toPrint + (char)(97+i));
   }
  }
 }
}


Place N Queens on N×N chess board such that no two Queens attack each other.

0/1 Knapsack Problem solved using Iterative and Dynamic Programming

Stock Buy Sell to Maximize Profit.

Find Minimum length Unsorted Subarray, Sorting which makes the complete array sorted

Count trailing zeros in factorial of a number.

How to Detect loop in linked list.

Tower Of Hanoi Program in Java.

Print Matrix Diagonally OR Diagonal order of Matrix.

Transpose of Matrix Inplace

Implement Queue using One Stack in Java (Recursive Implementation)

SOAP Vs REST Web Service, when to use SOAP, When to use REST.


Enjoy !!!!
If you find any issue in post or face any error while implementing, Please comment.

No comments:

Post a Comment