Jan 20, 2016
This app was developed for the Capstone project that completes the data science certification offered by Johns Hopkins University through Coursera. Its purpose is to:
Please allow a few seconds for the first prediction: the datasets are being loaded and cached.
To build the training dataset we:
The first algorithm the app goes through is the Maximum Likelihood Estimation (MLE) algorithm.
The MLE algorithm:
Evaluates the sentence entered by the user to determine which type of n-gram it is (unigram, bigram, trigram, ...).
Takes the (n+1)-gram dataset and looks for the (n+1)-grams that start with the n-gram entered by the user.
Provides the next word and its associated probability, calculated as the number of occurrences of that (n+1)-gram in the dataset divided by the total number of occurrences of (n+1)-grams that start with the user's n-gram.
Displays a message to the user if no matching (n+1)-gram is found.
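The MLE steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's actual code: the toy corpus, the `Counter`-based n-gram tables and the function names `build_ngram_counts` and `mle_next_word` are hypothetical, and lowercasing plus whitespace tokenisation is assumed. For a user n-gram h, the estimate returned is P(w | h) = C(h w) / Σ_v C(h v), i.e. the count of the (n+1)-gram divided by the count of all (n+1)-grams starting with h.

```python
from collections import Counter

def build_ngram_counts(tokens, n):
    """Count every n-gram (a tuple of n consecutive words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_next_word(sentence, counts_by_order):
    """Return (next_word, probability) by MLE, or None when nothing matches.

    counts_by_order maps an order k to a Counter of k-grams (hypothetical layout).
    """
    words = sentence.lower().split()
    if not words:
        return None
    n = len(words)                           # the user's input is treated as an n-gram
    higher = counts_by_order.get(n + 1)      # the (n+1)-gram table, if one was built
    if higher is None:
        return None
    prefix = tuple(words)
    # Keep the (n+1)-grams that start with the user's n-gram, keyed by their last word.
    followers = {gram[-1]: c for gram, c in higher.items() if gram[:-1] == prefix}
    if not followers:
        return None                          # the app would display a message here
    total = sum(followers.values())          # all (n+1)-grams starting with the prefix
    word, count = max(followers.items(), key=lambda item: item[1])
    return word, count / total

# Hypothetical toy corpus standing in for the app's training data.
tokens = "i love data science i love data mining i love coffee".split()
counts = {k: build_ngram_counts(tokens, k) for k in (2, 3)}

print(mle_next_word("I love", counts))       # ('data', 0.6666666666666666)
print(mle_next_word("hello world", counts))  # None
```

Returning `None` plays the role of the failure case; a real app would map it to the user-facing message.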
The second algorithm the app goes through is a Back-off algorithm.
The Back-off algorithm:
Evaluates the sentence entered by the user and extracts its last bigram (the last two words).
Looks up that bigram in the trigram dataset, i.e. searches for trigrams that start with it.
In case of failure to find the next word:
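The back-off lookup can be sketched in the same way. The text above does not spell out what happens when the bigram is not found, so the fallback used here (retry with only the last word against the bigram table, then give up) is an assumption, as are the helper names `most_frequent_continuation` and `backoff_next_word` and the toy corpus. The sketch returns the most frequent continuation without a probability score.

```python
from collections import Counter

def build_ngram_counts(tokens, n):
    """Count every n-gram (a tuple of n consecutive words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def most_frequent_continuation(prefix, counts):
    """Return the most frequent word following `prefix` in `counts`, or None."""
    followers = Counter()
    for gram, c in counts.items():
        if gram[:-1] == prefix:
            followers[gram[-1]] += c
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def backoff_next_word(sentence, bigram_counts, trigram_counts):
    words = sentence.lower().split()
    if len(words) >= 2:
        # Step 1: look up the last bigram in the trigram table.
        word = most_frequent_continuation(tuple(words[-2:]), trigram_counts)
        if word is not None:
            return word
    if words:
        # Step 2 (assumed fallback): back off to the last word in the bigram table.
        return most_frequent_continuation(tuple(words[-1:]), bigram_counts)
    return None

# Hypothetical toy corpus standing in for the app's training data.
tokens = "i love data science i love data mining i love coffee".split()
bigrams = build_ngram_counts(tokens, 2)
trigrams = build_ngram_counts(tokens, 3)

print(backoff_next_word("so I love", bigrams, trigrams))    # data
print(backoff_next_word("like mining", bigrams, trigrams))  # i
```

In the second call the bigram "like mining" never occurs in the trigram table, so the sketch falls back to the single word "mining".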
The following phrases have been tested successfully:
Possible enhancements we would foresee given more time would be: