Commit b10c3c6

README updated with methods description and USAGE.
1 parent d85bc28 commit b10c3c6

1 file changed

+43
-1
lines changed

README.md

+43-1
@@ -1 +1,43 @@
1-
# NLP_CODE
1+
#NLP_CODE
2+
3+
This repository contains a single directory, sentence_autocomplete, which is an assignment submission intended purely for academic purposes.
4+
Code description:
5+
The code file contains a class named "MarkovChain", which holds all the essential methods for building a Markov chain from trigrams. The trigrams come from data loaded from an output file, which is in turn generated from an original input file containing raw tweets.
6+
7+
Method-wise description:
8+
Class MarkovChain
9+
Method : initialize(input_file)
10+
Takes the name of the file containing raw tweets and calls a method named "clean_data_and_save_to_file", which
11+
is in a module named DataPreprocessing in the same code file. This method removes unwanted material, such as URLs and special characters other than apostrophes, from the tweets and writes
12+
the processed tweets to an output file. The initialize method then loads this file and reads it line by line to
13+
produce trigrams from each line, calling the "add" method to add each trigram to the trigram hash (@word).
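
The cleaning and trigram steps described above could look roughly like the sketch below. The helper names `clean_tweet` and `trigrams` and both regular expressions are assumptions made for illustration; the repository's actual "clean_data_and_save_to_file" may behave differently.

```ruby
# Hypothetical helpers: the names and regexes are assumptions, not the repository's code.

# Drop URLs first, then every character that is not a letter, digit,
# apostrophe, or whitespace, and finally collapse repeated whitespace.
def clean_tweet(line)
  line.gsub(%r{https?://\S+}, " ")
      .gsub(/[^A-Za-z0-9'\s]/, " ")
      .split
      .join(" ")
end

# Return every run of three consecutive words in the line.
def trigrams(line)
  line.split.each_cons(3).to_a
end

cleaned = clean_tweet("Watching #AmericanSniper tonight, can't wait! http://t.co/abc")
trigrams(cleaned).each { |w1, w2, w3| puts "#{w1} #{w2} -> #{w3}" }
```

Lines with fewer than three words yield no trigrams, so they contribute nothing to the hash.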
14+
15+
Method : add(word, word1, word2)
16+
Takes three words and adds the first two words as a key in the dictionary (hash), with the third word stored in a nested hash under
17+
this key whose value is the frequency of occurrence of this trigram.
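
A minimal sketch of the data structure this description implies, assuming the bigram key is the two words joined by a space (as the USAGE section's search string suggests). The class name `TrigramTable` is hypothetical and stands in for the relevant part of "MarkovChain".

```ruby
# Assumed shape of the trigram hash: { "w1 w2" => { "w3" => count } }
class TrigramTable
  attr_reader :word

  def initialize
    # Each unseen bigram gets its own inner hash whose counts default to 0.
    @word = Hash.new { |hash, bigram| hash[bigram] = Hash.new(0) }
  end

  # Record one occurrence of the trigram (word, word1, word2).
  def add(word, word1, word2)
    @word["#{word} #{word1}"][word2] += 1
  end
end

table = TrigramTable.new
table.add("directed", "by", "Clint")
table.add("directed", "by", "Clint")
table.add("directed", "by", "Steven")
p table.word
```

Using default blocks on both levels keeps `add` to a single line, at the cost of creating an empty inner hash whenever an unknown bigram is read through `[]`.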
18+
19+
Method : get_possible_word(bigram)
20+
This method takes a two-word combination, looks it up in the dictionary built by the "add" method, and finds
21+
all the keys under the main bigram. It then calculates the weight of each key under this bigram and returns
22+
the key with the highest probability, that is, the highest frequency of occurrence.
23+
Note that this method returns only one word.
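
The lookup might be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation: the table is passed in explicitly instead of read from @word, an unseen bigram returns nil, and ties go to whichever entry `max_by` encounters first.

```ruby
# Assumed sample data in the { "w1 w2" => { "w3" => count } } shape.
SAMPLE_TABLE = {
  "directed by" => { "Clint" => 2, "Steven" => 1 }
}.freeze

# Return the single most frequent follower of the bigram, or nil if none is known.
def get_possible_word(table, bigram)
  followers = table[bigram]
  return nil if followers.nil? || followers.empty?

  # Dividing every count by the same total does not change the ranking,
  # so the highest count is also the highest probability.
  best_word, _count = followers.max_by { |_word, count| count }
  best_word
end

puts get_possible_word(SAMPLE_TABLE, "directed by")
```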
24+
25+
Method : print_dict
26+
This method simply prints out the trigram dictionary @word.
27+
28+
29+
USAGE:
30+
Create an object of the class "MarkovChain", passing it the raw input text file.
31+
markov_obj = MarkovChain.new(<in_file>)
32+
Pass the last two words of an incomplete sentence to the method named "get_possible_word":
33+
str = "American sniper is directed by"
34+
str_list = str.chomp.strip.split
35+
wrd1 = str_list[-2]
36+
wrd2 = str_list[-1]
37+
search_string = "#{wrd1} #{wrd2}"
38+
next_word = markov_obj.get_possible_word(search_string)
39+
40+
To get a second word, pass the last word of the original incomplete sentence followed by the next_word generated above
41+
to "get_possible_word" again.
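
Repeating that step in a loop extends the sentence one word at a time. The sketch below is self-contained, so it uses a lambda named `lookup` as a stand-in for `markov_obj.get_possible_word`; the `complete` helper and the sample table are hypothetical and exist only for illustration.

```ruby
# Stand-in for markov_obj.get_possible_word; the entries are made up for illustration.
CHAIN_SAMPLE = {
  "directed by" => "Clint",
  "by Clint"    => "Eastwood"
}.freeze
lookup = ->(bigram) { CHAIN_SAMPLE[bigram] }

# Append up to `count` predicted words, always querying with the last two words.
def complete(sentence, count, lookup)
  words = sentence.split
  count.times do
    next_word = lookup.call(words.last(2).join(" "))
    break if next_word.nil? # stop when the bigram has no known follower
    words << next_word
  end
  words.join(" ")
end

puts complete("American sniper is directed by", 2, lookup)
```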
42+
43+
