To be added
Set of notebooks associated with the chapter. Minimal code sketches of each technique follow the list below.
- One-Hot Encoding: Here we demonstrate one-hot encoding from first principles as well as scikit-learn's implementation, on our toy corpus.
- Bag of Words: Here we demonstrate how to arrive at the bag-of-words representation for our toy corpus.
- Bag of N-Grams: Here we demonstrate how the bag-of-n-grams representation works, using our toy corpus.
- TF-IDF: Here we demonstrate how to obtain the TF-IDF representation of a document using scikit-learn's TfidfVectorizer, again on our toy corpus.
- Pre-trained Word Embeddings: Here we demonstrate how to represent text using pre-trained word embedding models and how to use them to get representations for full texts.
- Custom Word Embeddings: Here we demonstrate how to train a custom word embedding model (word2vec) using gensim, on both our toy corpus and a subset of Wikipedia data.
- Vector Representations via Averaging: Here we demonstrate averaging of document vectors using spaCy.
- Doc2Vec Model: Here we demonstrate how to train your own doc2vec model.
- Visualizing Embeddings Using t-SNE: Here we demonstrate how to use dimensionality reduction techniques such as t-SNE to visualize embeddings.
- Visualizing Embeddings Using TensorBoard: Here we demonstrate how to visualize embeddings using TensorBoard.
Color figures, as requested by readers.