Text Representation

🔖 Outline

To be added

🗒️ Notebooks

A set of notebooks associated with this chapter. Short, illustrative code sketches for each technique follow the list.

  1. One-Hot Encoding: Here we demonstrate one-hot encoding from first principles as well as scikit-learn's implementation, on our toy corpus.

  2. Bag of Words: Here we demonstrate how to arrive at the bag-of-words representation for our toy corpus.

  3. Bag of N-Grams: Here we demonstrate how the bag-of-n-grams representation works, using our toy corpus.

  4. TF-IDF: Here we demonstrate how to obtain the TF-IDF representation of a document using scikit-learn's TfidfVectorizer (on our toy corpus).

  5. Pre-trained Word Embeddings: Here we demonstrate how to represent text using pre-trained word embedding models and how to use them to obtain representations for full texts.

  6. Custom Word Embeddings: Here we demonstrate how to train a custom word embedding model (word2vec) using gensim on both our toy corpus and a subset of Wikipedia data.

  7. Vector Representations via Averaging: Here we demonstrate how to build document vectors by averaging word vectors, using spaCy.

  8. Doc2Vec Model: Here we demonstrate how to train your own doc2vec model.

  9. Visualizing Embeddings Using TSNE: Here we demonstrate how to use dimensionality reduction techniques such as t-SNE to visualize embeddings.

  10. Visualizing Embeddings using Tensorboard: Here we demonstrate how to visualize embeddings using TensorBoard.
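
The sketch below shows one minimal way to do one-hot encoding from first principles (notebook 1). The four-sentence corpus is a stand-in for the chapter's toy corpus, not necessarily the exact one used in the notebook.

```python
# One-hot encoding from first principles on a small stand-in corpus.
corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# Build a vocabulary: every unique word gets an integer id.
vocab = {word: idx for idx, word in
         enumerate(sorted({w for doc in corpus for w in doc.split()}))}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

# Each document becomes a sequence of one-hot vectors, one per word.
print(vocab)
print([one_hot(w) for w in "dog bites man".split()])
```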
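
For the bag-of-words representation (notebook 2), a minimal sketch using scikit-learn's CountVectorizer (scikit-learn ≥ 1.0 is assumed for get_feature_names_out):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# Bag of words: each document becomes a vector of word counts over the vocabulary.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(bow.toarray())                       # one count vector per document
```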
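
Bag of n-grams (notebook 3) only changes the ngram_range parameter of the same vectorizer; again a sketch, not the notebook's exact code:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# Bag of n-grams: count unigrams and bigrams instead of single words only,
# which preserves a little local word order.
ngram_vectorizer = CountVectorizer(ngram_range=(1, 2))
bong = ngram_vectorizer.fit_transform(corpus)

print(ngram_vectorizer.get_feature_names_out())
print(bong.toarray())
```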
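
A minimal TF-IDF sketch with scikit-learn's TfidfVectorizer (notebook 4), on the same stand-in corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# TF-IDF: term frequency weighted down by how common the term is across documents.
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(corpus)

print(tfidf.get_feature_names_out())
print(matrix.toarray().round(2))  # one weighted vector per document
```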
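
For pre-trained embeddings (notebook 5), one possible approach uses gensim's downloader. The model name glove-wiki-gigaword-50 is an assumption here (the notebook may use a different, larger model), and averaging word vectors is one simple way to represent a full text:

```python
import numpy as np
import gensim.downloader as api

# Downloads a small pre-trained GloVe model on first use; the notebook may
# use a larger model such as word2vec-google-news-300 instead.
wv = api.load("glove-wiki-gigaword-50")

print(wv.most_similar("dog", topn=3))  # nearest neighbours in embedding space

def text_vector(text):
    """Represent a full text as the average of its in-vocabulary word vectors."""
    words = [w for w in text.lower().split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0)

print(text_vector("dog bites man").shape)  # (50,)
```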
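
Training a custom word2vec model with gensim (notebook 6) can look roughly like this; all hyperparameters below are illustrative, and the gensim 4.x API is assumed:

```python
from gensim.models import Word2Vec

# Gensim expects a list of tokenised sentences; here a tiny stand-in corpus.
sentences = [doc.split() for doc in
             ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]]

# Train a small skip-gram model (gensim 4.x uses `vector_size`, not `size`).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["dog"][:5])           # learned vector for "dog"
print(model.wv.most_similar("dog"))  # nearest neighbours within the tiny corpus
```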
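
Averaging word vectors into a document vector with spaCy (notebook 7) is essentially a one-liner via doc.vector; the en_core_web_md model is an assumption here (any spaCy model that ships with word vectors works):

```python
import spacy

# Requires a model with vectors, e.g. `python -m spacy download en_core_web_md`.
nlp = spacy.load("en_core_web_md")

doc = nlp("Dog bites man.")
# doc.vector is the average of the token vectors: a single fixed-size
# representation for the whole text.
print(doc.vector.shape)
print(doc.similarity(nlp("Man bites dog.")))
```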
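
A rough doc2vec sketch with gensim (notebook 8); the vector size and epoch count are arbitrary illustrative values:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# Doc2Vec needs TaggedDocument objects: a token list plus a document tag.
tagged = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(corpus)]

model = Doc2Vec(tagged, vector_size=20, min_count=1, epochs=100)

# Infer a vector for a new (or training) document.
print(model.infer_vector("man eats meat".split()))
```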
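
For t-SNE visualization (notebook 9), a sketch using scikit-learn's TSNE; the random vectors below merely stand in for real embeddings you would take from a trained model:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder embeddings; in practice use a (num_words, dim) array taken from
# a trained model, e.g. gensim's model.wv.
words = ["dog", "man", "bites", "eats", "meat", "food"]
word_vectors = np.random.default_rng(0).normal(size=(len(words), 50))

# t-SNE projects the high-dimensional vectors down to 2D for plotting.
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(word_vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.show()
```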
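
For TensorBoard (notebook 10), one hedged option is PyTorch's SummaryWriter.add_embedding (the tensorboard package must be installed); the notebook itself may use TensorFlow's embedding projector instead:

```python
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

# Illustrative labels and random vectors standing in for real embeddings.
words = ["dog", "man", "bites", "eats", "meat", "food"]
vectors = torch.tensor(np.random.default_rng(0).normal(size=(len(words), 50)),
                       dtype=torch.float)

# Write an embedding-projector log; inspect it with `tensorboard --logdir runs`.
writer = SummaryWriter()
writer.add_embedding(vectors, metadata=words)
writer.close()
```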

🖼️ Figures

Color figures as requested by the readers.
