Ch8

Social Media

🔖 Outline

To be added

🗒️ Notebooks

Set of notebooks associated with the chapter.

  1. Create a wordcloud: How to create a word cloud, which is often used to get a quick sense of the text corpus at hand.

  2. Effect of different tokenizers on Social Media Text Data: Here we show how different tokenizers can give different outputs for the same input text. When dealing with text data from social platforms, this can have a large bearing on the performance of the task. We work with 5 different tokenizers, namely:

    * word_tokenize from NLTK
    * TweetTokenizer from NLTK
    * Twikenizer
    * Twokenizer by ARK@CMU
    * twokenize

  3. Trending topics: Find trending topics on Twitter using tweepy

  4. Sentiment Analysis: Basic sentiment analysis using TextBlob

  5. Preprocessing Social Media Text Data: Common functions involved in the pre-processing pipeline for Social Media Text Data.

  6. Text representation of Social Media Text Data: How to use embeddings to represent Social Media Text Data

  7. Sentiment Analysis: Here we use the preprocessing and representation steps learnt before to build a better classifier.
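
As a rough illustration of why tokenizer choice and preprocessing matter for social media text, here is a minimal, standard-library-only sketch. It is not taken from the notebooks and does not use any of the tokenizers listed above; the function names (`naive_tokenize`, `tweet_tokenize`, `preprocess`), the regular expressions, and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; not from the chapter notebooks.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")

# Alternatives are tried left to right, so social-media-specific
# patterns come before the generic word and punctuation patterns.
TWEET_TOKEN_RE = re.compile(
    r"https?://\S+"        # URLs kept as one token
    r"|[@#]\w+"            # mentions and hashtags kept whole
    r"|[:;=][-']?[()DPp]"  # a handful of ASCII emoticons
    r"|\w+(?:'\w+)?"       # words, optionally with an apostrophe
    r"|[^\w\s]"            # any other single punctuation mark
)


def naive_tokenize(text):
    """Generic tokenizer: split words and isolate every punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", text)


def tweet_tokenize(text):
    """Tweet-aware tokenizer: keeps URLs, mentions, hashtags, emoticons intact."""
    return TWEET_TOKEN_RE.findall(text)


def preprocess(text):
    """Drop URLs and mentions, unwrap hashtags, normalize whitespace and case."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#NLProc" becomes "NLProc"
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


sample = "@nlp_fan Loving #NLProc :) see https://example.com/x"

if __name__ == "__main__":
    print(naive_tokenize(sample))
    print(tweet_tokenize(sample))
    print(preprocess(sample))
```

The generic tokenizer breaks the mention, hashtag, emoticon, and URL into fragments, while the tweet-aware one keeps each as a single token. Which behavior is preferable depends on the downstream task, which is the point the tokenizer notebook explores with real libraries.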

🖼️ Figures

Color figures as requested by the readers.
