Ch2

NLP Pipeline

🔖 Outline

Section outline as covered in the book.

  • Data Acquisition
  • Text Extraction and Cleanup
    • HTML Parsing and Cleanup
    • Unicode Normalization
    • Spelling Correction
    • System-Specific Error Correction
  • Pre-Processing
    • Preliminaries
    • Frequent Steps
    • Other Pre-Processing Steps
    • Advanced Processing
  • Feature Engineering
    • Classical NLP/ML Pipeline
    • DL Pipeline
  • Modeling
    • Start with Simple Heuristics
    • Building Your Model
    • Building THE Model
  • Evaluation
    • Intrinsic Evaluation
    • Extrinsic Evaluation
  • Post-Modeling Phases
    • Deployment
    • Monitoring
    • Model Updating
  • Working with Other Languages
  • Case Study
  • Wrapping Up
  • References

🗒️ Notebooks

Set of notebooks associated with the chapter.

  1. Web Scraping using BeautifulSoup: Here we demonstrate how to scrape a web page (we use stackoverflow.com as an example) and parse the HTML using bs4 to find and extract relevant information.

  2. Web Scraping using Scrapy: Here we demonstrate how to use Scrapy to scrape data from websites and save it using a pipeline.

  3. Text Extraction from Images: Here we demonstrate how we can use pytesseract to extract text from images.

  4. Common Pre-processing Steps: Here we demonstrate the most commonly performed text pre-processing steps using various libraries.

  5. Data Augmentation: Here we demonstrate data augmentation using nlpaug.
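The parsing step in the first notebook can be sketched without any network access, assuming bs4 is installed. The HTML snippet and the `question`/`answer` class names below are purely illustrative stand-ins, not StackOverflow's actual markup:

```python
from bs4 import BeautifulSoup

# Illustrative HTML standing in for a downloaded page (hypothetical markup,
# not the real stackoverflow.com structure).
html = """
<html><body>
  <div class="question"><h1>How do I parse HTML?</h1>
    <div class="answer">Use a parser library such as bs4.</div>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
question = soup.find("h1").get_text(strip=True)
answers = [a.get_text(strip=True) for a in soup.find_all("div", class_="answer")]

print(question)
print(answers)
```

In the notebook itself, the `html` string would instead come from an HTTP response; everything after that point is the same find-and-extract pattern.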
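The most common pre-processing steps from notebook 4 (lowercasing, punctuation removal, tokenization, stop-word removal) can be sketched with the standard library alone; the notebook uses dedicated NLP libraries for this, and the stop-word set below is a tiny illustrative subset, not a real list:

```python
import re
import string

# Tiny illustrative stop-word subset; real lists (e.g. NLTK's) are much larger.
STOPWORDS = {"the", "is", "a", "of", "and"}

def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOPWORDS]

print(preprocess("The NLP pipeline is a sequence of steps!"))
# ['nlp', 'pipeline', 'sequence', 'steps']
```

Which of these steps to apply, and in what order, depends on the task; a library tokenizer also handles cases (contractions, URLs, emoticons) that the naive whitespace split above does not.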
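The idea behind notebook 5 can be illustrated without nlpaug itself: one of the simplest text-augmentation moves is random word deletion, which produces slightly perturbed copies of a sentence for training. This is a minimal pure-Python sketch of that one move, not nlpaug's API, which offers many richer augmenters (synonym replacement, embedding-based substitution, etc.):

```python
import random

def random_deletion(tokens, p=0.3, seed=0):
    """Drop each token independently with probability p.

    A fixed seed makes the augmentation reproducible; a real pipeline
    would vary the seed to generate many distinct augmented copies.
    """
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else tokens  # never return an empty sentence

tokens = "data augmentation creates new training examples".split()
print(random_deletion(tokens))
```

nlpaug wraps this kind of operation (and far more sophisticated ones) behind a uniform augmenter interface, which is what the notebook demonstrates.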

🖼️ Figures

Color figures as requested by the readers.
