Section outline as covered in the book.
- Data Acquisition
- Text Extraction and Cleanup
- HTML Parsing and Cleanup
- Unicode Normalization
- Spelling Correction
- System-Specific Error Correction
- Pre-Processing
- Preliminaries
- Frequent Steps
- Other Pre-Processing Steps
- Advanced Processing
- Feature Engineering
- Classical NLP/ML Pipeline
- DL Pipeline
- Modeling
- Start with Simple Heuristics
- Building Your Model
- Building THE Model
- Evaluation
- Intrinsic Evaluation
- Extrinsic Evaluation
- Post-Modeling Phases
- Deployment
- Monitoring
- Model Updating
- Working with Other Languages
- Case Study
- Wrapping Up
- References
Set of notebooks associated with the chapter.
- Web Scraping using BeautifulSoup: Here we demonstrate how to scrape a web page (we use stackoverflow.com as an example) and parse the HTML using bs4 to find and extract relevant information.
- Web Scraping using Scrapy: Here we demonstrate how to use Scrapy to scrape data from websites and save it using a pipeline.
- Text Extraction from Images: Here we demonstrate how to use pytesseract to extract text from images.
- Common Pre-processing Steps: Here we demonstrate the most commonly performed text pre-processing steps using various libraries.
- Data Augmentation: Here we demonstrate data augmentation using nlpaug.
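The HTML-parsing notebook uses bs4 against a live page; as a minimal self-contained sketch of the same idea (stdlib only, no network access, and a hardcoded HTML snippet standing in for a fetched stackoverflow.com page), extracting question titles might look like this:

```python
from html.parser import HTMLParser

class QuestionTitleExtractor(HTMLParser):
    """Collects text inside <a class="question-hyperlink"> tags --
    roughly what the notebook does with bs4's find_all()."""
    def __init__(self):
        super().__init__()
        self._in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("class", "question-hyperlink") in attrs:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.titles.append(data.strip())

# Hardcoded snippet; the notebook fetches a real page with requests.
html_doc = """
<div class="summary">
  <a class="question-hyperlink" href="/q/1">How do I parse HTML in Python?</a>
  <a class="question-hyperlink" href="/q/2">Why is my scraper blocked?</a>
</div>
"""

parser = QuestionTitleExtractor()
parser.feed(html_doc)
print(parser.titles)
```

In practice bs4 with a real parser (lxml or html.parser) is far more robust to malformed markup; this sketch only illustrates the find-and-extract pattern.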
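The pre-processing notebook relies on libraries such as NLTK and spaCy; a stdlib-only sketch of the chapter's "frequent steps" (lowercasing, punctuation removal, tokenization, stopword removal) could look like the following. The stopword list here is a tiny illustrative subset, not NLTK's:

```python
import string

# Tiny illustrative stopword list; real pipelines use NLTK/spaCy lists.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}

def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace,
    and drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]

print(preprocess("The quick, brown fox is jumping over the lazy dog!"))
# ['quick', 'brown', 'fox', 'jumping', 'over', 'lazy', 'dog']
```

Whitespace splitting is a deliberate simplification: proper tokenizers also handle contractions, hyphenation, and punctuation attached to words.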
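nlpaug provides many augmenters (synonym replacement, keyboard-typo noise, back-translation, and so on); one of the simplest ideas is random word swap, sketched here in plain Python with a seeded RNG so the result is reproducible. The function name and seed are our own choices for illustration, not nlpaug's API:

```python
import random

def random_swap(tokens, n_swaps=1, seed=42):
    """Return a copy of `tokens` with `n_swaps` random pairs of
    positions exchanged -- a simplified word-swap augmentation."""
    rng = random.Random(seed)
    augmented = list(tokens)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(augmented)), 2)
        augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented

original = ["text", "augmentation", "creates", "new", "training", "examples"]
print(random_swap(original, n_swaps=2))
```

Because only positions change, the augmented sentence keeps the same vocabulary as the original, which is what makes swap-based augmentation cheap and label-preserving for many classification tasks.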
Color figures as requested by the readers.