AI for Natural Language Processing (NLP) Intro
Overview
We live in an era of abundant data, and much of it is text (emails, tweets, customer tickets, Yelp reviews, product reviews, etc.).
Over the past few years, the field of AI has undergone a revolution. Researchers at companies such as Google, Facebook, Microsoft and Baidu have developed breakthrough algorithms that understand text data far better than earlier approaches.
The applications are wide ranging, including understanding documents, processing customer service tickets and analyzing reviews.
In this course, you will learn how to handle text data and get an introduction to modern AI technologies for NLP.
What you will learn:
- How to prepare text for machine learning
- Stemming, tokenizing and filtering stop words in text
- Analyzing documents using word-frequency, bag-of-words techniques
- Visualizing text data
- Classic tools and techniques for text processing: NLTK, TextBlob, TF-IDF
- Naive Bayes for text classification
- Modern techniques for text: spaCy, Word2Vec
- Topic modeling with Gensim
- Neural network frameworks: TensorFlow & Keras
- NN models for text processing: LSTM, RNN
- Modern NN models for text processing: ELMo, ULMFiT, BERT
Duration:
Four Days
Audience:
Developers, data analysts, data scientists
Skill level:
Introductory to Intermediate
Industry Use Cases Covered
We will study and solve some of the most common industry use cases, listed below:
- Determining whether a text message is spam (Telco)
- Sentiment analysis of tweets (Social)
- Analysis of IMDb movie ratings and reviews
Prerequisites
- Programming background
- Basic knowledge of the Python language and Jupyter notebooks is recommended.
Even if you haven't programmed in Python before, the language is easy to pick up quickly. We will provide Python resources.
Lab environment
- A cloud-based lab environment will be provided to students; there is no need to install anything on the laptop.
Students will need the following
- A reasonably modern laptop with an unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly.
- Chrome browser
Detailed Course Outline
Machine Learning Overview
- Machine Learning landscape
- Understanding AI use cases
- Data and AI
- AI vocabulary
- Hardware and software ecosystem
- Understanding types of Machine Learning (Supervised / Unsupervised / Reinforcement)
Text Preparation
- Filtering
- Stopwords
- Stemming
- Parsing and tokenization
- Word-clouds
- Working with Unicode
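As a taste of the text preparation topics above, here is a minimal sketch in plain Python of tokenizing, stop-word filtering and stemming. The stop-word list, the suffix rules and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `prepare`) are illustrative assumptions, not course material; in the labs these steps would normally be done with a library such as NLTK.

```python
import re

# A tiny, hand-picked stop-word list for illustration only (an assumption;
# real projects typically use a much longer list from a library).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into ASCII alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip a few common suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Tokenize, filter stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(prepare("The cats are playing in the garden"))
```

Note that the simple `[a-z]+` pattern discards accented and non-Latin characters, which is one reason Unicode handling gets its own topic.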
Text Algorithms
- N-grams
- Bag-of-words
- NLTK
- TextBlob
- TF-IDF
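To make bag-of-words and TF-IDF concrete, the sketch below counts terms and weights them using only the Python standard library. It assumes one common formulation (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries use slightly different variants, and the function names here are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each term to the number of times it occurs in the document."""
    return Counter(tokens)


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed formulation: tf = count / document length, idf = log(N / df).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = bag_of_words(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


docs = [["spam", "offer", "spam"], ["meeting", "offer"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In this toy corpus, "offer" appears in every document, so its weight is zero, while "spam" appears in only one and is weighted up.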
Text Classification
- Naive Bayes
- SVM
Text datasets and Benchmarks
- Public text datasets
- Benchmarks (GLUE, SQuAD)
Topic Modeling
- LDA (Latent Dirichlet Allocation)
- Gensim
Introduction to Neural Networks
- Perceptrons
- Feedforward networks
- Activation functions
- Optimizers
- Backpropagation
- Deep Neural Networks
TensorFlow
- TensorFlow intro
- TensorFlow features
- TensorFlow on GPU and TPU
- TensorFlow API
- Lab: Setting up and Running TensorFlow
NLP and Deep Learning
- Word embeddings
- Skip-gram
- Training the model
- Visualizing the embeddings
- Word2Vec
- spaCy for named entity recognition
Recurrent Neural Networks (RNN)
- Introduction to RNNs
- Text prediction
- Named entity extraction
- Automatic translation (seq2seq)
- Text generation
Transformers
- Attention concept
- Transformer architecture
- Bidirectional LSTM
- Pre-trained models for text processing (ELMo, ULMFiT, BERT)
Conversational AI
- Understanding natural language
- Generating natural language
- Introduction to the Rasa framework
Final Workshop (Time Permitting)
- This is a group exercise
- Students will apply the techniques they have learned to solve a real-world problem
- Students will present their solutions to the class
- Discussions and Takeaways