Artificial Intelligence (AI) for Natural Language Processing (NLP) – Introduction

Overview

We live in an era of abundant data, and much of it is text (emails, tweets, customer tickets, Yelp reviews, product reviews, and so on).

A revolution has been under way in AI over the past few years. Researchers from companies like Google, Facebook, Microsoft, and Baidu have come up with breakthrough algorithms that can understand text data better than ever before.

The applications are wide-ranging, including understanding documents, processing customer service tickets, and analyzing reviews.

In this course, we will teach you how to handle text data and introduce you to modern AI technologies for NLP.

What you will learn

  • How to prepare text for machine learning
  • Stemming, tokenizing, and filtering stop words in text
  • Analyzing documents using word-frequency and bag-of-words techniques
  • Visualizing text data
  • Classic tools and techniques for text processing: NLTK, TextBlob, TF-IDF
  • Naive Bayes for text classification
  • Modern techniques for text: spaCy, Word2Vec
  • Topic modeling with Gensim
  • Neural Networks and Deep Learning
  • Deep learning models for text processing: LSTM, RNN
  • Transformer architecture
  • Modern NN models for text processing: ELMo, ULMFiT, BERT
  • Text generation with TensorFlow

Duration

Three Days

Audience

Developers, data analysts, data scientists

Skill level

Introductory to Intermediate

Use Cases Covered

We will study and solve some of the most common industry use cases, listed below:

  • Determining whether a text message is spam (Telco)
  • Sentiment analysis of text data (Social/News)
  • Analysis of IMDb movie ratings and reviews
  • Classifying news articles
  • Extracting topics mentioned in text

Prerequisites

  • Programming background
  • Basic knowledge of the Python language and Jupyter notebooks is recommended.
    Even if you haven’t done any Python programming, Python is an easy language to pick up quickly. We will provide Python resources.

Lab environment

  • A cloud-based lab environment will be provided to students; there is no need to install anything on your laptop
  • We encourage the use of the Google Colab environment for ease of use and free GPU access

Students will need the following

  • A reasonably modern laptop with an unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly
  • Chrome browser

Detailed Course Outline

Machine Learning Overview

  • Machine Learning landscape
  • Understanding AI use cases
  • Data and AI
  • AI vocabulary
  • Hardware and software ecosystem
  • Understanding types of Machine Learning (Supervised / Unsupervised / Reinforcement)

Text Preparation

  • Filtering
  • Stopwords
  • Stemming
  • Parsing and tokenization
  • Word clouds
  • Working with Unicode
  • Lab
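
To give a flavor of the text preparation steps listed above, here is a minimal sketch in plain Python. It is not the course lab material: the stop-word list is a tiny made-up sample, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as the ones NLTK provides.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "and", "or", "of", "to", "in", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def prepare(text):
    """Tokenize, filter stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]

print(prepare("The customers were asking for refunds and complained loudly."))
# ['customer', 'ask', 'refund', 'complain', 'loudly']
```

A rule-based stripper like this will mangle many words; it is only meant to show where stemming sits in the pipeline.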

Text Algorithms Overview

  • N-grams
  • Bag-of-words
  • NLTK
  • TextBlob
  • Vectorizing text
  • TF-IDF
  • Lab
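
The n-gram, bag-of-words, and TF-IDF ideas in this module can be sketched in a few lines of plain Python. This is an illustration only: the TF-IDF variant shown (term frequency divided by document length, times log(N / document frequency)) is one of several common formulations, and libraries may use slightly different smoothing. The sample documents are made up.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)

def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["spam", "offer"], ["spam", "meeting"]]
print(ngrams(["free", "offer", "today"], 2))
print(bag_of_words(["good", "good", "plot"]))
print(tf_idf(docs))
```

Note that a term appearing in every document ("spam" here) gets a weight of zero under this variant, which is the point of the inverse-document-frequency factor.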

Text Classification

  • Naive Bayes
  • SVM
  • Lab
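
As a rough illustration of Naive Bayes applied to the spam use case, the sketch below implements a multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. The class name `NaiveBayes` and the four training messages are invented for this example; in practice one would normally reach for an existing library implementation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior, then add log likelihoods per word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training data for illustration.
train_docs = [
    ["win", "free", "prize", "now"],
    ["free", "offer", "click", "now"],
    ["lunch", "at", "noon"],
    ["see", "you", "at", "lunch"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["free", "prize"]))   # spam
print(model.predict(["lunch", "at", "noon"]))  # ham
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.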

Text Datasets and Benchmarks

  • Public text datasets
  • Benchmarks (GLUE, SQuAD)

Topic Modeling

  • LDA (Latent Dirichlet Allocation)
  • Gensim
  • Lab

Introduction to Neural Networks

  • Perceptrons
  • Neural network design
  • Deep Neural Networks, hidden layers
  • Training neural networks
  • Backpropagation
  • Neural network architectures: Feedforward, Convolutional, Recurrent
  • Lab: Neural network playground
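
The perceptron, the first topic in this module, is small enough to sketch directly. The example below trains a single perceptron on the logical AND function using the classic perceptron update rule. It is a toy illustration under assumed settings (learning rate 1.0, 20 epochs, zero initial weights), not a general neural network trainer.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights by lr * error * input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Truth table for logical AND (linearly separable, so the rule converges).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_DATA)
for x, target in AND_DATA:
    print(x, "->", predict(weights, bias, x), "expected", target)
```

A single perceptron can only learn linearly separable functions; XOR, for instance, needs at least one hidden layer, which is where the rest of this module picks up.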

TensorFlow

  • TensorFlow intro
  • TensorFlow features
  • TensorFlow on GPU and TPU
  • TensorFlow API
  • Lab: Setting up and Running TensorFlow

NLP and Deep Learning

  • Word embeddings
  • Skip-gram
  • Training the model
  • Visualizing the embeddings
  • Word2Vec
  • spaCy for named entity recognition
  • Lab
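
To make the skip-gram idea concrete: a skip-gram model is trained to predict the words surrounding a given center word. The sketch below only generates the (center, context) training pairs from a token list; it does not train any embeddings. The function name `skipgram_pairs` and the window size are assumptions chosen for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
```

A Word2Vec-style model would then learn vectors such that each center word scores its observed context words higher than randomly sampled ones.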

Text Processing in TensorFlow

  • Introduction to RNNs
  • Introduction to LSTMs
  • Sequence-to-sequence models
  • Text prediction
  • Text generation
  • Lab

Transformers

  • Attention concept
  • Transformer architecture
  • Pre-trained models for text processing (ELMo, ULMFiT, BERT)
  • Lab
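
The attention concept at the heart of the Transformer can be sketched without any framework. The example below computes scaled dot-product attention, softmax(QKᵀ / sqrt(d_k)) V, over plain Python lists. It is a single-head illustration with no masking, batching, or learned projections, and the input values are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(queries, keys, values))
```

In this example the query matches the first key more closely, so the output leans toward the first value vector.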

Conversational AI

  • Understanding natural language
  • Generating natural language
  • Introduction to the Rasa framework

Final Workshop (Time Permitting)

  • This is a group exercise
  • Students will use the learned techniques to solve a real-world problem
  • Students will then present their solutions to the class
  • Discussions and Takeaways