This guide is verified as of May 2020, with TensorFlow version 2.1.0.
Background
I recently got the GPU version of TensorFlow working on my Ubuntu machine.
It took a lot of effort, a lot of Googling, and a lot of experimenting.
Here is the final setup, to help out anyone who is looking to do the same.
Updated : 2020-May-20
Setup
Hardware: Nvidia RTX 2070 (8 GB)
Software Stack:
- Ubuntu 18.04
- Nvidia drivers + CUDA
- Anaconda Python
- TensorFlow 2 (2.1.0), GPU version
Step 1 – Setup Nvidia Stack
It is *very important* that you install the right version of the Nvidia stack.
TensorFlow 2.1 works with CUDA 10.1 (and 10.2) as of this writing.
Here is the official TensorFlow GPU guide.
I am copying the code here for completeness.
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430

# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1 \
    libcudnn7-dev=7.6.4.38-1+cuda10.1

# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1
Note: CUDA might be updated to 10.2 when you update the Ubuntu system (sudo apt-get update; sudo apt-get upgrade).
This is OK; TensorFlow works with CUDA 10.2 as well.
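If you are not sure which CUDA version apt ended up installing, a quick check like this (the exact output format may vary) will tell you:

# list the installed cuda packages and their versions
$ apt list --installed 2>/dev/null | grep cuda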
Freezing CUDA (Important!)
It is very important to keep the CUDA libraries at the supported version, so we don't want CUDA updated when we update the Ubuntu system (with sudo apt update && sudo apt upgrade).
Here is how to freeze the CUDA packages so they don't automatically get upgraded:
$ sudo apt-mark hold cuda-10.1
$ sudo apt-mark hold cuda
$ sudo apt-mark showhold
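If you later want to allow CUDA to be upgraded again (say, after moving to a newer TensorFlow release that supports it), the hold can be reversed:

$ sudo apt-mark unhold cuda-10.1
$ sudo apt-mark unhold cuda
$ sudo apt-mark showhold    # the cuda packages should no longer be listed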
Also add this to $HOME/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
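The export only takes effect in new shells; to apply it to the current terminal and confirm it took, you can do:

$ source $HOME/.bashrc
$ echo $LD_LIBRARY_PATH     # should now include /usr/local/cuda/extras/CUPTI/lib64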
Verify NVidia Stack
After doing the above, be sure to reboot your machine
Run the following commands (thanks to this page)
- nvidia-smi : verify your GPU is accessible
- nvcc : check the CUDA compiler version
- deviceQuery : query the CUDA device properties
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| 47%   40C    P8    20W / 175W |    509MiB /  7981MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

# ---------
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

# -------
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.82  Wed Apr  1 20:04:33 UTC 2020
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

# ------
$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  ...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1, Device0 = GeForce RTX 2070
Result = PASS
Step 2 – Anaconda
Install Anaconda Python using their install guide.
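I won't duplicate their guide here, but as a rough sketch (the installer filename below is just an example; grab the current one from their download page):

# download the Linux installer (example filename; check their site for the current one)
$ wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh

# run the installer and follow the prompts
$ bash Anaconda3-2020.02-Linux-x86_64.sh

# open a new terminal afterwards, then confirm conda is on your PATH
$ conda --version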
Step 3 – Create an Anaconda environment for Tensorflow GPU
# TF2 works with python 3.7
# The name of the environment is tf-gpu
$ conda create --name tf-gpu python=3.7

# activate this env and install all needed packages
$ conda activate tf-gpu

# the prompt will change to (tf-gpu) as below
(tf-gpu) user@host:~ >
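If you want to double-check that the environment was created and is active, something like this works:

$ conda env list        # the active environment is marked with a '*'
$ python --version      # should report Python 3.7.x inside the env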
Step 4 – Install Anaconda packages
**Note: Open a new terminal before executing the following commands**
# be sure to be in your newly created env
$ conda activate tf-gpu

# Install basic ML toolkits
$ conda install -y numpy pandas matplotlib seaborn scikit-learn scipy jupyterlab

# now the main package
$ conda install -y tensorflow-gpu
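As a quick sanity check that conda pulled in the GPU build of TensorFlow (and not the CPU-only package), something like this works:

$ conda list | grep tensorflow

# or, from python -- is_built_with_cuda() should print True for the GPU build
$ python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"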
Step 5 – Test (1): Simple Test for TF2-GPU
This is just a quick test (hello world) to see if TF-GPU is working
# make sure you are in the right environment
$ conda activate tf-gpu

# start python
$ python
# type the following commands
>>> import tensorflow as tf
>>> tf.__version__
'2.1.0'
>>> tf.config.experimental.list_physical_devices('GPU')
# ...log output snipped...
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
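If you want a slightly stronger check than just listing devices, you can run a small computation pinned to the GPU; here is a minimal sketch:

import tensorflow as tf

# place a small matrix multiplication explicitly on the first GPU
with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)

# if this prints (1000, 1000) without errors, the GPU is doing real work
print(c.shape)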
Step 6 – Test (2): Full-Blown Test with a Neural Network
Let's do a full-blown test now.
In this test, we are going to train a neural network.
We will build a simple LeNet-style convolutional network to identify handwritten MNIST digits.
Here is the Python script.
Download the script (cnn-mnist-1-train-gpu-minimal.py) and run it as follows:
# make sure you are in the right environment
$ conda activate tf-gpu

$ wget https://raw.githubusercontent.com/elephantscale/es-public/master/tf-gpu/cnn-mnist-1-train-gpu-minimal.py

$ python cnn-mnist-1-train-gpu-minimal.py
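For reference, the script trains a small LeNet-style convolutional network on MNIST; the sketch below shows roughly what such a model looks like in Keras (the layer sizes here are illustrative, not necessarily the exact architecture in the script):

import tensorflow as tf
from tensorflow.keras import layers, models

# load MNIST and add the channel dimension expected by Conv2D
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# a small LeNet-style CNN: conv -> pool -> conv -> pool -> dense
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# validation_split=0.2 gives the 48,000 / 12,000 split seen in the output below
model.fit(x_train, y_train, epochs=10, validation_split=0.2)
model.evaluate(x_test, y_test)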
During the run, pay attention to the training phase; the output will look similar to the following.
~~~~~ training starting ...
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
48000/48000 [==============================] - 7s 152us/sample - loss: 0.1667 - accuracy: 0.9490 - val_loss: 0.0715 - val_accuracy: 0.9789
Epoch 2/10
48000/48000 [==============================] - 4s 91us/sample - loss: 0.0482 - accuracy: 0.9843 - val_loss: 0.0514 - val_accuracy: 0.9842
...
Epoch 10/10
48000/48000 [==============================] - 4s 76us/sample - loss: 0.0079 - accuracy: 0.9975 - val_loss: 0.0380 - val_accuracy: 0.9910
~~~~~ trained on 60,000 images in 42,436.38 ms
You can see each epoch takes around 4 seconds!
This used to take about 10 times longer (40-50 seconds per epoch) on the CPU alone.
So hooray!
Step 7: Solving the dreaded “Failed to get convolution algorithm”
Quite often, when you run TensorFlow GPU code, you will get errors like the following:
“Failed to get convolution algorithm”
“cuDNN failed to initialize”
The root cause of most of these errors is that TF is running out of GPU memory.
By default, TensorFlow tries to allocate all of the memory on the GPU.
When you have other applications running (Ubuntu desktop apps like the window manager, terminal, etc.), they share the GPU memory.
So TF fails when it tries to allocate the entire GPU memory for itself.
The fix is to tell TF to allocate memory dynamically and grow it as needed.
This is not the fastest way to run, but it will work in a shared environment.
Here is the magical code; put this near the top of your script, before any model code runs.
## ---- start Memory setting ----
import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True      # dynamically grow the memory used on the GPU
config.log_device_placement = True          # log device placement (on which device each operation ran)
sess = tf.compat.v1.Session(config=config)
set_session(sess)
## ---- end Memory setting ----
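The snippet above uses the compat.v1 session API; TF 2.x also has a native way to request memory growth that, as far as I can tell, achieves the same effect (it must run before any GPU work starts):

import tensorflow as tf

# ask TF to allocate GPU memory on demand instead of grabbing it all up front;
# this has to run before the GPUs are initialized (i.e., before any model code)
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)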
Step 8: Running Tensorflow GPU in Jupyter Notebook
This is the final step.
Once I set up TF-GPU, I ran the code in Jupyter.
But the code was NOT using the GPU.
It turned out that the Jupyter notebook was using a CPU-only kernel.
Here is how to create a Jupyter Python kernel with GPU support:
# make sure you are in the right environment
$ conda activate tf-gpu

# create a new kernel with GPU support
$ python -m ipykernel install --user --name tf-gpu --display-name "TensorFlow-GPU"

# start jupyter (here I am running the lab edition)
$ jupyter lab

# open a notebook, and select the newly created kernel 'TensorFlow-GPU'
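Once the notebook is open with the new kernel, a quick cell like this should confirm the GPU is visible from inside Jupyter:

import tensorflow as tf

# should print 2.1.0 and a non-empty list containing the GPU device
print(tf.__version__)
print(tf.config.experimental.list_physical_devices('GPU'))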
Here is the Jupyter notebook for MNIST detection that is optimized for TensorFlow 2 GPU.
That is it!
Enjoy your GPU.