This guide is verified as of May 2020, with TensorFlow version 2.1.0.
Background
I recently got the GPU version of TensorFlow working on my Ubuntu machine.
It took a lot of effort, a lot of Googling, and a lot of experimenting.
Here is the final setup, to help out anyone who is looking to do the same.
Updated : 2020-May-20
Setup
Hardware: Nvidia RTX 2070 (8 GB)
Software Stack:
- Ubuntu 18.04
- Nvidia drivers + CUDA
- Anaconda Python
- TensorFlow 2 (2.1.0), GPU version
Step 1 – Setup Nvidia Stack
It is *very important* that you install the right version of the Nvidia stack.
TensorFlow 2.1 works with CUDA 10.1 (and 10.2) as of this writing.
Here is the official TensorFlow GPU guide.
I am copying the code here for completeness.
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430

# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1 \
    libcudnn7-dev=7.6.4.38-1+cuda10.1

# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1
Note: CUDA might be updated to 10.2 when you update the Ubuntu system (sudo apt-get update; sudo apt-get upgrade).
This is OK; TensorFlow works with CUDA 10.2 as well.
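If you are not sure which CUDA version apt ended up installing, a quick check like this (the exact output format may vary) will tell you:

# list the installed cuda packages and their versions
$ apt list --installed 2>/dev/null | grep cuda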
Freezing CUDA (Important!)
It is very important to keep the CUDA libraries at the supported version, so we don't want CUDA updated when we update the Ubuntu system (with sudo apt update && sudo apt upgrade).
Here is how to freeze the CUDA packages so they don't automatically get upgraded:
$ sudo apt-mark hold cuda-10.1
$ sudo apt-mark hold cuda
$ sudo apt-mark showhold
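If you later want to allow CUDA to be upgraded again (say, after moving to a newer TensorFlow release that supports it), the hold can be reversed:

$ sudo apt-mark unhold cuda-10.1
$ sudo apt-mark unhold cuda
$ sudo apt-mark showhold    # the cuda packages should no longer be listed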
Also add this to $HOME/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
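The export only takes effect in new shells; to apply it to the current terminal and confirm it took, you can do:

$ source $HOME/.bashrc
$ echo $LD_LIBRARY_PATH     # should now include /usr/local/cuda/extras/CUPTI/lib64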
Verify NVidia Stack
After doing the above, be sure to reboot your machine
Run the following commands (thanks to this page)
- nvidia-smi : verify your GPU is accessible
- nvcc : check the CUDA compiler version
- deviceQuery : query the CUDA device properties
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| 47%   40C    P8    20W / 175W |    509MiB /  7981MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

# ---------
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

# -------
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.82  Wed Apr  1 20:04:33 UTC 2020
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

# ------
$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  ...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1, Device0 = GeForce RTX 2070
Result = PASS
Step 2 – Anaconda
Install Anaconda Python using their install guide.
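I won't duplicate their guide here, but as a rough sketch (the installer filename below is just an example; grab the current one from their download page):

# download the Linux installer (example filename; check their site for the current one)
$ wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh

# run the installer and follow the prompts
$ bash Anaconda3-2020.02-Linux-x86_64.sh

# open a new terminal afterwards, then confirm conda is on your PATH
$ conda --version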
Step 3 – Create an Anaconda environment for Tensorflow GPU
# TF2 works with python 3.7
# The name of the environment is tf-gpu
$ conda create --name tf-gpu python=3.7

# activate this env and install all needed packages
$ conda activate tf-gpu

# the prompt will change to (tf-gpu) as below
(tf-gpu) user@host:~ >
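If you want to double-check that the environment was created and is active, something like this works:

$ conda env list        # the active environment is marked with a '*'
$ python --version      # should report Python 3.7.x inside the env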
Step 4 – Install Anaconda packages
**Note: Open a new terminal before executing the following commands**
# be sure to be in your newly created env
$ conda activate tf-gpu

# Install basic ML toolkits
$ conda install -y numpy pandas matplotlib seaborn scikit-learn scipy jupyterlab

# now the main package
$ conda install -y tensorflow-gpu
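As a quick sanity check that conda pulled in the GPU build of TensorFlow (and not the CPU-only package), something like this works:

$ conda list | grep tensorflow

# or, from python -- is_built_with_cuda() should print True for the GPU build
$ python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"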
Step 5 – Test (1): Simple Test for TF2-GPU
This is just a quick test (hello world) to see if TF-GPU is working
# make sure you are in the right environment
$ conda activate tf-gpu

# start python
$ python
# type the following commands
>>> import tensorflow as tf
>>> tf.__version__
'2.1.0'
>>> tf.config.experimental.list_physical_devices('GPU')
# ...log output snipped...
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
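If you want a slightly stronger check than just listing devices, you can run a small computation pinned to the GPU; here is a minimal sketch:

import tensorflow as tf

# place a small matrix multiplication explicitly on the first GPU
with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)

# if this prints (1000, 1000) without errors, the GPU is doing real work
print(c.shape)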
Step 6 – Test (2): Full-Blown Test with a Neural Network
Let's do a full-blown test now.
In this test, we are going to train a neural network.
We will build a simple LeNet-style convolutional network to identify handwritten MNIST digits.
Here is the Python script.
Download the script (cnn-mnist-1-train-gpu-minimal.py) and run it as follows:
# make sure you are in the right environment
$ conda activate tf-gpu

$ wget https://raw.githubusercontent.com/elephantscale/es-public/master/tf-gpu/cnn-mnist-1-train-gpu-minimal.py

$ python cnn-mnist-1-train-gpu-minimal.py
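For reference, the script trains a small LeNet-style convolutional network on MNIST; the sketch below shows roughly what such a model looks like in Keras (the layer sizes here are illustrative, not necessarily the exact architecture in the script):

import tensorflow as tf
from tensorflow.keras import layers, models

# load MNIST and add the channel dimension expected by Conv2D
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# a small LeNet-style CNN: conv -> pool -> conv -> pool -> dense
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# validation_split=0.2 gives the 48,000 / 12,000 split seen in the output below
model.fit(x_train, y_train, epochs=10, validation_split=0.2)
model.evaluate(x_test, y_test)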
During the run, pay attention to the training phase; the output will look similar to the following.
~~~~~ training starting ...
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
48000/48000 [==============================] - 7s 152us/sample - loss: 0.1667 - accuracy: 0.9490 - val_loss: 0.0715 - val_accuracy: 0.9789
Epoch 2/10
48000/48000 [==============================] - 4s 91us/sample - loss: 0.0482 - accuracy: 0.9843 - val_loss: 0.0514 - val_accuracy: 0.9842
...
Epoch 10/10
48000/48000 [==============================] - 4s 76us/sample - loss: 0.0079 - accuracy: 0.9975 - val_loss: 0.0380 - val_accuracy: 0.9910
~~~~~ trained on 60,000 images in 42,436.38 ms
You can see each epoch takes around 4 seconds!
This used to take about 10 times longer (40-50 seconds per epoch) on the CPU alone.
So hooray!
Step 7: Solving the dreaded “Failed to get convolution algorithm”
Quite often, when you run TensorFlow GPU code, you will get errors like the following:
“Failed to get convolution algorithm”
“cuDNN failed to initialize”
The root cause of most of these errors is that TF is running out of GPU memory.
By default, TensorFlow tries to allocate all of the memory on the GPU.
When you have other applications running (Ubuntu desktop apps like the window manager, terminal, etc.), they share the GPU memory.
So TF fails when it tries to allocate the entire GPU memory for itself.
The fix is to tell TF to allocate memory dynamically and grow it as needed.
This is not the fastest way to run, but it will work in a shared environment.
Here is the magical code; put this near the top of your script, before any model code runs.
## ---- start Memory setting ----
import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True      # dynamically grow the memory used on the GPU
config.log_device_placement = True          # log device placement (on which device each operation ran)
sess = tf.compat.v1.Session(config=config)
set_session(sess)
## ---- end Memory setting ----
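The snippet above uses the compat.v1 session API; TF 2.x also has a native way to request memory growth that, as far as I can tell, achieves the same effect (it must run before any GPU work starts):

import tensorflow as tf

# ask TF to allocate GPU memory on demand instead of grabbing it all up front;
# this has to run before the GPUs are initialized (i.e., before any model code)
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)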
Step 8: Running Tensorflow GPU in Jupyter Notebook
This is the final step.
Once I set up TF-GPU, I ran the code in Jupyter.
But the code was NOT using the GPU.
It turned out that the Jupyter notebook was using a CPU-only kernel.
Here is how to create a Jupyter Python kernel with GPU support:
# make sure you are in the right environment
$ conda activate tf-gpu

# create a new kernel with GPU support
$ python -m ipykernel install --user --name tf-gpu --display-name "TensorFlow-GPU"

# start jupyter (here I am running the lab edition)
$ jupyter lab

# open a notebook, and select the newly created kernel 'TensorFlow-GPU'
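Once the notebook is open with the new kernel, a quick cell like this should confirm the GPU is visible from inside Jupyter:

import tensorflow as tf

# should print 2.1.0 and a non-empty list containing the GPU device
print(tf.__version__)
print(tf.config.experimental.list_physical_devices('GPU'))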
Here is the Jupyter notebook for MNIST detection that is optimized for TensorFlow 2 GPU.
That is it!
Enjoy your GPU.