Distributed deep learning is not easy. While both TensorFlow and Caffe have distributed modes, they tend to be used on a single-node super-box with lots of GPUs. That’s fine if the goal is to speed up training on a reasonably sized dataset, but what if we have a “big data” training scenario? TensorFlow and Caffe do allow for multi-node setups, but the infrastructure is challenging to build and run. Newer distributed deep learning frameworks like Horovod are intriguing, but unproven. Training “at scale” across many nodes isn’t done much in real life.
If we’re dealing with a Big Data scenario, then Hadoop and Spark are likely part of the picture. Somehow we have to get the data from there to our training cluster. If that’s done by copying it over the network, it won’t really scale.
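To get a feel for why copying data over the network doesn’t scale, here is a back-of-the-envelope sketch. The dataset sizes and the 10 Gbit/s link speed are assumptions picked purely for illustration, not measurements from any real cluster, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `dataset_tb` terabytes over a `link_gbps` link.

    `efficiency` is the assumed fraction of the nominal link speed that is
    actually achieved (1.0 means a perfect, uncontended link).
    """
    bits = dataset_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative sizes only; plug in your own numbers.
    for size_tb in (1, 10, 100):
        hours = transfer_hours(size_tb, link_gbps=10)
        print(f"{size_tb:>4} TB over 10 Gbit/s: {hours:.1f} h")
```

Under these assumed numbers, moving 100 TB takes roughly 22 hours even on a perfect link, and that is before any training has started.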
So what can we do? How about we just train our model on the Hadoop cluster itself, where the data already lives? Is that possible?
Welcome to BigDL.
Up until now, Machine Learning on Big Data had one critical problem: there was no Deep Learning component as part of the Big Data stack. BigDL is that Deep Learning component that we’ve all been looking for.
If you’ve used libraries like Keras, TensorFlow, or Caffe, using BigDL is fairly straightforward. Its API reminds me a lot of Keras. You can even load model weights saved from TensorFlow or Caffe into BigDL, and vice versa. And yes, TensorBoard does work with BigDL.
What about GPUs? Unfortunately, BigDL doesn’t support GPU-based acceleration. While that may seem like a fatal problem, modern CPUs are actually a lot better at handling DL workloads than they used to be. And being able to use existing Big Data infrastructure is a critical advantage. BigDL won’t be the answer for every workload, to be sure, but the fact that most companies have large Hadoop clusters with data just sitting around, waiting to be used, means that there’s a golden opportunity to use BigDL on such problems.
How do you get started with BigDL? Glad you asked. I’ve prepared a series of video tutorials on BigDL to help you get going.
BigDL Tutorial Video 1: Introduction to BigDL
BigDL Tutorial Video 2: Creating A Basic Neural Network
BigDL Tutorial Video 3: Transfer Learning for Image Classification With BigDL
BigDL Tutorial Video 4: Long Short Term Memory (LSTM) and Recurrent Neural Network (RNN) on BigDL