When my friend and co-founder Sujee Maniyam presented his “Launching Your Career in Big Data” at SNIA, it became an immediate hit. I mention it every time I teach any of my Big Data courses. So I thought it made sense to update it with the latest data and trends and put it in the form of a white paper. Sujee has been working in software development for over fifteen years; he is a consultant and trainer in Big Data, and he co-authored Hadoop Illuminated and HBase Design Patterns.
Who is the intended audience here? I write for developers, data analysts, project leads and IT managers. Now let us start with an important question:
How real is Big Data?
Is Big Data just a bubble, or does it have the substance that will allow it to continue growing for the next ten years? Here is the diagram of Big Data adoption phases.
As you can see, the benefits of Big Data are real. In fact, Big Data tools and methods will become part of the toolbox for many developers and business analysts. These skills are already quite in demand, as illustrated by the Indeed job trends below.
You can do your own experiment with the Indeed job trends here.
So how do you start?
Steps to start your Big Data career
The path differs for everyone, but the steps fall into four groups:
Step 1. Learn.
The basic knowledge that you will need covers Hadoop, Spark, and NoSQL; these three will give you a foundation. There are multiple ways to start: books, recorded courses, or instructor-led training. Look at our open-source “Hadoop Illuminated” book for starters. After that, you will be able to tackle NoSQL and Spark.
Briefly, Hadoop is for batch processing of large amounts of data, NoSQL is for fast and scalable databases, and Spark is a newer engine that handles Hadoop-style processing interactively and in near real time.
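To make the batch model a little more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern that Hadoop MapReduce popularized. It is plain Python running on one machine, not Hadoop's actual API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

On a real cluster the same three phases run in parallel across many machines, with the input split into blocks; the sketch only shows the shape of the computation.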
Whether you are a developer primarily writing code, or a data scientist getting data insights, you will need these tools. The way you use them is different, and this will come with training and practice.
You can download virtual machines from Cloudera or Hortonworks, and go through the tutorials that come with them. We at Elephant Scale maintain many open-source Hadoop labs on GitHub. This includes 50+ labs on HDFS / MapReduce / Pig / Hive.
Where can you get data for your practice? We keep a list of public data sets here. Many companies, such as Amazon and Google, host big data sets.
You need to go beyond “Hello World” examples. To do this, set up real clusters. For example, if you use Amazon, you can get spot instances at an hourly rate of 5-10 cents each. With a 5-10 node cluster, you will be spending at most about a dollar an hour, but the practical experience you get will be worth much more.
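The arithmetic behind that estimate can be sketched in a few lines. The node counts and per-node prices are the ones quoted above; actual spot prices vary by instance type and region and change over time, so treat the numbers as illustrative. The helper name `hourly_cluster_cost` is hypothetical.

```python
def hourly_cluster_cost(nodes, cents_per_node_hour):
    """Return the cost of running the whole cluster for one hour, in dollars."""
    return nodes * cents_per_node_hour / 100


# Smallest and largest cases from the range quoted in the text.
low = hourly_cluster_cost(5, 5)     # 5 nodes at 5 cents each
high = hourly_cluster_cost(10, 10)  # 10 nodes at 10 cents each

print(f"${low:.2f} to ${high:.2f} per hour")  # prints "$0.25 to $1.00 per hour"
```

So an evening of practice on a small cluster costs a few dollars at these rates, provided you shut the instances down when you are done.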
Should you get a certification? Certifications are useful if you have no practical experience or want to be a consultant. Cloudera and Hortonworks offer recognized certifications, and you don’t necessarily need to take their courses. Prepare well and do practice tests, especially since each test or re-test costs a few hundred dollars.
Now you are ready for your first Big Data job. Before that, however, it is good to
First, get your business card. Then go to Meetup.com, and find your local Big Data groups. Conferences such as Strata are expensive, but HBaseConf or Hadoop Summit are only a few hundred dollars. Alternatively, beg someone for a visitor pass. In the end, it is money and effort well spent.
Read the book “Never Eat Alone” by Keith Ferrazzi. Become the connector, make introductions, and people will remember you. Run a meetup yourself.
Be ready to become popular with your newly acquired skills
It is not what you know; it is who knows you! Open source can be a huge boost to your resume, so get involved in a project. Write quality blogs and articles. Lots of magazines want contributors. Write a book, speak at meetups and conferences.
In today’s world, they may still ask you for a formatted resume later on, but to check whether you are real or not, they will use LinkedIn, GitHub and StackOverflow.
Read the following poem by Omar Khayyam and be inspired by it:
You know, my Friends, with what a brave Carouse
I made a Second Marriage in my house;
Divorced old barren Reason from my Bed
And took the Daughter of the Vine to Spouse.
Apply this poem to your resume.
By the way, it would be good to actually contribute to your GitHub. It shows your activity, and employers may check it.
Here is a possible interview scenario (just kidding):
Interviewer: So, have you used Hadoop at your work? What kind of practical experience do you have?
(If none, then usually the interview ends here).
You: Ahem, I haven’t had a chance to use Hadoop at work…
But let me tell you about the open source project I am working on.
* walk to whiteboard, start drawing, explain …*
* get hired! *
In truth, there is more detailed advice to give, such as which book(s) to read and how to prepare, but there is only so much information one can write down. A good supplement to this article is a webinar by Sujee Maniyam.