How do you hack Hadoop? Here is what we did. We took our team hacker (whom we will call Mr. Hacker, because he prefers it this way), and we put him through our Hadoop Admin course. Then we constructed a Cloudera cluster and let him loose on it.
At first, Mr. Hacker was going steady at about two vulnerabilities a day. His reports looked something like this:
=================================================================
1. XSS - In the Pig Query Editor, it is possible to name a Pig script with a
cross-site scripting injection that will fire when the script is saved.
To Reproduce:
1. Log in to Hue Web UI.
2. At top, under Query Editors, select Pig.
3. The default location is a new unsaved script. At left, select Properties.
4. Name the script the following: "><img src=x onerror=alert('XSS')>
5. At left, click Save. The injection will fire twice there.
The injection will also fire when the script is run.
==================================================================
Example exploit screens are attached. We got in touch with the Hue and Hive committers, and in addition to the fun, we also got free designer beer from http://craftshack.com/ and a T-shirt. Open source guys are fun, and they like giving you presents when you hack their projects. Of course, they then close the vulns 🙁
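Why does a script name like the one in the report fire at all? The usual cause of this class of bug is that user input is interpolated into HTML without escaping, so the leading `">` closes the surrounding attribute and tag and the `<img onerror=...>` element gets injected. The sketch below illustrates that mechanism and the standard countermeasure (HTML-escaping output). It is not Hue's actual code or fix: the `<input>` markup and the function names `render_title_unsafe` and `render_title_safe` are hypothetical, chosen only to show the idea.

```python
import html

# The payload from the report, used as a (hypothetical) script name.
PAYLOAD = "\"><img src=x onerror=alert('XSS')>"


def render_title_unsafe(name: str) -> str:
    # Hypothetical vulnerable rendering: the raw name goes straight into an
    # HTML attribute, so its leading "> ends the attribute and the tag,
    # and the <img onerror=...> element becomes live markup.
    return f'<input type="text" value="{name}">'


def render_title_safe(name: str) -> str:
    # Hypothetical fixed rendering: html.escape(quote=True) converts
    # &, <, >, " and ' into entities, so the name stays inert text
    # inside the attribute value.
    return f'<input type="text" value="{html.escape(name, quote=True)}">'


if __name__ == "__main__":
    print(render_title_unsafe(PAYLOAD))
    print(render_title_safe(PAYLOAD))
```

Escaping at output time, in the right context (here, an attribute value), is the general remedy; validating names on input helps, but on its own it is easy to get wrong.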
Having exhausted this mine, we pointed Mr. Hacker at an Ambari Hadoop install. There we found fewer vulns, I have to admit; it seems that someone had already hacked it before. Actually, we first thought it was a Flume bug, but it was in Ambari. And it was bad enough: you could not easily remove the XSS from the hacked database.
We also got familiar with the more formal vuln submission process, and it worked well enough: we got our answers.
The moral of the story?
Hadoop (and Spark, NoSQL, etc.) hacks are fun and are bound to happen. The main reason is obvious: people are more concerned with building new and useful functionality than with protecting it against those who want to break it.
But Hadoop has something else going for it (or rather, against it): it is usually hosted on an isolated subnet. The common reasoning is, “I am protected because nobody can get inside the network on which my cluster is hosted.” But the corollary is: once attackers get inside, it’s all open!
Thus, the more organizations adopt Big Data technology, the more chances hackers will have to play around. Let’s watch how all this plays out over the next few years.
How to secure your Hadoop, Spark, Cassandra, etc.?
Here are a few rules.
- Study the “Hadoop Security” book. Implement good security practices from the get-go.
- Do not assume that “security by obscurity” works. It does not. There are armies of robots scanning for vulnerabilities all the time.
- Do not assume that “nothing can happen” inside your isolated network. That is not true. Networks get broken into.
- Think proactively: imagine what a good hacker would do. I predict that “NoSQL injection attacks” are yet to come.
- Have a security assessment team in place. Have them do penetration testing, and task them with making sure that your security practices are followed.
- Engage with the community and security experts.