Hacking Hadoop – 2

A while back, we described hacking Hadoop through the Cloudera Manager (CM) or through Ambari. But there is so much more to hack! Here is what I would do if I had a chance (this is just a first approximation of the list; comments are welcome). Hacking through CM or Ambari: try default passwords admin/admin; try […]

This entry was posted in Hacking.

A Unique Proposal

Today, many companies want to create their own custom training content in their own format. We at Elephant Scale are experts at this. We created all of our own content, which we regularly use for training and which receives very good feedback, and we can help you create yours! For many demanding clients, we […]

How to prepare for the Cloudera Data Scientist Certification Exam

At our Houston Hadoop Meetup, Austin Sun showed how to prepare for the Cloudera Data Scientist Certification exam. Austin prepared for this presentation for quite a while, passed the certification himself, and has now shared his experience with others. The certification is definitely recommended by Sujee Maniyam in his “Launching Your Career in Big Data” […]

IBM Strategy for Spark

Last month, Garrett Young of IBM presented at our Houston Hadoop & Spark Meetup. The topic was an interesting one: how IBM plans to make money on an open source project, in this case Spark. First, Garrett briefly introduced Spark and spelled out the reasons for IBM’s interest in Spark: it is performant, productive, […]

Processing unstructured text data with Spark 2 APIs – Dataset & Dataframe

This is part of our series on migrating/updating to Spark 2. See all our posts on Spark and Spark2. This post explains how to process unstructured text data using the newer Spark 2 APIs. Code repository: Learning Spark @ Github. And here is the code on Github. Screencast. Sample data: nursery rhyme: twinkle twinkle little star. […]