Category Archives: Dev Stuff

How to prepare for the Cloudera Data Scientist Certification Exam

At our Houston Hadoop Meetup, Austin Sun showed how to prepare for the Cloudera Data Scientist Certification exam. Austin prepared for this presentation for quite a while, passed the certification himself, and shared his experience with others. The certification is definitely recommended by Sujee Maniyam in his “Launching Your Career in Big Data” […]

Processing unstructured text data with Spark 2 APIs – Dataset & Dataframe

This is part of our migrating/updating to Spark 2 series. See all our posts on Spark and Spark2. This post explains how to process unstructured text data using the newer Spark 2 APIs. Code repository: Learning Spark @ Github. And here is the code on Github. Screencast. Sample data: nursery rhyme: twinkle twinkle little star. […]

Migrating / Upgrading to Spark version 2

Motivation: Spark is an amazing computing framework. Spark version 2 has lots of exciting stuff. And Hadoop vendors Cloudera and Hortonworks are now supporting Spark 2 on their platforms. So we anticipate that a lot of people will be upgrading or migrating to Spark 2. However, lots of Spark tutorials and code samples on the web are […]

Lawyers and Machine Learning – How a Little Learning Goes a Long Way

Machine Learning and Artificial Intelligence (AI) are certainly all the rage today. AI will touch everyone; it will change lives and careers in a matter of years, and definitely in your lifetime. Andrew Ng, chief scientist at Chinese Internet search giant Baidu and co-inventor of the Google Brain, explains it succinctly in his interview: it used to […]

From Spark MLLib 1.0 to Spark ML 2.1

This is part of our migrating/updating to Spark 2 series. See all our posts on Spark and Spark2. Code repository: Learning Spark @ Github. Screencast. Spark’s Machine Learning (ML) components have changed significantly. Just like the rest of Spark, the older RDD-based API persists alongside the newer DataFrame-based API. Yet, I find that the […]