Tag Archives: spark2

Processing unstructured text data with Spark 2 APIs – Dataset & Dataframe

This is part of our series on migrating/updating to Spark 2. See all our posts on Spark and Spark2. This post explains how to process unstructured text data using the newer Spark 2 APIs. Code repository: Learning Spark @ Github. The post includes a screencast. Sample data: the nursery rhyme "Twinkle Twinkle Little Star". […]

Migrating / Upgrading to Spark version 2

Motivation: Spark is an amazing computing framework, and Spark version 2 has lots of exciting stuff. Hadoop vendors Cloudera and Hortonworks now support Spark 2 on their platforms, so we anticipate that a lot of people will be upgrading or migrating to Spark 2. However, lots of Spark tutorials and code samples on the web are […]

From Spark MLLib 1.0 to Spark ML 2.1

This is part of our series on migrating/updating to Spark 2. See all our posts on Spark and Spark2. Code repository: Learning Spark @ Github. The post includes a screencast. Spark’s Machine Learning (ML) components have changed significantly. Just like in the rest of Spark, the older RDD-based API persists alongside the newer DataFrame-based API. Yet, I find that the […]