The first part takes you from zero to using Spark on a standalone computer. The second part goes through the tools that come with Spark: SQL, real-time Streaming, Machine Learning, and Graph operations and optimization. The third part is about running real clusters, and the fourth gives an end-to-end example of a real-time implementation, with all components and a dashboard. The book ends with a discussion of machine learning using H2O with Spark.
“In action” really means it. If you have gone through an introductory book, like “Learning Spark” by Holden Karau, then “Spark in Action” is the next step. It will make you a practical developer proficient in the real-world uses of Spark.
It is completely updated for Spark 2. The book is really out of the future: I am writing this review in December 2016, but the printed book in my hands has 2017 as the date of printing. That is more of a joke, but the Spark versions are not a minor thing: Spark is moving at a very fast pace, and each dot version introduces new features. Version 2 even changes the Machine Learning component: the older RDD-based MLlib API gives way to ML, which is based on DataFrames rather than RDDs. Many APIs change, so this book is in tune with the times.
I’ve seen reviews complaining about the “book’s explanation,” but I did not find any such problem. Books should be evaluated, I believe, on the merit of what they do have, not on what the reviewer wishes they had. This is a very practical book; it gives usage advice. If you want the internal architecture, you should go to Holden’s book, or read the code (it’s open source). But if you want to get the equivalent of practical experience, then this book is for you.
Happy traveling! (逍遙遊!)
PS. Once you buy the book in paper, you can get all electronic editions for free. There is an insert in the book that gives you the instructions.