Review of “Hadoop in Action,” second edition

Manning Publications, by Chuck P. Lam, Mark W. Davis, and Ajit Gaddam

Four years have passed since the first publication, and as Russians say, “A lot of water has passed (under the bridge) since then,” so let’s look at what’s new in this edition.

First off, it has definitely been reworked, and it has added clear explanations for YARN, Impala, Tez, Avro and Solr. Why is that important? For Impala and Tez, you always have to figure out who says what about them. Avro has achieved much more prominence than before, and Solr has been integrated into Hadoop itself, to say nothing of search arguably being the “killer app” for Big Data. Thus, you will get a balanced view of the latest developments. Nothing will catch you off guard. Unless it is projects like NiFi and Spark, but that’s another story.

I am reading the MEAP, Manning’s early-access preview, so the book is not yet complete, but I will update this review as more chapters become available. So treat my words as a reverse Bloom filter: if I say that something is there, then it is definitely there, but if it is not mentioned, then it might still appear. In any case, I am talking to readers with an unquenchable thirst for the latest and greatest.

In the installation part, the book only discusses Hadoop 2 / YARN, not Hadoop 1. But it covers that in a good level of detail, with scripts and configuration. When talking to HDFS programmatically, it gives the latest versions of the Hadoop and Java classes and, just as before, explains things quite clearly.

The best practices section is nice, and it addresses such areas as JVM reuse, counters, skipping records outside of Java, logging, and performance tuning.

The last currently available chapter is on security: it covers Kerberos, ACLs and LDAP with a reasonable level of detail, along with Apache Knox and a mention of Apache Sentry.

All in all, I think it is about the right level of detail for a developer: not too little, so that it is still useful, but not too much, so that it is not boring.

Of course, the most interesting chapters are still in the promised state, in particular the Hadoop for Data Scientist section. So, come back and check later. Or subscribe to our newsletter; it will be there.
