In my last installment I described how Microsoft missed its chance to be the leader in Big Data. Why? Why was Dryad killed, while Spark, Dryad’s successor, is all the rage in the community? I suggested a simple enough reason: Dryad was meant to live in Microsoft’s proprietary cloud, Azure, whereas Spark is completely open source.
However, there is yet another reason, just as important as the first one. You see, Dryad was an all-out Hadoop killer. It did not play nicely with Hadoop; in fact, it did not play with it at all. (Parenthetically, could the Microsoft of 2008 afford to play into a competitor’s hands? No, of course not!) Spark, on the other hand, pretends to be Hadoop’s friend: it can read its input from Hadoop’s storage layer, HDFS. So the two play together nicely, unless something changes. Will it or not? Tune in to the next edition of “Sparklets,” a personal story of learning Spark.