‘Internet of Things’ or ‘Connected Things’ (or whatever the buzzword of the day is 🙂) might be the ‘killer app’ for Big Data.
The amount of data that would need to be ingested and analyzed makes this a prime candidate for Big Data technologies.
When we think of Big Data, we usually think of Hadoop. However, Hadoop may not be the best fit for IoT. Remember that Hadoop was developed to analyze large amounts of data in batch mode, and it does that very well.
IoT workloads are very different. Here are some differences:
- Most IoT data comes in as ‘streams’ (continuously incoming data). Hadoop is designed to handle ‘data on disk’.
- IoT events need to be analyzed in ‘real time’. Real time here can mean milliseconds or seconds. Either way, this kind of low latency is not achievable with Hadoop and MapReduce.
- IoT events may be queried in real time (e.g. the 10 latest events from a sensor).
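To make the ‘10 latest events from a sensor’ query concrete, here is a minimal in-memory sketch in plain Python. It is not how any of the frameworks below work internally; the class name `LatestEvents`, the `(timestamp, value)` event shape and the sensor id `sensor-1` are all assumptions made for illustration. The point is only that each event is handled as it arrives, and the query is answered from state that is already up to date, rather than by scanning data on disk in a batch job.

```python
from collections import defaultdict, deque

LATEST_N = 10  # assumed window size, taken from the "10 latest events" example


class LatestEvents:
    """Keeps the N most recent events per sensor, updated as events arrive."""

    def __init__(self, n=LATEST_N):
        # One bounded deque per sensor; deque(maxlen=n) discards the oldest
        # entry automatically when a new one is appended to a full deque.
        self._events = defaultdict(lambda: deque(maxlen=n))

    def ingest(self, sensor_id, timestamp, value):
        # Called once per incoming event (hypothetical event shape).
        self._events[sensor_id].append((timestamp, value))

    def latest(self, sensor_id):
        # Newest first. Returns an empty list for a sensor we have not seen.
        return list(reversed(self._events.get(sensor_id, ())))


# Usage: simulate a stream of 25 readings from one sensor.
store = LatestEvents()
for t in range(25):
    store.ingest("sensor-1", t, 20.0 + t)

print(len(store.latest("sensor-1")))  # 10
print(store.latest("sensor-1")[0])    # (24, 44.0)
```

A real deployment would also need durability and the ability to scale across machines, which is where the technologies below come in.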
We are seeing new technologies emerge to handle IoT-specific workloads. Here are a few:
Storm was probably the first ‘stream processing platform’. It came out of Twitter and is now an Apache project.
Samza is a stream processing framework built on Kafka and YARN.
Spark Streaming is the streaming framework provided by the popular Spark framework.
Flink is a recent framework focused on streaming workloads.
NiFi was developed by the NSA (National Security Agency) and later open sourced.
Onyara, a company started by NiFi committers, has been acquired by Hortonworks.
Hortonworks is releasing a DataFlow product based on NiFi.
Kudu is a storage system developed by Cloudera. It is suited for real-time updates of data.