Wikibon says Hortonworks DataFlow is a stream processor with a twist
Hortonworks Inc.’s DataFlow, which the company brought to market through its purchase of Onyara Inc., is much more than just another stream processor. It has a unique set of capabilities that makes it hard to classify and that answers the needs of the Internet of Things (IoT) and Internet of Anything (IoAT) domains, writes Wikibon Big Data Analyst George Gilbert. But Hortonworks’ obvious intent to combine DataFlow with its Hadoop distribution signals the beginning of fragmentation of the Hadoop environment. Hadoop is entering an era similar to that of the fragmented Unix environment of the 1990s.
DataFlow does the job of a stream processor, but unlike most stream processors it is bi-directional: alongside the data stream, it maintains a separate channel to send and receive commands that control devices and applications. It’s designed to extend beyond the data center to the edge of complex networks, and it has the resilience, lineage and security capabilities of traditional databases.
These extra qualities make it ideal for IoT, a fundamentally decentralized environment. IoT will use intelligent endpoint devices to gather large quantities of data. It will often use remote computing devices to capture, analyze and store data close to the point of generation rather than trying to send huge volumes through the network to a central data center. And those remote devices also need to be controlled from a central location. A smart electrical grid, for instance, not only needs to monitor the power usage of all appliances in every home, but also to adjust temperature settings when it knows a house is empty. Having two channels makes this task much simpler to accomplish, as the sketch below illustrates.
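To make the two-channel idea concrete, here is a minimal sketch of an edge agent with one channel for outbound telemetry and another for inbound commands. This is not Hortonworks’ actual API; the class, channel and command names are hypothetical, and in-process queues stand in for real network transports.

```python
# Hypothetical sketch of a bi-directional edge agent: telemetry flows
# up on one channel while control commands flow down on a separate one.
# In-process queues stand in for real network transports.
import queue
import time

telemetry_channel = queue.Queue()   # edge -> data center (readings)
command_channel = queue.Queue()     # data center -> edge (controls)

class SmartMeterAgent:
    def __init__(self):
        self.thermostat_setting = 21.0  # degrees Celsius

    def read_power_usage(self):
        return {"watts": 1450, "ts": time.time()}  # stand-in sensor read

    def run(self, cycles=3):
        for _ in range(cycles):
            # Channel 1: push telemetry toward the data center.
            telemetry_channel.put(self.read_power_usage())
            # Channel 2: apply any pending command from the center,
            # without interrupting the telemetry stream.
            try:
                cmd = command_channel.get_nowait()
                if cmd["op"] == "set_temperature":
                    self.thermostat_setting = cmd["value"]
            except queue.Empty:
                pass

# The central controller sees the house is empty and lowers the setting.
command_channel.put({"op": "set_temperature", "value": 16.0})

agent = SmartMeterAgent()
agent.run()

while not telemetry_channel.empty():
    print(telemetry_channel.get())
print("thermostat:", agent.thermostat_setting)
```

The point of the separation is that control traffic never has to ride on, or wait behind, the high-volume data stream moving in the other direction.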
However, the Onyara purchase is also the latest symptom of a gradual splintering of the Hadoop environment, Gilbert writes. Until recently, Hadoop vendors all provided the same open-source core capabilities and differentiated on manageability. Cloudera Inc.’s Manager and Navigator, for example, did not change the core compute engines such as MapReduce, Hive and Pig. Cloudera ships its own analytic MPP SQL database, Impala, but it uses the standard Parquet data format and Hive’s HCatalog metadata layer, so data is not locked in.
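To see why an open storage format blunts lock-in, here is a minimal sketch using the pyarrow library (file and column names are illustrative): a Parquet file written by one tool is readable by any Parquet-aware engine, whether that is Impala, Hive or Spark.

```python
# Illustration of format-level openness (assumes the pyarrow package).
# A Parquet file written once can be read by any Parquet-aware engine,
# so data produced under one vendor's stack isn't locked in.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"appliance": ["hvac", "fridge"], "watts": [1450, 120]})
pq.write_table(table, "usage.parquet")  # written once, in an open format

# Read back with a different tool than the one that wrote it.
print(pq.read_table("usage.parquet").to_pydict())
```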
The fast growth of the Hadoop market, however, is beginning to splinter the community, Gilbert writes. Hortonworks has always been strongly committed to using Apache projects for core compute engines and management tools. With DataFlow, though, stream processing, which Gilbert says is becoming a core compute engine in its own right, may now differ from vendor to vendor.