How Should the CIO Develop a Big Data System?

Even the most technologically advanced companies in the world have only just begun to implement big data strategies, and most lack a structured approach.

So what do CIOs need to do in order to make smart decisions about implementing big data solutions?

Aditya Yadav & Associates (AY&A) is a global consulting company. In November it will publish a report on big data systems. Its focus looks pretty sound. We’ll have to make a final judgement when the report is published, but it seems like it will cover the right topics. The report covers big data strategies for customers and for companies that are exploring how to become big data service providers. Refreshingly, it looks to Internet companies as the models for what to do. This makes perfect sense. These companies are the pioneers in the field, developing the first use cases for how Hadoop is applied. The report does not include any vendor analysis, which AY&A will save for future reports.

Let’s run through an overview. If nothing else, it’s a basic guide to how consultants view the process of developing a big data strategy.

Start with Hadoop. Develop a working definition of Hadoop based upon your research.

Understand when and when not to use Hadoop. Hadoop pairs a distributed file system (HDFS) with the MapReduce framework for batch processing. It is not a database. For example, see Iwona Bialynicka-Birula’s post on what not to do with Hadoop if you want to harvest its full potential.
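To make the "not a database" point concrete, here is a minimal sketch of the map, shuffle and reduce steps that a Hadoop job performs, written in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and exist only for illustration. The thing to notice is that the job scans every record in a batch, which is a different model from looking up a single row in a database.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the whole batch: map every record, shuffle, then reduce each key."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop data"]))
```

On a real cluster the map and reduce steps run in parallel across many nodes, but the shape of the computation is the same.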

Build an Economic Model. What is the cost to run the data across hundreds or thousands of nodes? What is the compute cost?
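A rough way to start on those questions is a back-of-the-envelope calculation like the sketch below. Every input here is an assumption to be replaced with your own figures: the node count, the hourly cost per node, the raw storage per node and the hours in a month are all hypothetical. The replication factor of 3 reflects the HDFS default, which means each block of data is stored three times and usable capacity is a third of raw capacity.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730.0  # assumed average number of hours in a month


@dataclass
class ClusterAssumptions:
    nodes: int                 # number of machines in the cluster
    cost_per_node_hour: float  # hypothetical all-in hourly cost per node
    raw_tb_per_node: float     # hypothetical raw disk capacity per node, in TB
    replication: int = 3       # HDFS default replication factor


def monthly_cluster_cost(c: ClusterAssumptions) -> float:
    """Cost of keeping every node running for one month."""
    return c.nodes * c.cost_per_node_hour * HOURS_PER_MONTH


def usable_tb(c: ClusterAssumptions) -> float:
    """Capacity left for data once each block is stored `replication` times."""
    return c.nodes * c.raw_tb_per_node / c.replication


def cost_per_usable_tb_month(c: ClusterAssumptions) -> float:
    """Monthly cluster cost divided by usable capacity."""
    return monthly_cluster_cost(c) / usable_tb(c)


if __name__ == "__main__":
    cluster = ClusterAssumptions(nodes=100, cost_per_node_hour=0.50, raw_tb_per_node=12.0)
    print(f"Monthly cost:       {monthly_cluster_cost(cluster):,.2f}")
    print(f"Usable capacity TB: {usable_tb(cluster):,.1f}")
    print(f"Cost per usable TB: {cost_per_usable_tb_month(cluster):,.2f}")
```

A real model would also account for staff, networking and growth in data volume, but even this simple version shows how quickly node count and replication drive the total.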

Use Internet Companies as a Model. Internet companies were the first to use Hadoop and have since built their own big data systems. Their architecture is a guide for building a big data environment.

Look at the Hadoop Stack and Where It Is Headed. Cloudera’s Hadoop stack is a decent guide to what you will be looking at when developing your own infrastructure. Here’s a presentation by Doug Cutting, one of the original creators of Apache Hadoop, and Jeff Hammerbacher, Cloudera’s Chief Data Scientist.

Look at the Various Alternative Flavors of Hadoop. Microsoft, for example, is exploring its own variation on MapReduce. LexisNexis is offering a Hadoop alternative.

Compare the Top Consultants in the Hadoop Space. Cloudera and Hortonworks provide services, as do major consulting companies such as Accenture and Deloitte.

Develop a Working Definition of NoSQL. NoSQL is gaining acceptance in the market. Here is one definition.

Understand the Limitations of Relational Databases. The golden era of relational database management systems is over.

From Government Technology:

Former Federal CIO Vivek Kundra recently said, “this notion of thinking about data in a structured, relational database is dead, some of the most valuable information is going to live in video, blogs, and audio, and it is going to be unstructured inherently.” Modern, 21st century tools have evolved to tackle unstructured information, yet a huge majority of federal organizations continue to try and use relational databases to solve modern information challenges. The question is this – what is keeping us from realizing the full potential of our data, structured and unstructured, which is (among other things) a vital necessity to national security?

Services Angle

The big data landscape is alien to almost everyone. The best guides come from the major Internet companies, which provide a model for how to manage big data. Developing working definitions and looking at the various implementations is the way forward for any organization considering a big data strategy for itself or its customers.

SiliconAngle will do live streaming coverage next week from HadoopWorld. Join us live on The Cube as we stream interviews with industry leaders and cover one of the year’s most important big data events.