Debunking the Big Data Talent Shortage: Make Data Analytics Less Complex

This is a guest post written by Afshin Goodarzi, VP of Analytics, 1010data

There is a popular view taking hold in the market that warrants a second look, namely, the predicted shortage of data analysts. Rather than focusing on talent supply issues, the industry would be better served by addressing how to remove the needless complexity that currently exists in traditional approaches to collecting, processing and analyzing data.

Predictions of a data analyst shortage gained traction with McKinsey & Company’s Business Technology Office report in 2011, which projected a 50 to 60 percent shortfall in the US by 2018. “There will be a shortage of talent necessary for organizations to take advantage of big data,” the report stated, going on to suggest that companies must look for data analysis and statistical experts who have a firm handle on business processes.

Late last year, that same sentiment was echoed by Gartner. At its Gartner Symposium/ITxpo, the analyst firm noted that by 2015, only one-third of an estimated 4.4 million global big data jobs will be filled. Part of the challenge, Gartner stated, is finding individuals with the data management, analysis and business expertise necessary for “extracting the value of big data.”

While it’s true that employers must look hard to find data analysis skills, concerns about a talent shortage are, in part, a red herring.

Resource application
The problem is not necessarily the number of analysts currently out there; rather, it is the way in which those analysts are being used. Businesses today need a total paradigm shift in how they handle data – and in what is expected of analysts doing their jobs.

Let’s look at what typically happens when a company deals with a new business problem or question requiring analysis:

  1. The analyst has to define the problem.

  2. The analyst has to identify the appropriate sample of data.

  3. The IT staff must be engaged to pull the data – typically, a physical ETL (extract, transform and load) process.

  4. Someone must move the data to an analysis environment.

  5. Someone must confirm that the cached data sample is accurate and useful. If not, the IT department must be involved again.
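The friction in the steps above can be sketched in a few lines. This is an illustrative example only (the functions, field names and sample data are hypothetical, not from the article): each hand-off reshapes the data, and a failed validation sends the analyst back to IT for another extract.

```python
def extract(source_rows):
    """Step 3: IT pulls a sample from the source system."""
    return [row for row in source_rows if row.get("region") == "US"]

def transform(rows):
    """Step 4: reshape for the analysis environment (the schema changes here)."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]

def validate(rows):
    """Step 5: confirm the cached sample is usable; if not, start over."""
    return all(r["amount"] >= 0 for r in rows)

source = [
    {"id": 1, "region": "US", "amount": "19.99"},
    {"id": 2, "region": "EU", "amount": "5.00"},
    {"id": 3, "region": "US", "amount": "7.50"},
]

sample = transform(extract(source))
assert validate(sample)  # otherwise: back to IT for another extract
```

Note that none of this is analysis yet – it is all plumbing that happens before the first analytical question can even be asked.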


It often falls to the analyst – in many cases working with a programmer – to properly format and manage the data, ensure proper storage, and countless other technical details required for the access and maintenance of the raw data.

This process is often repeated several times, until the analyst is convinced that the sample data is correct. Only after all of that is complete can the actual work of answering the analytical question begin. What’s more, every time the data moves from platform to platform, its representation (schema) inevitably changes, wasting considerable processing time.

While all this work is in progress – none of which is actually data analysis, by the way – a number of project managers, program coordinators, system administrators and database administrators must be routinely engaged and updated. Once the analytical portion of the task is finally complete and the business question has been answered, the next step is to “operationalize” that answer, which introduces yet another IT-related responsibility for implementing the analysis.

Shifting the paradigm
Clearly then, traditional approaches to data analysis in business today are unnecessarily complicated and costly, and this speaks directly to the issue of a perceived data analyst shortage. After all, the 50 percent talent crisis McKinsey predicted could be reduced dramatically by making the people who deal with data more productive. If analysts are more productive, fewer are needed, and the talent shortage is reduced or eliminated.

It’s this simple: when an analyst says, “I want to analyze data,” the starting point should be nothing more than logging into a system. Analysts shouldn’t need a programmer to analyze the data. They need a single platform that lets them not only analyze the data, but also gain business advantage from the resulting insights.


The industry is only now starting to understand this problem. Some smart vendors have begun offering ways to access all data in one place, so that analysts can pose and answer business questions from the same set of data on the same set of machines that organizations use for all of their operational needs. The most innovative of these systems not only make all of the data available on a single system, but also allow analysts to do complex data joins (as in data mashups) on the fly, with zero programming required. They provide analysts with interactive tools to explore trillions of rows of data and perform complicated calculations – all with sub-second response times.
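The “on the fly” join such systems promise can be illustrated with a minimal sketch. The tables and field names below are hypothetical, and real platforms of this kind perform the join internally at far larger scale; the point is simply that the mashup happens against data already on one system, with no ETL hop in between.

```python
# Two hypothetical tables living on the same system.
sales = [{"cust_id": 1, "amount": 120.0}, {"cust_id": 2, "amount": 80.0}]
customers = {1: "Acme Corp", 2: "Globex"}

# Ad-hoc join ("data mashup"): enrich each sale with the customer name,
# directly against the operational data, with no extract or reload step.
mashup = [
    {"customer": customers[s["cust_id"]], "amount": s["amount"]}
    for s in sales
]
```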

That’s where business has to go if it truly wants to address any possible impending analytical talent crisis. By bringing data analysis and implementation capabilities into one easy-to-access system, you eliminate the need for countless layers of process and administration, and replace the business paradigm that gave rise to the perceived talent shortage.

Until we’ve shaken up traditional politics and organizational behavior with respect to how data is managed, accessed and ultimately used in a business context, we can’t legitimately claim that there really is a talent shortage.

About the Author

A veteran of analytics, Goodarzi has led several teams in designing, building and delivering predictive analytics and business analytical products to a diverse set of industries. Prior to joining 1010data, Goodarzi was the Managing Director of Mortgage at Equifax, responsible for the creation of new data products and supporting analytics for the financial industry. Previously, he led the development of various classes of predictive models aimed at the mortgage industry during his tenure at Loan Performance (Core Logic). Earlier, he worked at BlackRock, the research center for NYNEX (present day Verizon) and Norkom Technologies. Goodarzi’s publications span the fields of data mining, data visualization, optimization, and artificial intelligence.




  1. GilPress Agreed. #Sclera makes complex tech (#Mahout, #graph/log #analytics) accessible via seamless #SQL extensions

  2. 1010data SiliconANGLE Agreed. #Sclera makes complex tech (#Mahout, log #analytics) accessible via familiar #SQL

  3. I am afraid this just reads like one big plug for your company! Surely the shortage will be of people with data skills combined with understanding – by definition you cannot commoditise understanding. It is the old Nicholas Carr argument about commoditising IT. There is a parallel with voice systems, which are a ubiquitous commodity, but there is still (and always will be) a shortage of people to say the right things when using the phone – for that you need expertise. You can commoditise the storage, access to and security of data – even the statistical algorithms to analyse it. You can’t do the same with the people who have real understanding of and insight into what it all means.

  4. In principle, I agree.  Data mining needs to become more focused and efficient.  But as it does, such queries become OLAP and no longer data mining.  OLAP presumes that you’ve structured your databases/indexes to allow “sub-second response times” to queries.  Any query that can traverse terabytes of unstructured data in a second is not ad-hoc.  That’s not machine learning.  It’s highly structured, highly optimized analytics. 
    Will much of ad hoc data mining eventually evolve into post hoc canned queries?  Sure.  Will this reduce the need for more data miners?  Doubtful.  The value in mining data is the discovery of new views on how/why your market is changing or your business is doing.  Because these insights are by definition novel, subtle, and dynamic, (and often complex) their development can never be optimized.  At best, perhaps the business of “Big Data” can reduce the fraction of dead end or fruitless inquiries.  But that’s easier said than done.

  5. But the myth about Big Data is that – for most analytics projects – a statistics, math or engineering major is required.
