This is a guest post written by Afshin Goodarzi, VP of Analytics, 1010data
There is a popular view taking hold in the market that warrants a second look, namely, the predicted shortage of data analysts. Rather than focusing on talent supply issues, the industry would be better served by addressing how to remove the needless complexity that currently exists in traditional approaches to collecting, processing and analyzing data.
Data analyst shortage predictions picked up traction with McKinsey & Company’s Business Technology Office report in 2011 that projected a 50 – 60 percent shortfall in the US by 2018. “There will be a shortage of talent necessary for organizations to take advantage of big data,” according to the report, which went on to suggest that companies must look for data analysis and statistical experts who have a firm handle on business processes.
Late last year, that same sentiment was echoed by Gartner. During its recent Gartner Symposium/ITxpo, the analyst firm noted that by 2015, only one-third of some 4.4 million global big data jobs will be filled. Part of that challenge, Gartner stated, was that there needs to be an emphasis on finding individuals with the data management, analysis and business expertise necessary for “extracting the value of big data.”
While it’s true that employers must look hard to find data analysis skills, concerns about a talent shortage are, in part, a red herring.
The problem is not necessarily the number of analysts currently out there; rather, it is the way in which these analysts are being used. Businesses today need a total paradigm shift in how they handle data – and in what’s expected of analysts in doing their jobs.
Let’s look at what typically happens when a company deals with a new business problem or question requiring analysis:
The analyst has to define the problem.
The analyst has to identify the appropriate sample of data.
The IT staff must be engaged to pull the data – typically, a physical ETL (extract, transform and load) process.
Someone must move the data to an analysis environment.
Someone must confirm that the cached data sample is accurate and useful. If not, the IT department must be involved again.
It often falls to the analyst – in many cases working with a programmer – to properly format and manage the data, ensure proper storage, and countless other technical details required for the access and maintenance of the raw data.
This process is often repeated several times, until the analyst is convinced that the sample data is correct. Only after all of that has been completed can the actual work of answering the analytical question be undertaken. What’s more, every time the data is moved from platform to platform, the data representation (schema) is inevitably changed, which can waste considerable processing time.
While all this work – none of which is actually data analysis, by the way – is in progress, there are a number of project managers, program coordinators, system administrators and database administrators who have to be routinely engaged and updated. Once the analytical portion of the task is finally completed, and the business question has been answered, the next step is to “operationalize” that answer. That introduces yet another IT-related responsibility for the implementation of the analysis.
Shifting the paradigm
Clearly then, traditional approaches to data analysis in business today are unnecessarily complicated and costly, and this speaks directly to the issue of a perceived data analyst shortage. After all, the 50 to 60 percent shortfall McKinsey predicted could be reduced dramatically by making the people who deal with data more productive. If analysts are more productive, fewer are needed, and the talent shortage is reduced or eliminated.
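To put rough numbers on that argument, here is a minimal back-of-the-envelope sketch. It assumes the shortfall is expressed as a fraction of demand (so available analysts cover the remaining fraction) and that output scales linearly with productivity. Both are simplifying assumptions made for illustration, not figures or methods from the McKinsey report, and the function names are hypothetical.

```python
def required_productivity_gain(shortfall: float) -> float:
    """Fractional productivity gain needed for today's analysts to meet full demand.

    Assumption: shortfall is a fraction of demand, so supply = (1 - shortfall) * demand.
    """
    if not 0 <= shortfall < 1:
        raise ValueError("shortfall must be in [0, 1)")
    return 1 / (1 - shortfall) - 1


def remaining_shortfall(shortfall: float, gain: float) -> float:
    """Shortfall left over (as a fraction of demand) after a given productivity gain."""
    covered = (1 - shortfall) * (1 + gain)
    return max(0.0, 1 - covered)


if __name__ == "__main__":
    for s in (0.5, 0.6):
        g = required_productivity_gain(s)
        print(f"shortfall {s:.0%}: gain needed to close the gap = {g:.0%}")
    # A partial gain still shrinks the gap considerably.
    print(f"50% shortfall after a 50% gain: {remaining_shortfall(0.5, 0.5):.0%}")
```

Under these assumptions, closing a 50 percent shortfall means doubling what each analyst gets done, and closing a 60 percent shortfall means a 2.5-fold increase. That is a large jump, but the process described above suggests that much of an analyst's time currently goes to work that is not analysis at all.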
It’s this simple: when an analyst says, “I want to analyze data,” the starting point really should be nothing more than logging into a system. Analysts shouldn’t need a programmer to analyze the data. They need a single platform that lets them not only analyze the data, but also gain business advantage from the resulting insights.
The industry is only now starting to understand this problem. Some smart vendors have begun offering ways to access all data in one place, so that analysts can pose and answer business questions from the same set of data on the same set of machines that organizations use for all of their operational needs. The most innovative of these systems not only make all of the data available on a single system, but also allow analysts to do complex data joins (as in data mashups) on the fly, with zero programming required. They provide analysts with interactive tools to explore trillions of rows of data and perform complicated calculations – all with sub-second response times.
That’s where business has to go if it truly wants to address any possible impending analytical talent crisis. By bringing data analysis and implementation capabilities into one easy-to-access system, you eliminate the need for countless layers of process and administration, and replace the business paradigm that gave rise to the perceived talent shortage.
Until we’ve shaken up traditional politics and organizational behavior with respect to how data is managed, accessed and ultimately used in a business context, we can’t legitimately claim that there really is a talent shortage.
About the Author
A veteran of analytics, Goodarzi has led several teams in designing, building and delivering predictive analytics and business analytical products to a diverse set of industries. Prior to joining 1010data, Goodarzi was the Managing Director of Mortgage at Equifax, responsible for the creation of new data products and supporting analytics to the financial industry. Previously, he led the development of various classes of predictive models aimed at the mortgage industry during his tenure at Loan Performance (Core Logic). Earlier on he had worked at BlackRock, the research center for NYNEX (present day Verizon) and Norkom Technologies. Goodarzi’s publications span the fields of data mining, data visualization, optimization, and artificial intelligence.