‘Information overload’ has become one of the most often-repeated mantras of our time. Books are being digitized, newspapers and magazines now make up just a fraction of today’s media, augmented as it is by wave after wave of tweets and blog posts, and all the while the gadgets we use to keep up with this digital frenzy grow ever more complex.
Some might complain about the digital revolution – but there’s no denying the immense impact it’s had on our lives. With experts claiming that as much as 90% of all the information in existence is less than two years old, everyone and everything, from governments and marketers, to police and now even farmers, has begun to show an interest in one of the hottest talking points of our time – big data.
But did you ever wonder where all this data came from? And more to the point, how did it get so big, and where is it all going? These are just some of the questions that we’ll be attempting to answer in today’s short history of big data, charting the five major milestones in its evolution into an entity that promises to change our world forever.
1890: The First Big Data Problem
Back in 1890, when the US government decided to perform a national census, the poor clerks at the Bureau responsible were faced with the unenviable task of counting more than 60 million souls in the country, laboriously transferring data from schedules to record sheets by the slow and heartbreaking method of hand tallying.
Horrified at the prospect, Herman Hollerith came to the rescue with his novel Pantograph tabulating machine, modeled on train conductors’ habit of punching holes into tickets to denote physical features and thus prevent fraud. Hollerith’s idea was a simple punch card which held the data of Census respondents and could be read in seconds by his electrical tabulating machine. There’s little doubt that Hollerith’s invention was a defining moment in the history of data processing, one that symbolized the beginning of the mechanized data collecting age – Hollerith’s machines successfully tabulated no less than 62,622,250 people in the US, saving the Census Bureau some $5 million and cutting the Census completion time down from ten years to less than 24 months.
1965: First Data Center is Conceived
Data didn’t really become data until it had a base that it could safely reside in – a database to be exact. In 1965, faced with the growing problem of where to keep more than 742 million tax returns and 175 million sets of fingerprints, the US government decided that its data needed a smaller home, and began to study the feasibility of transferring all of those records to magnetic computer tape and storing it all on one big computer.
While the plan was later dropped amid privacy concerns, it would later be remembered as one that heralded the dawn of the electronic data storage era – nudging all of those pen-pushing office clerks into oblivion once and for all.
1989: The World Wide Web is Born
Tim Berners-Lee’s proposal to leverage the internet proved to be a game-changer in the way we share and search for information. The British computer scientist probably had little idea of the immense impact that facilitating the spread of information via ‘hypertext’ would have on the world, yet all the same he was remarkably confident of its success:
“The information contained would grow past a critical threshold, so that the usefulness [of] the scheme would in turn encourage its increased use,” he wrote at the time.
1997-2001: Big Data is Defined
In their paper titled Application-controlled demand paging for out-of-core visualization, Michael Cox and David Ellsworth were among the first to acknowledge the problems that big data would present as information overload continued on its relentless path:
“Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.”
Cox and Ellsworth’s use of the term “big data” is generally accepted as the first in print, although the honor of actually defining the term must go to one Doug Laney, who in 2001 described it as a “3-dimensional data challenge of increasing data volume, velocity and variety”, a definition that has since become almost ubiquitous among industry experts.
2004: Enter Hadoop
Having dealt with big data problems, conceived data centers, developed a method of sharing data, and defined exactly what big data is – all that was left was to come up with some kind of tool that could help us actually understand it.
Enter Hadoop, the free and open-source software framework, named after a toy elephant, which grew out of an effort to index the rapidly expanding web. In the last eight years, Hadoop has become so big that it powers entire search engines, helping determine everything from which ads we’re shown, to which long-lost friends Facebook pulls out of the hat, and even the stories you see on your Yahoo homepage.
The creation of Hadoop marks big data’s biggest milestone yet. It’s an innovation that’s changed the face of big data forever, and with it the lives of everyone on the planet. Hadoop provides a solution that anyone can use – from players like Google and IBM, to even the smallest of internet marketers – giving everyone the chance to profit from the most enigmatic phenomenon of our time.
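For the curious, the core idea behind Hadoop’s original processing model, MapReduce, is simple enough to sketch in a few lines: “map” every record into key/value pairs, then “reduce” all the values that share a key into a single result. The toy Python below is not Hadoop itself – just an illustration of the pattern, using a word count, the classic introductory example:

```python
# Toy illustration of the MapReduce pattern Hadoop popularized:
# map each record to (key, value) pairs, then reduce values per key.
from collections import defaultdict

def map_phase(lines):
    # Emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    # Sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data big ideas", "big questions"]
print(reduce_phase(map_phase(documents)))
# → {'big': 3, 'data': 1, 'ideas': 1, 'questions': 1}
```

In a real Hadoop cluster, the map and reduce steps run in parallel across many machines, which is what lets the same simple pattern scale from two sentences to the entire web.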