Searching for Dark Data

We live in a highly connected world where every digital interaction spawns chain reactions of unfathomable data creation. The rapid explosion of text messaging, emails, video, digital recordings, smartphones, RFID tags and those ever-growing piles of paper – in what was supposed to be the paperless office – has created a veritable ocean of information.

Welcome to the world of Dark Data


Welcome to the world of Dark Data, the humongous mass of constantly accumulating information generated in the Information Age. Whereas Big Data refers to the vast collection of the bits and bytes that are being generated each nanosecond of each day, Dark Data is the enormous subset of unstructured, untagged information residing within it.

Research firm IDC estimates that the total amount of digital data, aka Big Data, will reach 2.7 zettabytes by the end of this year, a 48 percent increase from 2011. (One zettabyte is equal to one billion terabytes.) Approximately 90 percent of this data will be unstructured – or Dark.
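To put those figures in perspective, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable and function names are illustrative, and the 2011 baseline is derived from the stated growth rate rather than taken from IDC.

```python
# Figures quoted in the paragraph above (IDC estimate).
TOTAL_ZB = 2.7             # projected total digital data, in zettabytes
GROWTH_FROM_2011 = 0.48    # 48 percent increase over 2011
UNSTRUCTURED_SHARE = 0.90  # roughly 90 percent is unstructured ("Dark")
TB_PER_ZB = 10**9          # one zettabyte equals one billion terabytes


def implied_2011_total(total_zb, growth):
    """Back out the 2011 volume implied by the stated growth rate."""
    return total_zb / (1 + growth)


def unstructured_volume(total_zb, share):
    """Portion of the total that is unstructured."""
    return total_zb * share


baseline_2011_zb = implied_2011_total(TOTAL_ZB, GROWTH_FROM_2011)
dark_zb = unstructured_volume(TOTAL_ZB, UNSTRUCTURED_SHARE)

print(f"Implied 2011 total: {baseline_2011_zb:.2f} ZB")
print(f"Unstructured data:  {dark_zb:.2f} ZB")
print(f"Unstructured data:  {dark_zb * TB_PER_ZB:,.0f} TB")
```

In other words, roughly 2.4 of the 2.7 zettabytes would fall outside the reach of tools built for structured data.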

Dark Data has thrown traditional business intelligence and reporting technologies for a loop. The software that countless executives have relied on to access information in the past simply cannot locate or make sense of the unstructured data that comprises the bulk of content today and tomorrow. These tools are struggling to tap the full potential of this new breed of data.

The good news is that there’s an emerging class of technologies that is ready to pick up where traditional tools left off and carry out the crucial task of extracting business value from this data.

Searching for Dark Data


Thanks to technologies like Hadoop, the science of handling large data sets has advanced quite rapidly. Yet Hadoop can only go so far. It can store and secure the exploding volumes of data, but the challenge isn’t simply how best to house the data. The challenge is how companies should search the different types of integrated data – structured, semi-structured and unstructured – to discover patterns and insights, and then analyze those patterns in order to make better business decisions. The functionality that drives these search, discovery and analysis capabilities is rooted in a technology that has recently risen like a phoenix from the proverbial ashes, reinvigorated by the advent of Big Data and cloud computing: Enterprise Search.

Dark Data is essentially an unedited record that is less subject to bias or inaccuracy than consciously collected data. It really is the raw record of a business’s history. Without Enterprise Search, businesses can only scratch the surface of the knowledge hiding within the data. Adding Enterprise Search to the mix, however, brings the semi-structured and unstructured data to life. For instance, consider the following business cases where search is used to mitigate risks and improve bottom lines.

An insurance company prices policies through sophisticated algorithms based largely on probabilities. But what if policies could be issued, let’s say for a trucking company, based on actual miles logged, violations, number and age of drivers, preferred routes, training courses, driver experience and other real-world data in a precise, dynamic way? That detailed information exists. Some of it is structured, but most of it is not neatly organized in a single database. Enterprise Search integrates all of the data and extracts insightful information in near real-time.

Or, in order to better manage risk, imagine if the insurance company could put into Hadoop information about every claim ever made – billions and billions of unstructured documents – then run a query across all of that information to uncover trends that indicate fraudulent activity. By sharing those findings with their actuarial and analyst groups, the insurer would be prepared to watch for precursors of fraudulent activity and stop the fraud before it happens. Much of the data needed to make these processes a reality exists, but it is Dark and resides on paper and in disparate data sets. Utilizing the Hadoop Distributed File System and Enterprise Search technology, actuaries can run queries across all that data to attain the historical picture necessary to create custom, more profitable policies.
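The idea of running one query across a pile of unstructured claim documents can be illustrated with a toy inverted index, the basic data structure behind most search engines. This is a minimal sketch only: the claim texts and the function names (`tokenize`, `build_index`, `search`) are made up for illustration, and it does not represent Hadoop, HDFS or any particular Enterprise Search product.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical free-text claim notes, invented for this example.
claims = {
    "claim-001": "Rear-end collision on highway, driver reported whiplash.",
    "claim-002": "Cargo theft at unattended depot, no witnesses, police report missing.",
    "claim-003": "Cargo theft reported, no witnesses, same depot as prior claim.",
}

index = build_index(claims)
matches = sorted(search(index, "cargo theft no witnesses"))
print(matches)  # ['claim-002', 'claim-003']
```

A production system would add relevance ranking, distribution across many machines and connectors to the underlying storage, but the principle of indexing once and querying many times is the same.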

Shedding new light on Dark Data


The potential of this living, breathing data is industry-changing when Enterprise Search is incorporated into Big Data applications. Search opens a whole new world to users by enabling the discovery and analysis of content within Hadoop, transforming that data into real decision-making power for the entire organization. When you look at the scope and value of the information contained in Dark Data, it is easy to understand why progress will continue. The potential of the data to shape markets and catapult companies beyond competitors is too great to allow it to sit unused, collecting dust.

Dark Data will, ultimately, see the light.

About the Author

Paul Doscher is passionate about his belief that enterprise search is the enabling technology that will allow companies to realize the true value of their Big Data (both structured and unstructured). As CEO, Paul is responsible for LucidWorks’ vision and success in the enterprise-wide search, discovery and analytics market. He comes to LucidWorks with 30 years of sales, marketing and business management experience within high-tech enterprise software. Prior to LucidWorks, Paul was CEO of Exalead Inc, a global provider of enterprise search, where he led the company from 2008 through 2011. In 2003, Paul became CEO and one of the principal founders of Jaspersoft, one of the top commercial open source business intelligence platforms in the market. In 2000, Paul joined VMware as the company’s EVP of worldwide field operations, where he defined and executed the distribution strategy that formed the basis of the company’s worldwide success. Earlier in his career, Paul held positions including General Manager of Americas for Business Objects (now SAP), Vice President of Worldwide Marketing and Business Development for Entrust and Vice President of US Channels for Oracle.