NEWS
As if analyzing unstructured data weren’t already difficult enough, organizations must also, for a variety of logistical reasons, keep track of where their records came from and how they’ve been modified. To preserve the necessary context about the hundreds of terabytes of information it ingests every week, LinkedIn Inc. has created a custom auditing platform called WhereHows that it’s sharing with the world today under an open-source license.
Upon setup, WhereHows deploys a monitoring agent to every major stop along an organization’s analytics pipeline that observes the data passing through. Key details such as what kind of work is performed in each system and which department is responsible are synchronized to a centralized repository where they’re organized into a neat timeline. According to LinkedIn, the process produces lineages that make it possible to trace records all the way back to their respective points of origin.
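The idea of tracing a record back to its points of origin can be illustrated with a minimal sketch. This is not WhereHows code or its API: the `LineageRepository` class, its `record` and `origins` methods, and the dataset, system and owner names are all hypothetical, chosen only to show how events reported by per-system agents could be stored centrally and walked backward.

```python
from collections import defaultdict


class LineageRepository:
    """Hypothetical central store for lineage events (not the WhereHows API)."""

    def __init__(self):
        # Maps each dataset to the set of datasets it was derived from.
        self._parents = defaultdict(set)
        # Ordered log of events, standing in for the timeline view.
        self.timeline = []

    def record(self, source, target, system, owner):
        """Store one event reported by an agent: `system` turned `source` into `target`."""
        self._parents[target].add(source)
        self.timeline.append(
            {"source": source, "target": target, "system": system, "owner": owner}
        )

    def origins(self, dataset):
        """Walk the lineage backward and return the datasets with no recorded parent."""
        seen, roots, stack = set(), set(), [dataset]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            parents = self._parents.get(node)
            if parents:
                stack.extend(parents)
            else:
                roots.add(node)
        return roots


repo = LineageRepository()
repo.record("raw_clicks", "sessions", system="etl", owner="data-eng")
repo.record("sessions", "daily_report", system="warehouse", owner="analytics")
repo.record("raw_profiles", "daily_report", system="warehouse", owner="analytics")
print(sorted(repo.origins("daily_report")))
```

In this sketch, a dataset that no agent has ever reported as a target is treated as a point of origin, which is an assumption of the example rather than something the article states about WhereHows.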
WhereHows exposes the information through a graphical interface that borrows a page from popular commercial analytics tools like Tableau Enterprise to make navigation as straightforward as possible. A worker can look up the dataset they’re interested in using a search bar and then visualize its evolutionary path to understand the changes that have taken place over time. If something is amiss, they can consult the dashboard’s documentation section, which displays notes left by other employees who use the platform.
WhereHows also offers the option to contact a colleague directly when their help is needed to solve a more serious issue, such as inaccuracies in a dataset. The functionality can prove equally useful for sorting out the compliance issues that often arise in large organizations after data from different departments is merged, a problem that becomes particularly important when personally identifiable customer records are involved.
LinkedIn has been using WhereHows to great effect since its creation roughly two years ago. The social networking giant’s internal deployment contains 15 petabytes’ worth of lineage information about some 50,000 datasets, along with 40,000 employee notes. The fact that the platform can operate effectively at such a massive scale should strike a chord among traditional enterprises that have been using proprietary information governance software until now for lack of a better choice.