UPDATED 11:36 EDT / JUNE 07 2013

How Accumulo Safeguards Your Civil Liberties

It’s been widely reported that the NSA is in the midst of collecting huge volumes of call metadata from Verizon associated with all domestic and international calls made by the company’s customers for three months starting in mid-April (see the court order here.) Less attention has been paid to what exactly the government does with all that data or the technology supporting it.

While details are sketchy (neither the NSA nor the White House will even acknowledge the existence of the program), it is important to take a step back and understand that the NSA cannot indiscriminately analyze, mine or otherwise explore this vast new trove of data. In order to analyze the data at hand, the NSA must get a court order justified by the reasonable suspicion of an imminent terrorist act.

Even then, the NSA may only access and analyze segments of the call metadata that relate specifically to the potential threat in question and only specific individuals within the NSA can access the data. Among those individuals, access levels vary based on legitimate need to see and analyze the specific data sets covered by the court order.

There are three points I want to make:

1. Assuming you agree that the federal government should be allowed to mine communications data such as call metadata to investigate legitimate terrorist threats, then from a purely practical perspective it makes sense for the NSA to collect such data in advance. Verizon Wireless alone has somewhere north of 75 million wireless subscribers. It would be next to impossible for NSA agents to collect, integrate and analyze that much data (75 million callers x multiple calls by each caller per day x weeks or months = Big Data) at a moment's notice. By collecting the data ahead of time, the NSA is able to load the data into its platform and have it ready for analysis when it obtains a court order justified by a legitimate threat. Consider that the NSA is likely collecting similar call metadata from other wireless providers, all of which needs to be merged in order to “connect all the dots.” Slapping together a database with call metadata from hundreds of millions of callers is not something you do overnight.

2. Once a court order for analysis is obtained, the NSA needs the technical capabilities to limit its analysis to the data sets in question and to control which agents have access to the data. This is where we get to the technology behind the data mining program. While we don’t know for certain, the NSA is almost certainly using Accumulo, its homegrown scale-out NoSQL database, to process, store and analyze the call metadata. The NSA developed Accumulo several years ago when it couldn’t find a database that met its stringent requirements. Among those requirements were fine-grained security and access controls. Accumulo, which is often run on top of Hadoop and is based on Google BigTable, was built from the ground up with cell-level security capabilities. Among other things, this allows administrators to grant data access on a cell-by-cell and user-by-user basis, rather than being forced to provide users “all-or-nothing” access. Accumulo is the only scalable database I have come across that offers this capability, without which the NSA would not be able to perform this data analysis and follow the law. One company, Sqrrl, is seeking to commercialize Accumulo, and according to its website it aims to bring these cell-level security capabilities to other industries, such as healthcare and finance.

3. As for the type of analysis the NSA performs on the call metadata, graph analysis is certainly one such type. Graph analysis allows you to visualize and uncover relationships between distinct entities hidden among large volumes of data. The resulting visuals are made up of nodes, which represent the entities, and edges, the lines that connect the nodes and represent the relationships between them. Graph analysis is a popular way to better understand the dynamics of social networks (and is the basis of Facebook’s Graph Search, rolled out earlier this year) but is equally effective when trying to ferret out terrorist networks. And we know that the NSA has successfully tested Accumulo’s graph analysis capabilities on some huge data sets – in one case on a 1,200-node Accumulo cluster with over a petabyte of data and 70 trillion edges.

Slide from May 2013 presentation by the NSA on its use of Accumulo for graph analysis.
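
To make the nodes-and-edges idea concrete, here is a minimal sketch in Python. It is not how the NSA or Accumulo implements graph analysis; the call records, the function names and the two-hop limit are all made up for illustration. It treats each phone number as a node and each call as an edge, then uses a breadth-first search to find every number within a given number of calls of a starting number.

```python
from collections import defaultdict, deque


def build_call_graph(call_records):
    """Build an undirected graph: nodes are phone numbers, edges are calls."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, seed, max_hops):
    """Breadth-first search from `seed`.

    Returns {number: hop distance} for every number reachable
    in at most `max_hops` calls.
    """
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


# Hypothetical (caller, callee) records, invented for this example.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
graph = build_call_graph(calls)
contacts = within_hops(graph, "A", 2)
print(contacts)
```

With these invented records, "D" is three calls away from "A" and "E" and "F" are not connected to "A" at all, so none of them appear in the result. At the scale described above the graph would not fit in one machine's memory, which is why a distributed store is used instead of an in-memory dictionary like this one.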

The types of workloads in question are not just for intelligence and security agencies either. Specifically, fine-grained access control, I believe, is a critical feature for Big Data platforms in the enterprise. This is especially true as Big Data experiments and proofs of concept graduate to production-grade deployments, and as Hadoop adds support for additional computational models beyond MapReduce. As Hadoop becomes more robust and easier for non-MapReduce experts to use (such as by adding SQL-like and search capabilities), more and more users in the enterprise will interact with the platform. Not all users are created equal, and enterprises will need to implement fine-grained access controls that restrict data access based on role, security authorization and other criteria.
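
The cell-by-cell, user-by-user access idea can be sketched in a few lines of Python. This is a simplified stand-in, not Accumulo's actual API: the `Cell` class, the `scan` function, the label names and the sample data are all hypothetical, and the rule used here (a user must hold every label attached to a cell) is a simplification of the richer visibility rules a real system would support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required_labels: frozenset  # labels a reader must hold to see this cell


def scan(cells, user_authorizations):
    """Return only the cells whose required labels are all held by the user."""
    auths = frozenset(user_authorizations)
    return [cell for cell in cells if cell.required_labels <= auths]


# Hypothetical table contents and labels, invented for this example.
table = [
    Cell("call-001", "duration", "42s", frozenset({"analyst"})),
    Cell("call-001", "caller", "555-0100",
         frozenset({"analyst", "court_order_17"})),
    Cell("call-002", "caller", "555-0199",
         frozenset({"analyst", "court_order_23"})),
]

# This user is cleared for one court order only.
visible = scan(table, {"analyst", "court_order_17"})
for cell in visible:
    print(cell.row, cell.column, cell.value)
```

The point of the design is that the filter runs inside the data store on every read, so two users issuing the same query against the same table can get different results, and a user with no matching authorizations sees nothing rather than everything.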

But back to the matter at hand. Balancing national security against the protection of civil liberties is obviously a difficult challenge. And it’s a conversation we as a nation must continue having. It is important to realize, however, that the technologies used to identify threats and keep us safe are maturing and developing rapidly. Technology is not a cure-all for the security vs. civil liberties challenge, but as the tools become more sophisticated it becomes easier to zero in on the bad actors without compromising the privacy of the rest of us.

