UPDATED 16:01 EDT / JULY 18 2013

Endgame Gets Piggish With Malware Data Samples At Black Hat

In just over a week, Black Hat USA 2013 will convene at Caesars Palace in Las Vegas. In this series, which previews many of the talks and presentations scheduled for the event, SiliconANGLE will focus on the exploitable vulnerabilities associated with big data and how those vulnerabilities can be limited.

Today, we are looking at the enormous amounts of data related to malware samples received by organizations whose business is built around keeping that malware from reaching their customers.

Zachary Hanif, Telvis Calhoun and Jason Trost of Endgame will offer insight not only into how to keep malware from reaching users but also into how analyzing that data can help prevent future attacks.

Hanif, a Senior Researcher with Endgame, works specifically on creating powerful analytics within both batch and real-time data processing engines through applied statistics and rapid correlation. His focus is on applying machine learning and graph mining to massive security datasets.

Calhoun, a software engineer with Endgame who works alongside Hanif, has a background in commercial security. While finishing his M.S. at Georgia State University, he was a member of the Communications Assurance and Performance Group, and he has previously published research on wireless security.

Rounding out the presentation team is Jason Trost, a software engineer at Endgame with interests in big data, cloud computing and machine learning. He currently focuses on building highly scalable systems for processing, analyzing and visualizing high-speed network and security events in real time, which includes drawing analytics from massive amounts of malware.

According to the presentation's abstract, Endgame has received some 20 million malware samples over the previous 30 months, amounting to nearly 9.5 terabytes of binary data. McAfee and VirusTotal have also received exceptionally large volumes of malware samples.

While malware is regarded as a nuisance by security providers and users alike, the presentation will highlight the opportunities this incoming data presents, especially for machine learning. The presenters will show how Endgame has performed static analysis on malware to extract the specific feature sets used for large-scale machine learning.

Malware research has traditionally been the domain of reverse engineers. As a result, existing malware analysis tools typically process a single binary, or multiple binaries on a single computer, which leaves them unable to process terabytes of malware at once. Organizations that have wanted to address these issues at scale have been left to develop their own solutions.

The presentation team will explain how their initial attempts to handle this mountain of malware data did not scale with the growing flood of samples. Simply stated, the data was arriving too quickly to analyze properly. Over the past two years, Endgame has refined its system into a dedicated, Hadoop-based framework, which makes large-scale studies easier to perform and more repeatable over an ever-expanding dataset.

Their open framework, BinaryPig, is their solution to this problem. The team will present example uses of the framework and show how it can perform a multiyear, multi-terabyte, multimillion-sample malware census. BinaryPig is built on Apache Hadoop, Apache Pig and Python, which allows it to address many of the issues associated with scalable malware processing: handling increasingly large data sizes, speeding up workflow development and enabling parallel processing of binary files with many pre-existing tools. BinaryPig is also modular and extensible, so that security researchers and academics can adapt it to the ever-increasing amounts of malware.
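The article does not show any of BinaryPig's code. As a rough illustration of the kind of per-binary static feature extraction described above, here is a minimal single-machine sketch in Python, the scripting language BinaryPig builds on. The function names (`shannon_entropy`, `extract_features`, `process_file`, `census`) and the chosen features (MD5 and SHA-256 hashes, file size, byte entropy and an "MZ" header check) are hypothetical assumptions, not BinaryPig's API, and a thread pool stands in for the Hadoop map step that would spread this work across a cluster.

```python
import hashlib
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable, List, Union

# Hypothetical feature record: values are hex digests, integers, floats or flags.
Features = Dict[str, Union[str, int, float, bool]]


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0).

    Packed or encrypted binaries tend to score close to 8.0.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def extract_features(data: bytes) -> Features:
    """Compute a few simple static features for one binary (illustrative only)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "entropy": round(shannon_entropy(data), 4),
        # Windows PE executables begin with the "MZ" DOS header magic.
        "is_pe_candidate": data[:2] == b"MZ",
    }


def process_file(path: Path) -> Features:
    """Read one sample from disk and tag its features with the file path."""
    features = extract_features(path.read_bytes())
    features["path"] = str(path)
    return features


def census(paths: Iterable[Path], workers: int = 4) -> List[Features]:
    """Extract features from many samples in parallel on one machine.

    Threads overlap file I/O and hashing; for CPU-heavy features, a
    ProcessPoolExecutor or a Hadoop cluster would be the natural replacement.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, paths))
```

For example, `census([p for p in Path("samples").iterdir() if p.is_file()])` would return one feature dictionary per file in a hypothetical `samples` directory. Keeping `extract_features` a pure function of the raw bytes is the design choice that matters here: the same function could, in principle, be moved into a distributed map step without changing its logic.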

The team plans to release some example applications as open source at the conference, after demonstrating their results and the techniques used to derive them. The presentation is scheduled for 5:00 p.m. on Wednesday, July 31.

