UPDATED 10:19 EDT / JANUARY 11 2012

SAS Institute Adapts to the Big Data Era

The original SAS software package, which debuted over 35 years ago, was designed to run on IBM mainframes. A lot has changed in the world of IT since then, and SAS has evolved to keep up.

The latest stage in SAS’s evolution is a re-architecting of its software to run optimally in distributed computing environments. Between Hadoop and next-generation data warehouses, business analytics increasingly takes place against the backdrop of Big Data architectures, and SAS knows that’s where it has to be.

For SAS, the latest journey began around two years ago, according to Paul Kent, Vice President of Platform Research and Development at the Cary, N.C.-based firm. That’s when SAS teamed up with Teradata to provide SAS analytics inside the massively parallel enterprise data warehouse. Since then, it has forged similar partnerships with IBM Netezza, EMC Greenplum and Aster Data (since acquired by Teradata).

The shift to parallel computing was heralded by Google, which pioneered the practice of stringing together lots of commodity blades to form a single supercomputer, Kent said. That, in turn, required SAS to rewrite its software and algorithms to run on multiple nodes simultaneously, an effort that is still ongoing. But the impact on users is significant.

In-database analytics obviates the need to move data between the data warehouse and a separate analytic engine or application such as SAS. That means users spend less time moving data around and more time analyzing it.
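SAS’s in-database implementation is proprietary, but the underlying idea can be sketched with a toy example: instead of extracting every row into a separate analytic engine, the computation is expressed as a query that runs where the data lives, so only the small result set crosses the wire. (The table and figures below are illustrative, not from SAS or Teradata.)

```python
import sqlite3

# Toy "warehouse": a transactions table standing in for an MPP
# enterprise data warehouse such as Teradata or Netezza.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("east", 80.0), ("west", 50.0)])

# Traditional approach: pull every raw row out of the database,
# then compute the statistic in a separate analytic application.
rows = conn.execute("SELECT store, amount FROM sales").fetchall()
totals = {}
for store, amount in rows:
    totals[store] = totals.get(store, 0.0) + amount

# In-database approach: push the math to the data. Only the tiny
# aggregated result leaves the warehouse, not the raw records.
in_db = dict(conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store"))

assert totals == in_db  # same answer, far less data movement
```

At warehouse scale the first approach moves every record over the network; the second moves one row per group, which is the saving Kent describes.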

For example, one SAS customer, a large national retail company, cut the time it spent running marketing optimization analytics from one week (170 hours, to be precise) to three minutes or less, Kent said. The retailer can now take an iterative approach to analytics, rather than running a single time-intensive job to support a week’s worth of marketing objectives.

In-database analytics also makes it possible to run analytics on full data sets, rather than samples. Because moving large data sets between systems is impractical, admins often end up transferring smaller, more manageable sample data sets for analysis. Before running the actual analytics, they sometimes also run checks on the sample to ensure it is representative of the complete data set, which takes up even more valuable time, Kent said.

With “the math inside the machine,” as Kent puts it, those steps are no longer necessary.

The ability to run analytics on complete data sets is particularly important when it comes to predicting future events based on historical trends. Take a mortgage lender evaluating risk, for example. If it relies on just sample data from a two-year period during a recession to score applicants, it could miscalculate likely default rates and deny loans to otherwise qualified people during more prosperous economic times.
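The lender scenario above can be made concrete with a small simulation. The default probabilities and years here are invented for illustration; the point is only that a sample drawn from an unrepresentative window skews the estimate in a way the full history does not.

```python
import random

random.seed(0)

# Hypothetical loan history: defaults are common in recession years
# and rare otherwise (illustrative numbers, not a real risk model).
def simulate_year(year, n=10_000):
    p_default = 0.08 if year in (2008, 2009) else 0.02
    return [1 if random.random() < p_default else 0 for _ in range(n)]

history = {year: simulate_year(year) for year in range(2004, 2012)}

# Sample-based estimate: only the two recession years made it into
# the extract that was small enough to move to the analytic engine.
recession_only = history[2008] + history[2009]
sample_rate = sum(recession_only) / len(recession_only)

# Full-data estimate: every year of history, feasible when the math
# runs where the data lives.
all_loans = [loan for loans in history.values() for loan in loans]
full_rate = sum(all_loans) / len(all_loans)

# The recession-only sample overstates the long-run default rate,
# which is how qualified applicants end up being denied.
assert sample_rate > full_rate
```

A lender scoring applicants with `sample_rate` would price risk as if every year were 2008; the full history pulls the estimate back toward the long-run rate.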

But the benefits of in-database analytics for users mean new challenges for SAS. Its engineers must understand how best to partition data across clusters of commodity storage for optimum performance, and they must make the transition as seamless as possible for customers, Kent said. Both efforts are works in progress.

Then there’s Hadoop. SAS has yet to bring its analytic prowess to the open source Big Data framework, but the company is poised to release three Hadoop connectors – one each for Cloudera, Hortonworks and MapR – in the near future, Kent said. SAS also adds its capabilities to other data warehouses, such as ParAccel and HP Vertica, depending on the level of customer interest.

It’s all just part of the latest evolution of SAS, this time for the Big Data Era.

