One of the most puzzling developments in big data is the analytics piece. We’ve figured out some useful ways to store and access hordes of data, but how does one efficiently get analytics tools onto the server? Having ready access to data is crucial to real-time analytics in particular, where an in-memory solution is necessary. It’s a problem William Bain has been working on for a while; he left Microsoft several years back to launch his own company, ScaleOut Software.
Featured in today’s CEO Series, Bill Bain discusses the key differentiators for his product, as well as the importance of layering more analytics at the server level of the stack. ScaleOut is steadily improving its analytics offerings, launching an Analytics Server to hasten real-time analytics just last month.
Positioning ScaleOut for a widening market, Bain sees innovation as an important factor to balance against customer needs. The datacenter itself is in the midst of one of the largest transitions we’ve seen in history, and efficiency remains a pillar for server architecture undergoing the necessary changes to address the needs of tomorrow. This is another topic Bain discusses in depth, noting the emergence of parallel computing as a mainstream development that has underscored the importance of scalable computing for business optimization today.
What’s the most important parallel computing technique you’ve learned and applied to ScaleOut Software?
Picking a single technique is challenging, but I’d say that making all aspects of a parallel computing system scalable is both difficult to do and critical for performance. The introduction of any operation that is not scalable creates a bottleneck for the entire system and diminishes total throughput. We’ve worked hard at ScaleOut to apply more than 30 years of experience in parallel computing architecture to ensure that every aspect of our products scales as load increases.
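To make the bottleneck point concrete, here is a minimal sketch of a toy model, not a description of any ScaleOut product. It assumes each server can handle a fixed number of operations per second and that every operation also passes through one shared step that does not scale. The function name and all of the numbers are hypothetical.

```python
def system_throughput(servers: int,
                      per_server_ops: int = 10_000,
                      shared_stage_ops: int = 50_000) -> int:
    """Toy model: total ops/sec is capped by the slowest stage.

    per_server_ops and shared_stage_ops are made-up illustrative figures.
    The scalable stage grows with the server count; the shared stage does not.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    scalable_capacity = servers * per_server_ops
    return min(scalable_capacity, shared_stage_ops)


# Print how total throughput responds as servers are added.
for n in (1, 2, 5, 10, 50):
    print(f"{n:>3} servers -> {system_throughput(n):>6} ops/sec")
```

In this model, throughput grows with each added server only until the shared step saturates. Past that point, extra servers add cost without adding capacity, which is why a single non-scalable operation limits the whole system.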
How has adding an analytics layer (with the ScaleOut Analytics Server) to your core product better positioned you in the market?
The addition of analytics to our in-memory data grid (IMDG) broadens our market tremendously. Now our existing customers can quickly and easily analyze fast-changing data already held in a ScaleOut IMDG, for example, stock positions, travel reservations, shopping carts, and portfolio data. Also, new customers who are mainly interested in analytics can now use ScaleOut Analytics Server as a fast, easy-to-use, in-memory computing platform for analyzing their data with near real-time responsiveness. This provides an alternative to more complex, file-based platforms, such as Hadoop, and is well suited either for fast-changing data sets that need continuous map/reduce or for fast “what-if” analysis of more static data sets that can fit in memory.
In what ways will data analytics change the market at the architecture level?
Data analytics has brought parallel computing into the mainstream and has demonstrated the immense value of applying scalable computing techniques to optimize business processes. I think the use of data analytics powered by parallel computing increasingly will become an integral part of mainstream applications that manage large or fast-changing data volumes.
How has mobile impacted the need for scale out technologies?
As mobile devices have proliferated and become more powerful, providers want to deliver more functionality to users. Much of this functionality makes use of fast-changing data such as reservations, gaming, weather, and messages, among others. Storing and quickly processing this data to support large numbers of mobile devices requires a highly scalable server-side architecture. IMDGs play an important role in delivering the scalability that these applications need to keep users happy. For example, we have a customer who provides mobile apps to their customers using a SaaS application and uses our IMDG to store fast-changing data accessed by mobile devices; this gives them both scalable performance and fast response times.
Anticipating growth in business requires the right data and the right products. What gives you confidence in the future?
ScaleOut Software has relied on a highly customer-driven process to identify new product features. We now have more than seven years of history providing IMDG products to our customers, and in that time we have worked with hundreds of customers to assess their needs and support their use cases. This experience has enabled us to build a sizable store of knowledge about our market and to anticipate what to build next to meet our customers’ needs.
At the same time, the key to entrepreneurship is to balance our customers’ feature requests with innovation that anticipates how the market will evolve. We believe that the market for big data analytics has arrived because it offers tremendous benefits in maximizing the value of our customers’ data. Making analytics easy to use, fast, and a seamless part of scalable applications offers important new capabilities for our customers, and that is our reason for creating ScaleOut Analytics Server.
Biggest misconception about data farms & scalability?
Most people assume that adding more servers to an application (often called “scaling out”) will help it run a fixed size workload faster, but this is rarely the case. The key to scaling out an application is to add servers to handle a growing workload in the same amount of time as the smaller workload. This ensures that users never see degraded performance even if the workload gets very large, and that’s what most applications actually need.
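The difference between the two views of scaling out can be sketched with a simple timing model. This is an illustrative assumption, not a measurement: it supposes each server processes a fixed number of items per second and that every run pays a small fixed cost (for example, coordination) that adding servers does not reduce. The function name and default values are hypothetical.

```python
def completion_time(items: int,
                    servers: int,
                    items_per_sec_per_server: float = 1000.0,
                    fixed_overhead_sec: float = 0.5) -> float:
    """Toy model of run time in seconds; all defaults are made-up values.

    The per-item work divides evenly across servers, while the fixed
    overhead is paid once no matter how many servers are used.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    parallel_time = items / (servers * items_per_sec_per_server)
    return fixed_overhead_sec + parallel_time


# Fixed workload: adding servers gives diminishing returns.
for n in (1, 4, 16, 64):
    print(f"fixed 10,000 items, {n:>2} servers: {completion_time(10_000, n):.2f}s")

# Growing workload: servers added in step with load keep the time flat.
for n in (1, 4, 16, 64):
    print(f"{10_000 * n:>7} items, {n:>2} servers: {completion_time(10_000 * n, n):.2f}s")
```

Under these assumptions, the fixed-size workload never runs faster than the fixed overhead allows, while a workload that grows in proportion to the server count finishes in the same time as the original.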
Another misconception is that parallel computing is inherently complex. While it is true that some technologies, such as multi-threading and distributed synchronization, can be difficult to understand and implement correctly, data-parallel computing can be very straightforward to grasp and use. Our approach to IMDG-based data analytics is to keep the computing model as simple as possible while delivering high performance. Keeping it simple is an important lesson from three decades of parallel computing.
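As a rough illustration of how simple the data-parallel model can be, the sketch below splits a collection into partitions, runs the same function on each partition, and merges the partial results. It uses only the Python standard library and is not ScaleOut’s API; the shopping-cart records, field names, and function names are hypothetical. Threads stand in for the servers of a real grid.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items: list, parts: int) -> list:
    """Split items into `parts` roughly equal slices (round-robin)."""
    return [items[i::parts] for i in range(parts)]


def map_partition(carts: list) -> int:
    """Map step: total value of the carts in one partition."""
    return sum(cart["total"] for cart in carts)


def data_parallel_total(carts: list, workers: int = 4) -> int:
    """Run the map step on every partition, then reduce the partial sums."""
    chunks = partition(carts, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, chunks))
    # Reduce step: merge per-partition results into one answer.
    return reduce(lambda a, b: a + b, partials, 0)


# Hypothetical sample data: 100 shopping carts with integer totals.
carts = [{"id": i, "total": i * 10} for i in range(1, 101)]
print(data_parallel_total(carts))
```

The application code contains no locks or explicit synchronization: each partition is processed independently, and the only coordination is the final merge.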
Top 3 trends you’re watching?
The rapid adoption of big data analytics is the key technical trend I watch every day. I think we will see much tighter integration of in-memory computing with fast SSD storage and scalable, NoSQL persistent stores over the next few years. The optimum way to combine these technologies and to hone the parallel computing model that uses them is still evolving at a rapid rate.
Another trend that I am watching is the emergence of scalable infrastructures that simultaneously optimize the combination of performance and energy usage, especially within data centers. Today’s data centers optimize for high availability at the expense of unsustainable energy usage. Scalable computing techniques and server virtualization give us the tools we need to create new infrastructures that dynamically optimize both of these competing needs. This technology will become vitally important in the coming years, especially as fuel prices climb and climate change accelerates.
Most satisfying project?
Without a doubt, it is the raising of our daughter. Everything else pales in comparison.