More than the newest hot stock after its successful initial public offering today, Cloudera Inc. has an opportunity to take advantage of a once-in-a-generation shift to a new platform for applications that inform or automate decisions with ever-better and ever-faster answers.
But investors and customers may be getting ahead of themselves if they assume the company is ready anytime soon to seize that opportunity, a market likely to be worth tens of billions of dollars in a few years. Competing for that market will require success in three emerging areas for Cloudera’s platform: hybrid cloud migration, machine learning and operational databases.
Of the three areas, hybrid cloud and machine learning are high-risk. Cloudera will face competition from leading cloud providers Microsoft Azure, Amazon Web Services and Google Cloud Platform, each of which has deep advantages.
Let’s start with a closer look at why hybrid cloud migration is high-risk. Core to Cloudera’s value proposition to date have been its Manager and Navigator products, which simplify management of the various open source components in its platform. With these management products on Azure, AWS and GCP, Cloudera can enable customers to move workloads easily between their own data centers and those of the different cloud vendors. From the S-1: “Our customers [can] deploy, configure and monitor all their workloads at scale across these environments from a ‘single pane of glass.’” What’s not to like? None of the cloud vendors can offer that manageability and portability across customer data centers and different clouds.
Making this hybrid cloud offering work requires that customers keep their applications contained within Cloudera’s platform. But once customers are on a public cloud platform, strong centrifugal forces start to pull the Cloudera platform apart. Services native to different clouds introduce an entirely new set of competitive offerings. Customers may want to take advantage of Azure’s IoT services such as Event Hubs and Stream Analytics. Or they may want to use a data warehouse that works only on AWS. Redshift is AWS’s native data warehouse, and Snowflake is a third-party data warehouse optimized for machine data that operates with the simplicity of a SaaS application (figure 1, below).
Once customers start to leverage services native to different platforms, the value of Cloudera’s hybrid cloud management and migration begins to fray.
Now customers need to manage their cloud-native services with tools either from third parties or from the cloud vendor. Cloudera could expand its management tools’ coverage to include these native services, but customers would still lose the ability to migrate across both cloud platforms and their own data centers. Not yet visible but just over the horizon are management tools based on machine learning that can make multiple services operate more like a single SaaS application. The public cloud vendors, as well as third parties, can use operations data from these multivendor cloud services to train machine learning models to do many of the tasks that Cloudera’s own management software provides for its services. In other words, integrating with services managed by one cloud becomes still more attractive.
In short, Cloudera has a good argument for persuading customers to stay within its platform in order to preserve portability and avoid lock-in to any one cloud. But customers will face strong incentives to leverage native cloud services and that represents a risk to this key part of Cloudera’s strategy.
The second big risk relates to how well Cloudera can ride the machine learning wave. Machine learning and AI are becoming the heart of emerging applications. And Hadoop vendors have a privileged position in supporting these new applications by virtue of the huge amounts of data customers have been collecting in their Hadoop repositories. The promise is that Cloudera can provide tools that combine data scientists’ familiarity with desktop tools, such as those based on R or Python, with the scalability, security and manageability of a Hadoop cluster running Spark. It’s a compelling value proposition.
But there’s an obstacle that will be familiar to many Hadoop customers: scarce skills. Many Hadoop customers, especially outside the largest and most sophisticated shops, have faced challenges getting their on-premises deployments into production because it’s hard to manage disparate components that weren’t all designed to fit together. Even though machine learning tools are getting easier to use, building the models still requires data scientists. And these skills may be even more scarce than those required to administer Hadoop.
As with hybrid cloud migration, public cloud platforms appear likely to upend all assumptions about building machine learning applications. They have an advantage that traditional enterprise software vendors can’t currently match. They have data, and lots of it. This data comes from running their cloud platforms as well as their other cloud businesses. For example, mainstream developers can create conversational user interfaces using machine learning models on Azure that Microsoft trained as part of operating its Cortana assistant and Bing search engine. Google offers similar capabilities. A few years ago, only highly specialized data scientists who understood deep learning with artificial neural networks could even attempt to build conversational user interfaces (figure 2, below).
But the cloud vendors have advantages in many more areas too. For example, Amazon is working to extend to AWS much of the machine learning and AI technology that it uses to run its ecommerce business. What it has learned from all the data it has processed to figure out merchandising and recommendations in different verticals, demand forecasting and price optimization, and fulfillment and logistics, among other tasks, will become services accessible to mainstream developers.
To be fair, most developers will have to do additional work to customize these services to their particular applications. But more than any data science tools imminently on the horizon, these types of application-level services, available from all the cloud vendors, will empower mainstream application developers. These services will exert the most powerful draw onto the cloud vendors’ application platforms, pulling in both data currently in Hadoop repositories and new data accumulating in the cloud.
Cloudera is a hot stock because it is a high-growth enterprise software company with an immense opportunity. Institutional investors currently have a big appetite to fund growth opportunities even at the expense of profitability. But investors and customers should be aware that seemingly distant competitive threats, like the objects in your rear-view mirror, are closer than they appear.