UPDATED 12:37 EDT / JULY 27 2018

How ‘AIOps’ is optimizing cloud computing up and down the stack

Artificial intelligence workloads are consuming ever greater shares of information technology infrastructure resources. AI is also taking up residence as an embedded component for managing, monitoring, scaling, securing and controlling IT infrastructure.

Increasingly, this emerging IT management paradigm goes by the name “AIOps.” This buzzword refers to two aspects of AI’s relationship to cloud infrastructures and operations. On the one hand, it refers to AI as a growing workload that infrastructure and operations are being optimized to support.

On the other, it refers to AI’s use as a tool to make infrastructure and operations more continuously self-healing, self-managing, self-securing, self-repairing and self-optimizing. In this regard, AI’s growing role in IT infrastructure management stems from its ability to automate and accelerate many tasks more scalably, predictably, rapidly and efficiently than manual methods alone. Without AI’s ability to perform continuous log analysis, anomaly detection, predictive maintenance, root cause diagnostics, closed-loop issue remediation and other critical functions, managing complex multiclouds may become infeasible or cost-prohibitive for many organizations.

Acknowledging these challenges as well as the potential productivity benefits from embedding AI in infrastructure, more enterprise IT professionals are exploring the growing range of AIOps platforms and tooling on the market. Many vendors have introduced sophisticated offerings that embed machine learning and other AI tools for intelligent, adaptive, 24×7 operation.

Wikibon sees the AIOps market segmenting into two broad solution categories:

  • AI-workload-optimized computing platforms
  • AI-augmented infrastructure optimization tools

AI-workload-optimized computing platforms

AI-optimized application infrastructure is one of today’s hottest trends in the IT business.  More vendors are introducing IT platforms that accelerate and automate AI workloads through pre-built combinations of storage, compute, and interconnect resources.

At a hardware level, AI-ready storage/compute integration is becoming a core requirement for many enterprise customers. As I discussed here a few months ago, Pure Storage Inc., a well-established provider of all-flash storage platforms, now offers AIRI, an integrated hardware/software platform for distributed training and other compute- and storage-intensive AI workloads. The product is purpose-built for a wide range of AI pipeline workloads, ranging from upfront data ingest and preparation all the way through modeling, training and operationalization.

AIRI packs Pure Storage’s FlashBlade storage technology and four of Nvidia Corp.’s DGX-1 supercomputers, which run the latest Tesla V100 graphics processing units, into just over a half-rack. AIRI’s storage and compute are interconnected through Arista 100GbE switches that incorporate Nvidia’s GPUDirect RDMA technology, thereby providing a direct path for high-speed, high-volume data exchange on distributed training and other AI workloads. The solution also incorporates Pure Storage’s AIRI Scaling Toolkit as well as Nvidia’s GPU Cloud deep learning software stack, a container-based environment for TensorFlow and other AI modeling frameworks.

As another proof point, consider how Dell EMC has configured AI into its hyperconverged infrastructure products to scale and accelerate AI workloads on multiclouds, as well as within premises-based “true private clouds” that approximate the public cloud experience. As I discussed here recently, Dell EMC made several important product announcements, rolling out preconfigured infrastructure products that accelerate and scale AI workloads as follows:

  • AI-optimized hardware that manages complex workloads in hyperconverged infrastructure: The company recently unveiled a new version of its Dell EMC VxRail hyperconverged infrastructure appliance product. At the heart of the new appliance are Intel Corp.’s new Xeon chips and Nvidia’s Tesla P40 graphics processing units for more intensive AI, computer-aided design and other workloads. In addition to combining compute and storage to simplify information technology operations, the VxRail integrates with the vSAN and vSphere virtualization software from Dell’s majority-owned VMware Inc. It also supports new NVMe cache drive options for lower-latency performance, as well as 25GbE networking to deliver more aggregate bandwidth.
  • AI-optimized chipsets in hyperconverged server platforms: Dell announced new versions of its Dell EMC PowerEdge servers. The new PowerEdge R840 and PowerEdge R740 are both four-socket servers that embed Intel’s Xeon Scalable processors, Intel field-programmable gate arrays, Nvidia GPUs and XtremIO flash drives. Dell also unveiled a new version of the Dell EMC VxRack SDDC hyperconverged appliance that incorporates the vendor’s new PowerEdge servers and comes bundled with version 2.3 of VMware’s Cloud Foundation software for on-premises or as-a-service deployment in public clouds that come optimized for AI, machine learning and in-database analytics workloads. The company also provided a preview of its forthcoming PowerEdge MX modular infrastructure, which will enable customers to configure their hyperconverged infrastructure flexibly, optimizing compute capacity, acceleration cards, memory and I/O connectivity for distinct fine-grained workloads.

In a market where more AI workloads are being automated every step of the way, Wikibon expects that workload-optimized hardware/software platforms such as these will find a clear niche for on-premises deployment in enterprises’ AI development shops. Before long, no enterprise data lake will be complete without preoptimized platforms for one or more of the core AI workloads: data ingest and preparation; data modeling and training; and data deployment and operationalization.

AI-augmented infrastructure optimization tools

AI is becoming an essential tool for accelerating, scaling, automating and otherwise optimizing infrastructures at every level. AIOps solutions enable these benefits by driving real-time monitoring, predictive analysis, root cause diagnostics and anomaly detection on system- and application-level events in IT infrastructure, and also in data, application and services at higher layers in the cloud computing stack.
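
To make the anomaly detection idea concrete, here is a minimal sketch, assuming a single metric stream such as request latency. It flags any sample that deviates from the mean of a short trailing window by more than a chosen number of standard deviations. The function name, window size and threshold are illustrative assumptions, and this simple statistical rule is a stand-in for the richer models that commercial AIOps tools presumably use.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from their trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples in milliseconds, with one obvious spike.
latencies_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(find_anomalies(latencies_ms))
```

A trailing window keeps the baseline adaptive, but a large spike inflates the window's spread for a few samples afterward, so a production system would likely use a more robust estimator.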

At the software level, we can see the AIOps trend most clearly in how large vendors such as Oracle Corp. and IBM Corp. have made AI the foundation for the “autonomous” or “autonomic” features in their product platforms. We can also see it in the diverse range of IT operations management tool vendors such as Moogsoft Inc., BMC Software Inc., ExtraHop Networks Inc., and AppNomic Systems Inc.

At the hardware level, we can see the AIOps trend most clearly in recent announcements such as Hewlett Packard Enterprise Co.’s addition of AI-based systems management capabilities to its 3PAR StoreServ all-flash storage arrays. Specifically, HPE has embedded InfoSight, its cloud-based AI platform for storage management, into the 3PAR system, which enables users to spot issues before they happen and take action to remedy them. This enhancement enables HPE 3PAR customers to improve the availability, performance and utilization of existing storage resources and workloads in on-premises data centers and private cloud environments. Once upgraded with InfoSight, HPE 3PAR all-flash storage systems leverage embedded AI for the following IT infrastructure management capabilities:

  • Accelerated storage application DevOps: InfoSight uses AI to accelerate DevOps for cloud, virtualized, and containerized applications that leverage 3PAR storage arrays. It does so through integration of machine learning with prebuilt blueprints written in the Chef, Puppet, and Ansible configuration management tools.
  • Automated storage capacity deployment: InfoSight uses machine learning to drive prebuilt automated workflows for fast deployment and optimization of 3PAR storage resources.
  • Predictive storage optimization: InfoSight uses AI to power predictive analytics, anomaly detection and root cause analysis that enable 3PAR storage arrays to grow smarter, more available and more reliable over time. It does so through use of storage-embedded AI to monitor storage utilization trends and anomalies, predict performance and capacity issues, perform root cause analyses and drive automated issue resolution.
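
As a rough illustration of what predicting a capacity issue from utilization trends can mean, the sketch below fits a straight line to daily used-capacity readings and estimates how many days remain before an array fills up. It is a minimal example under stated assumptions, not a description of how InfoSight works: the function names, the sample figures and the choice of ordinary least squares are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(daily_used_tb, capacity_tb):
    """Estimate days left before usage reaches capacity.

    `daily_used_tb` holds one reading per day, oldest first.
    Returns None when usage is flat or shrinking.
    """
    if len(daily_used_tb) < 2:
        raise ValueError("need at least two daily readings")
    days = list(range(len(daily_used_tb)))
    slope, intercept = fit_line(days, daily_used_tb)
    if slope <= 0:
        return None
    full_day = (capacity_tb - intercept) / slope
    # Measure from the most recent reading; never report a negative value.
    return max(0.0, full_day - days[-1])


# Hypothetical readings: usage grows 2 TB per day on a 100 TB array.
print(days_until_full([50, 52, 54, 56, 58], capacity_tb=100))
```

A linear trend is the simplest possible forecast; seasonal or bursty workloads would need a richer model, which is where the machine learning the vendors describe would come in.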

Other recent announcements of AI as an IT infrastructure management accelerator came from Dell EMC:

  • AI that optimizes storage within hyperconverged environments: Dell EMC announced a new version of its Dell EMC PowerMax storage array, a rearchitecture and renaming of its flagship VMAX enterprise product line. The new PowerMax incorporates nonvolatile, memory-based architecture for extreme low-latency performance on real-time AI, IoT, mobile and other use cases. It supports end-to-end NVMe-over-Fabrics and high-speed, low-latency Storage Class Memory. It also embeds a real-time ML engine that uses predictive analytics and pattern recognition to drive intelligent storage automation. The ML engine leverages data from Dell EMC’s installed base of VMAX3 and VMAX all-flash capacity for its storage intelligence. The vendor also unveiled a new version of its XtremIO all-flash arrays, which incorporate new data replication capabilities that send only unique or new data to a remote site, thereby reducing bandwidth and storage requirements.
  • AI that configures hyperconverged hardware for application acceleration: Dell EMC announced a new version of its hardware system optimizer that leverages AI for on-the-fly system configuration to boost the performance of specific software applications. Available now, Dell Precision Optimizer 5.0 uses data from an application’s background behavior to train machine learning models that can then automatically adjust system configurations such as CPU, memory and storage to provide optimal settings.
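
The idea of replicating only unique or new data can be sketched with content fingerprints: hash each block, and send it only if the remote site has not already seen that hash. The example below is a generic illustration of that deduplication technique, assuming an in-memory set as the remote fingerprint index. It is not a description of XtremIO’s actual implementation, and the function and variable names are hypothetical.

```python
import hashlib


def replicate_unique(blocks, remote_index):
    """Return only the blocks the remote site has not seen yet.

    `remote_index` is a set of SHA-256 fingerprints already stored
    remotely; it is updated in place as new blocks are selected.
    """
    to_send = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in remote_index:
            remote_index.add(fingerprint)
            to_send.append(block)
    return to_send


remote = set()
first = replicate_unique([b"alpha", b"beta", b"alpha"], remote)
second = replicate_unique([b"beta", b"gamma"], remote)
print(len(first), len(second))  # duplicate blocks are skipped in both passes
```

Because only fingerprints need to be compared before a transfer, duplicate blocks cost a hash lookup rather than a full copy across the wire.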

AI is also becoming a key element in how organizations manage compliance with the General Data Protection Regulation and other privacy mandates. The core of AI’s role in GDPR compliance is in its use as a tool for discovering, organizing, curating and controlling enterprise personally identifiable information assets across complex, distributed application environments.

As noted here, Wikibon has seen a surge in products that incorporate machine learning for GDPR compliance in application and data infrastructure. Many of these use cases focus on using machine learning for discovery in distinct business processes, application domains, content formats or data sources:

  • PII discovery in DevOps pipelines: BigID Inc. uses ML to continuously track changes in PII across production and development environments in the data center or cloud. Its BigOps uses ML to discover, contextualize and catalog PII across all data stores. It plugs into open-source DevOps environments such as Jenkins to automatically monitor changes to PII across the development lifecycle. And it uses ML to compare its data with a suspected pirate database to determine rapidly where there has been a breach that requires prompt notification.
  • PII discovery in unstructured machine-data logs: Loom Systems uses ML to analyze logs and unstructured machine data for immediate visibility into IT environments. Its Sophie for GDPR has a “find my PII” feature that automates the collection of sensitive log data sets, enabling rapid location and deletion of PII, upon data subject request, under the GDPR “right to be forgotten” mandate.
  • PII discovery at the network level: DB Networks uses ML to discover databases containing PII and automatically map how the information is being processed. Its DBN-6300 performs passive scanning on a network terminal access point rather than using active scanning, which can miss undocumented databases. It is available as a physical appliance or in an Open Virtualization Format and supports database management systems including Oracle server, Microsoft SQL Server and SAP Sybase ASE. The virtual machine supports VMware vSwitch, dvSwitch and a software-defined network platform configured to allow network tapping.
  • PII discovery across hybrid clouds: Informatica LLC provides an ML-driven data discovery and remediation solution that helps enterprises to automatically discover new and existing PII and other data assets across hybrid clouds, identify and mask sensitive data, and perform risk analyses to determine effective courses of remediation. It embeds metadata-driven AI to provide data managers with recommendations for automating and accelerating privacy and security workflows. And it integrates with customers’ investments in existing Informatica solutions, including Enterprise Data Catalog, Informatica Data Quality, Axon Data Governance and Secure@Source.
  • PII discovery of sensitive data in alphanumeric and pixel-level digital formats: MinerEye uses ML to continuously identify, organize, track and protect PII and other information assets. Its Data Tracker uses ML to sift through enterprise data repositories at a byte level, and even uses computer vision, a form of deep learning, to do so at a pixel level. It can run these scans on archived information at rest or live data streams in real time. It continually tracks vast amounts of PII, using ML to adapt and cover changes in form and file. It can identify and track sensitive data anywhere within the organization or out in the cloud. It can alert enterprise compliance administrators to suspicious data behavior, especially regarding assets of critical importance.
  • PII discovery in virtual enterprise data catalogs: Waterline Data Inc. uses ML to create a constantly updated virtual view of PII and other data stored in databases and other structured data stores within an organization. Its GDPR Data Management Application builds upon Waterline’s existing Smart Data Catalog, which helps business analysts find, organize and classify data without information technology department involvement. The GDPR-specific application assists data privacy officers and data stewards with issues specific to GDPR and other regulations by automatically identifying regulated subject data along with its contextual use and lineage. Integrated access control mechanisms can impose automatic processes to make data compliant, as well as generate compliance reports and workflows that align with specific GDPR articles. Using ML, the platform can be trained to look for certain types of data, such as a policy or driver’s license number, and discover it across all data sets. The system can assist with risk assessment planning by comparing data types to those covered by GDPR, shortcutting a process that can take weeks in many organizations.
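
To give a feel for what PII discovery in logs involves at its simplest, the sketch below scans log lines for two common identifier types using regular expressions. This is deliberately plain pattern matching, not machine learning, and it does not represent any of the products above: the pattern set, function name and sample log lines are assumptions chosen for illustration, and real tools would need far broader coverage plus context to limit false positives.

```python
import re

# Hypothetical, intentionally narrow pattern set for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(lines):
    """Return (line_number, kind, matched_text) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, kind, match.group()))
    return hits


sample_log = [
    "login ok user=jane@example.com from 10.0.0.7",
    "cache miss key=abc",
]
for hit in find_pii(sample_log):
    print(hit)
```

Recording the line number alongside each match is what makes later steps, such as locating and deleting a data subject’s records on request, tractable.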

For an excellent discussion of AIOps trends and solution providers, check out this recent interview on theCUBE with Muddu Sudhakar, chief executive and investor at a stealth-mode startup.

Image: geralt/Pixabay
