UPDATED 09:00 EDT / SEPTEMBER 27 2023

Cloudflare announces full-stack platform for deploying AI at scale

Connectivity cloud company Cloudflare Inc. today expanded its artificial intelligence platform offerings for developers, adding infrastructure for deploying AI inference at large scale along with a vector database and observability tooling.

As a global company, Cloudflare has built a large network of locations around the globe to house storage and compute, and it’s putting that network to use with today’s launch for AI deployment, Cloudflare Chief Executive Matthew Prince told SiliconANGLE in an interview.

“We’ve literally got people with suitcases full of GPUs running around the world right now, plugging those cards into the boxes that make up our existing network,” he said.

The first offering that Cloudflare is providing to developers is called Workers AI. It will enable them to deploy AI models at the edge at scale in nine cities across three continents at launch. Prince added that 100 cities worldwide in North America, Asia and Europe will be live by the end of the year.

Workers AI provides what Prince said is an affordable, serverless way for developers to provide AI inference locally. Inference is the stage at which a trained AI model is asked to answer a question, summarize a document, produce an image or perform other tasks.

Although many AI models can run inference on a mobile device or in a large centralized data center, some are too large to run on devices, and running them in a data center across the country introduces latency, which slows response times. Putting them into Cloudflare’s global network places them closer to end users, which shortens the time it takes users to get their answers.
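The latency argument can be made concrete with some rough arithmetic. The sketch below assumes signals travel through fiber at about 200,000 kilometers per second, roughly two-thirds the speed of light, and it ignores routing hops, queuing and the time the model itself takes to respond. The distances are illustrative and are not Cloudflare figures.

```python
# Rough propagation-delay estimate; ignores routing hops, queuing and inference time.
FIBER_SPEED_KM_PER_S = 200_000  # assumption: about 2/3 the speed of light in vacuum


def round_trip_ms(distance_km: float) -> float:
    """Return the minimum round-trip time in milliseconds for a one-way distance."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Illustrative distances, not measurements of any real deployment.
far_data_center_ms = round_trip_ms(4000)  # a data center across the country
nearby_location_ms = round_trip_ms(100)   # a network location in the same region

print(f"far: {far_data_center_ms:.1f} ms, nearby: {nearby_location_ms:.1f} ms")
```

Propagation delay is only a lower bound, but it is the part of the response time that no amount of extra compute in a distant data center can remove.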

“I think that the thing that we have, which is really unique, is that we’re close to everyone on Earth,” Prince said. “We run a network that spread all around the world, and we’re also really good at spreading that load across that network. Those two things are uniquely positioned to be able to serve the workloads in the future.”

Equally important, by keeping the workload local, information sent to and from the AI never leaves the locality. According to Prince, this means that businesses using Workers AI can stay more compliant with local laws and regulations regarding the handling of private information as they don’t need to worry about data leaving a jurisdiction.

Prince likened it to the “Three Little Bears,” and said that finally there’s a third place to run AI workloads. On-device is nice, but some models are too big. Public cloud is powerful, but too far away and introduces compliance issues. “We think that Cloudflare ends up being the porridge that’s just right, between being too small and being too centralized,” he said.

Developers also won’t need to worry about the underlying infrastructure or launching their own virtual machines because the entire system is serverless. They can load models that work with the system from a model catalog and get started quickly. The catalog covers large language models, speech-to-text, image classification and sentiment analysis, among other tasks.

To provide AI models, Cloudflare partnered with Hugging Face Inc., a company that develops tools for building open-source AI applications. Hugging Face will offer open-source generative AI models optimized for Cloudflare’s AI inference platform that developers can simply deploy. Cloudflare also worked with Meta Platforms Inc. to optimize its open-source Llama 2 large language model to run on the Workers AI platform so developers can deploy it.

“As enterprises look to maximize their operational velocity, more and more of them are turning to artificial intelligence,” said Stephen O’Grady, principal analyst with RedMonk. “But it’s critical to deliver a quality developer experience around AI, with abstractions to simplify the interfaces and controls to monitor costs. This is precisely what Cloudflare has optimized its Workers platform for.”

Vector databases for full-stack AI applications

Cloudflare also introduced Vectorize, a new vector database that enables developers to build full-stack AI applications entirely on Cloudflare. It lets them store the “embeddings” from models, the searchable representation of AI training data, and then query and cache them. With Vectorize and Workers AI, developers no longer need to use separate tools to build their AI apps; they can do it all on the same platform, and all the processing and storage happens closer to users.
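For readers unfamiliar with how a vector database answers a query, here is a minimal in-memory sketch of the general idea: each item is stored as a list of numbers, and a query returns the stored items whose vectors point in the most similar direction. This is not the Vectorize API. The function names, document IDs and three-dimensional vectors are all hypothetical, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: IDs and vectors are made up for illustration.
index = {
    "pricing-faq": [0.9, 0.1, 0.0],
    "refund-policy": [0.7, 0.3, 0.1],
    "office-hours": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.0]  # would normally come from an embedding model

results = top_k(query_embedding, index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A production vector database replaces the linear scan with an approximate nearest-neighbor index so that queries stay fast across millions of vectors.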

Prince said many customers have found it difficult to understand what their AI applications are doing and don’t know how to optimize them. So Cloudflare created AI Gateway, which makes AI applications observable and scalable on the network.

Using AI Gateway, developers will be able to see queries and understand where traffic is going through metrics such as the number of requests, the number of users, costs and duration. That will help developers decide whether a request should be routed to a less expensive model, such as GPT-3 instead of GPT-4, or whether there’s a security problem, such as a malicious user sending numerous requests that are choking the network, that requires rate limiting.
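Rate limiting of the kind described here is often implemented with a sliding window: each user gets a fixed number of requests in any given span of time. The sketch below shows that general technique only. It is not how AI Gateway is built or configured, and the class name, the limit of three requests and the 10-second window are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per user within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> timestamps of allowed requests

    def allow(self, user_id: str, now: float) -> bool:
        timestamps = self._history[user_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over the limit: reject without recording the request
        timestamps.append(now)
        return True


# Hypothetical limits; the clock is passed in so the example is deterministic.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("user-a", now=t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
print(decisions)
```

The fourth request is rejected because three were already allowed in the preceding 10 seconds, and the fifth is accepted once those earlier requests have aged out of the window.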

The same system also uses Cloudflare’s existing infrastructure to cache answers and requests, which means that if users ask the same question of the AI model, it’s possible to avoid querying the AI repeatedly and reply with the cached answer. Not having it run the same question repeatedly greatly reduces the cost of running the AI.
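The cost saving from caching can be illustrated with a simple exact-match cache placed in front of a model. This is a sketch of the general idea rather than AI Gateway’s implementation: the class, the stand-in model function and the choice to normalize case and whitespace before hashing are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Serve repeated prompts from memory instead of calling the model again."""

    def __init__(self, model_call: Callable[[str], str]):
        self._model_call = model_call
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumption: prompts differing only in case or spacing count as the same question.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._model_call(prompt)  # the only place a (paid) model call happens
        self._store[key] = answer
        return answer


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call."""
    return f"answer to: {prompt}"


cache = ResponseCache(fake_model)
first = cache.ask("What is inference?")
second = cache.ask("what is   inference?")  # same question, different formatting
print(cache.hits, cache.misses)
```

Only the first question reaches the model; the second is answered from the cache, so the number of billable model calls tracks the number of distinct questions rather than the number of requests.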

Image: geralt/Pixabay
