Cloudflare announces full-stack platform for deploying AI at scale
Connectivity cloud company Cloudflare Inc. today expanded its artificial intelligence platform offerings for developers, adding infrastructure for deploying AI inference at large scale, along with vector databases and observability.
Cloudflare has built a large global network of locations to house storage and compute, and it's putting that network to use with today's launch for AI deployment, Cloudflare Chief Executive Matthew Prince told SiliconANGLE in an interview.
“We’ve literally got people with suitcases full of GPUs running around the world right now, plugging those cards into the boxes that make up our existing network,” he said.
The first offering that Cloudflare is providing to developers is called Workers AI. It will enable them to deploy AI models at the edge at scale in nine cities across three continents at launch. Prince added that 100 cities worldwide in North America, Asia and Europe will be live by the end of the year.
Workers AI provides what Prince said is an affordable, serverless way for developers to run AI inference locally. Inference is when an AI model is asked to answer a question, summarize a document, produce an image or perform other tasks.
Although many AI models can run inference on a mobile device or in a large centralized data center, some are too large to run on devices, and running them in a data center across the country introduces latency, which slows down response time. Putting them into Cloudflare's global network places them closer to end users, shortening the time it takes for users to get their answers.
“I think that the thing that we have, which is really unique, is that we’re close to everyone on Earth,” Prince said. “We run a network that’s spread all around the world, and we’re also really good at spreading that load across that network. Those two things are uniquely positioned to be able to serve the workloads in the future.”
Equally important, by keeping the workload local, information sent to and from the AI never leaves the locality. According to Prince, this means businesses using Workers AI can more easily comply with local laws and regulations on handling private information, since they don’t need to worry about data leaving a jurisdiction.
Prince likened it to “Goldilocks and the Three Bears,” saying there’s finally a third place to run AI workloads. On-device is nice, but some models are too big. Public cloud is powerful, but too far away and introduces compliance issues. “We think that Cloudflare ends up being the porridge that’s just right, between being too small and being too centralized,” he said.
Developers also won’t need to worry about the underlying infrastructure or launch their own virtual machines; the entire system is serverless. They can load models that work with the system from a model catalog and get started quickly, including large language models and models for speech-to-text, image classification and sentiment analysis, among other tasks.
To provide AI models, Cloudflare partnered with Hugging Face Inc., a company that develops tools for building open-source AI applications. It will offer open-source generative AI models, optimized for Cloudflare’s AI inference platform, that developers can simply deploy. Cloudflare also worked with Meta Platforms Inc. to optimize its open-source Llama 2 large language model to run on the Workers AI platform so developers can deploy it.
“As enterprises look to maximize their operational velocity, more and more of them are turning to artificial intelligence,” said Stephen O’Grady, principal analyst with RedMonk. “But it’s critical to deliver a quality developer experience around AI, with abstractions to simplify the interfaces and controls to monitor costs. This is precisely what Cloudflare has optimized its Workers platform for.”
Vector databases for full-stack AI applications
Cloudflare also introduced Vectorize, a new vector database that enables developers to build full-stack AI applications entirely on Cloudflare. It lets them take embeddings from models, the searchable numeric representations of data, and then query and cache them. With Vectorize and Workers AI, developers no longer need separate tools to build their AI apps; they can do it all on the same platform, with processing and storage happening closer to users.
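The core idea behind a vector database can be sketched in a few lines: store each document's embedding, then answer a query by finding the stored vectors nearest to the query's embedding. The following toy Python sketch illustrates that nearest-neighbor lookup with cosine similarity; the document IDs and vectors are made up, and this is not Cloudflare's Vectorize API.

```python
import math

# Toy in-memory "vector index": each entry pairs a document ID with its
# embedding (a short list of floats that a model would normally produce).
index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.0, 0.8, 0.6],
    "doc-c": [0.7, 0.3, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def query(embedding, top_k=2):
    """Return the top_k document IDs most similar to the query embedding."""
    ranked = sorted(index, key=lambda doc: cosine(index[doc], embedding),
                    reverse=True)
    return ranked[:top_k]

print(query([1.0, 0.2, 0.0]))  # ['doc-a', 'doc-c']
```

A real vector database does the same ranking over millions of high-dimensional embeddings, using approximate-nearest-neighbor indexes instead of a full sort.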
Prince said that one thing many customers have told the company is that understanding what their AI is doing has been difficult, leaving them unsure how to optimize it. So Cloudflare created AI Gateway, which makes AI applications observable and scalable on its network.
Using AI Gateway, developers will be able to see queries and understand where traffic is going: the number of requests, the number of users, costs and duration. That will help them decide whether a request should be routed to a less expensive model such as GPT-3 instead of GPT-4, or whether there’s a security problem, such as a malicious user choking the network with requests and requiring rate limiting.
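The routing decision described above can be as simple as a heuristic over the request. A hypothetical Python sketch of the idea follows; the thresholds, budget parameter and model names are invented for illustration and are not part of AI Gateway.

```python
def choose_model(prompt, budget_cents):
    """Toy routing rule: send short or low-budget requests to a cheaper
    model, and reserve the pricier one for long, high-value requests.
    All thresholds here are made up for illustration."""
    if budget_cents < 1 or len(prompt) < 200:
        return "gpt-3"   # cheaper, often good enough for short queries
    return "gpt-4"       # more expensive, for long or high-value requests

print(choose_model("Summarize this sentence.", budget_cents=5))  # gpt-3
```

An observability layer makes this practical: once per-request cost and duration are visible, a team can tune such rules against real traffic instead of guessing.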
The same system also uses Cloudflare’s existing infrastructure to cache requests and answers, so if users ask an AI model the same question, it can reply with the cached answer instead of querying the model again. Avoiding repeated runs of the same question greatly reduces the cost of running the AI.
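That caching pattern is straightforward to sketch: key the cache on the model and the exact prompt, and only invoke the model on a miss. The minimal Python illustration below is a stand-in for the concept, not Cloudflare's AI Gateway; the model function and names are invented.

```python
import hashlib

cache = {}
calls = 0  # counts how often the (expensive) model is actually invoked

def fake_model(prompt):
    """Stand-in for an expensive AI inference call."""
    global calls
    calls += 1
    return f"answer to: {prompt}"

def cached_inference(model_name, prompt):
    # Key on model + prompt so only identical questions hit the cache.
    key = hashlib.sha256(f"{model_name}:{prompt}".encode()).hexdigest()
    if key not in cache:
        cache[key] = fake_model(prompt)
    return cache[key]

print(cached_inference("llama-2", "What is Cloudflare?"))
print(cached_inference("llama-2", "What is Cloudflare?"))  # served from cache
print(calls)  # the model ran only once
```

The trade-off is freshness: a cached answer is only safe to reuse when an identical prompt should produce an identical response, which is why real gateways pair caching with expiry controls.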