UPDATED 09:00 EST / NOVEMBER 06 2025


Exclusive: Lemony says its dynamic prompt routing tool cuts AI costs by up to 85%

Lemony.ai, the operating name of Uptime Industries Inc., today is releasing an open-source tool that it says can cut artificial intelligence application development costs by dynamically routing prompts to the most cost-effective language model available.

Cascadeflow aims to help developers reduce application programming interface spending without compromising quality or performance. Most developers hardcode large language models for every query, according to Sascha Buehrle, Lemony’s co-founder and chief executive. “Cascadeflow lets developers run smarter, not bigger, by dynamically choosing the right model for every task,” he said.

The software routes each prompt through a cascading pipeline. It starts by using a small, inexpensive model and then evaluates the result against configurable quality metrics such as completeness and correctness. If the output falls short, the software escalates the prompt to a larger model. This approach, known as speculative execution, aims to mitigate the cost of using flagship models for each prompt.
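To make the cascade concrete, here is a minimal sketch in Python. It is not cascadeflow's actual API: the names `Tier`, `cascade` and `passes_quality` are hypothetical, the two "models" are plain stub functions, and the quality check is a toy word-count test standing in for the configurable completeness and correctness metrics described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One step in the cascade (hypothetical structure, for illustration only)."""
    name: str
    generate: Callable[[str], str]  # stand-in for a real model call


def passes_quality(answer: str, min_words: int = 3) -> bool:
    """Toy completeness check: the answer must contain at least min_words words."""
    return len(answer.split()) >= min_words


def cascade(prompt: str, tiers: List[Tier],
            check: Callable[[str], bool] = passes_quality) -> Tuple[str, str]:
    """Try tiers from cheapest to most expensive.

    Returns (tier_name, answer) for the first answer that passes the check.
    If no answer passes, the last tier's answer is returned as a fallback.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    answer = ""
    for tier in tiers:
        answer = tier.generate(prompt)
        if check(answer):
            return tier.name, answer
    return tiers[-1].name, answer


# Stub models (assumptions): the small one gives up on long prompts.
def small_model(prompt: str) -> str:
    return "The answer is 4." if len(prompt) <= 40 else ""


def large_model(prompt: str) -> str:
    return "A detailed answer from the large model."


TIERS = [Tier("small", small_model), Tier("large", large_model)]
```

With these stubs, a short prompt such as "What is 2 plus 2?" is answered by the small tier, while a longer prompt fails the check and is escalated to the large tier. A real deployment would replace the stubs with provider calls and the word count with a meaningful quality metric.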

The software tracks token usage and costs across models and providers, offering configurable budget controls and per-query spending caps. Developers define their own pricing in a local cost file to account for differences in provider contracts.
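The cost-tracking idea can be sketched along the same lines. The example below is an assumption-laden illustration, not cascadeflow's implementation: the `PRICING` table stands in for a local cost file, the model names and per-token prices are invented, and `CostTracker` and `BudgetExceeded` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical prices in USD per 1 million tokens. In practice these would be
# loaded from a local cost file reflecting each provider contract.
PRICING: Dict[str, Dict[str, float]] = {
    "small-model": {"input": 0.10, "output": 0.40},
    "large-model": {"input": 5.00, "output": 15.00},
}


class BudgetExceeded(Exception):
    """Raised when a query would break the per-query cap or the total budget."""


@dataclass
class CostTracker:
    pricing: Dict[str, Dict[str, float]]
    total_budget_usd: float
    per_query_cap_usd: float
    spent_usd: float = 0.0
    by_model: Dict[str, float] = field(default_factory=dict)

    def cost_of(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price a single call from its token counts."""
        price = self.pricing[model]
        return (input_tokens * price["input"]
                + output_tokens * price["output"]) / 1_000_000

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Check both limits, then add the call's cost to the running totals."""
        cost = self.cost_of(model, input_tokens, output_tokens)
        if cost > self.per_query_cap_usd:
            raise BudgetExceeded(f"{model}: query cost {cost:.6f} exceeds cap")
        if self.spent_usd + cost > self.total_budget_usd:
            raise BudgetExceeded(f"{model}: total budget would be exceeded")
        self.spent_usd += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        return cost
```

A tracker like this would typically be consulted before escalating a prompt, so that a query that is about to exceed its cap can be stopped or kept on the cheaper model rather than billed after the fact.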

Buehrle said initial benchmarks indicate that up to 85% of prompts can be processed using smaller or domain-specific models. “You don’t need a flagship model to answer ‘what’s 2 plus 2,’” he said. Lemony’s principal business is providing on-premises edge devices running scaled-down language models (pictured).
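How the share of prompts handled by a small model translates into savings depends on the price gap between models. The back-of-the-envelope sketch below is illustrative only: it assumes every prompt is first sent to the small model, escalated prompts pay for both models, and the cost of the quality check itself is ignored. The function name and the example prices are hypothetical, not figures from Lemony.

```python
def expected_savings(accept_rate: float, small_cost: float, large_cost: float) -> float:
    """Fraction of spend saved versus sending every prompt to the large model.

    accept_rate: share of prompts whose small-model answer is accepted (0..1)
    small_cost, large_cost: average cost per prompt on each model
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if small_cost < 0 or large_cost <= 0:
        raise ValueError("costs must be non-negative, and large_cost positive")
    # Every prompt pays for the small model; only rejected ones also pay for the large one.
    cascade_cost = small_cost + (1.0 - accept_rate) * large_cost
    return 1.0 - cascade_cost / large_cost


# Example under assumed prices: the small model costs 5% of the large one.
savings = expected_savings(accept_rate=0.85, small_cost=0.05, large_cost=1.0)
```

Under these assumptions the savings work out to the accept rate minus the small-to-large price ratio, or roughly 80% in this example. The cheaper the small model is relative to the flagship, the closer the savings get to the share of prompts it handles.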

Broad model support

Cascadeflow initially supports commercial models and processors from OpenAI LLC, Anthropic PBC, Hugging Face Inc., Groq Inc., Together Computer Inc. and the open-source vLLM and Ollama. It also integrates with the Python-based LiteLLM, enabling access to approximately 100 additional language models. The software can be used in cloud environments, on local machines or edge devices.

“You can run it wherever your AI application runs,” Buehrle said. “It adds only two milliseconds of latency to your AI stack.”

The software can be deployed with agent frameworks, is compatible with the Model Context Protocol and supports batch processing, streaming and caching optimizations for various providers. It integrates with n8n, a low-code automation platform widely used to build agent workflows.

Buehrle said the company chose to release cascadeflow as open source to build community engagement and bring transparency to the cost control process. “It’s important to push the core of Lemony out as open source,” he said. “It’s important to build a community and to get [feedback] from the companies using it.”

Cascadeflow is available beginning today on GitHub.

Photo: Lemony.ai
