UPDATED 11:30 EST / DECEMBER 03 2025


AWS simplifies AI agent customization with automated reinforcement learning

Amazon Web Services Inc. wants to solve the efficiency challenges of artificial intelligence agents and reduce their overall inference demands, and it’s tackling the problem with more advanced model customization tools.

The company announced today at AWS re:Invent, its annual customer conference, that it’s introducing a new Reinforcement Fine-Tuning, or RFT, feature in Amazon Bedrock, along with serverless model customization capabilities in Amazon SageMaker AI. The new capabilities are designed to make it easier for developers to customize AI models using reinforcement learning, potentially increasing their accuracy, reliability and efficiency over the base models.

It’s an important capability, AWS said, because when companies create AI agents to automate business tasks, they generally want to base them on the most advanced large language models available. But those models typically have extraordinarily high inference demands, especially when they’re asked to power AI agents that must reason through problems and employ third-party tools. The result is agents that consume massive amounts of processing power, even for simple, routine tasks such as checking calendars and searching for documents.

These tasks can in fact be handled reliably by much less powerful models, which is why the ability to customize can be so useful for agentic developers, AWS said. Previously, customizing models required extensive machine learning expertise and advanced infrastructure resources, and could take months, but Amazon says its new features dramatically simplify the task. With model customization more accessible, enterprises will be able to build more efficient, customized AI agents that get by with far less processing power.

Model customization with a few simple clicks

RFT in Amazon Bedrock is being rolled out alongside the new AgentCore capabilities announced yesterday, and makes it simple for any company to apply reinforcement learning to streamline AI models and make them more efficient.

With reinforcement learning, models acquire new knowledge through a trial-and-error process combined with human feedback: when a model displays good behavior it’s “rewarded,” while bad behavior is “corrected.” The technique rewards not only good answers but also good reasoning processes that increase efficiency, AWS said, and over time the model learns which behaviors to apply.
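
At a high level, that loop can be pictured as in the sketch below, written in TypeScript purely as an illustration; the types and function names are hypothetical and not part of any AWS API.

  // Illustrative sketch of the reinforcement learning loop described above.
  // All names here are hypothetical; this is not AWS's implementation.
  type Prompt = string;
  type ModelOutput = { answer: string; reasoningTokens: number };

  interface TunablePolicy {
    generate(prompt: Prompt): ModelOutput;                             // trial: the model produces an answer
    update(prompt: Prompt, output: ModelOutput, reward: number): void; // reinforce or discourage that behavior
  }

  function trainingStep(
    policy: TunablePolicy,
    prompt: Prompt,
    rewardFn: (output: ModelOutput) => number, // scores both the answer and the reasoning behind it
  ): number {
    const output = policy.generate(prompt); // trial
    const reward = rewardFn(output);        // feedback: good answers and efficient reasoning score higher
    policy.update(prompt, output, reward);  // correction: the model retains what was rewarded
    return reward;
  }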

Reinforcement learning has been shown to be very effective, but the challenge has always been implementation. Traditionally, it required a complex training pipeline, massive amounts of compute, and either human experts with the time to provide feedback or access to a more powerful AI model that can evaluate each of the model’s responses.

With RFT on Amazon Bedrock, reinforcement learning becomes far easier and is within reach of any developer, AWS said. Amazon Bedrock is a fully managed AI platform that provides access to high-performance foundation models from dozens of top AI companies, along with tools for turning those models into AI agents and generative AI applications.

To undertake reinforcement learning, developers simply select the model they want to customize and point it at that model’s history of interactions, or upload a training dataset. They then choose a reward function, which can be rule-based or AI-based, or pick a ready-to-use template, and Amazon Bedrock automates the rest of the fine-tuning process. That eliminates the need for extensive machine learning knowledge: all that’s required is a “clear sense of what good results look like,” the company said.
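
As a concrete illustration of what a rule-based reward function might look like, the sketch below scores a customer-support reply on whether it returns valid JSON with required fields and stays concise. The rules, field names and weights are invented for this example and don’t reflect any particular Bedrock template.

  // Hypothetical rule-based reward function. The rules and weights below are
  // invented for illustration and are not an AWS-provided template.
  function rewardSupportReply(reply: string): number {
    let score = 0;

    let parsed: unknown;
    try {
      parsed = JSON.parse(reply); // rule 1: the output must be valid JSON
      score += 0.5;
    } catch {
      return 0;                   // malformed output earns no reward
    }

    const record = parsed as { ticketId?: unknown; resolution?: unknown };
    if (typeof record.ticketId === "string" && typeof record.resolution === "string") {
      score += 0.3;               // rule 2: the required fields are present
    }

    if (reply.length <= 600) {
      score += 0.2;               // rule 3: concise replies are preferred
    }

    return score;                 // a value between 0 and 1
  }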

At launch, Amazon Bedrock RFT supports only Amazon’s Nova 2 Lite model, but the company promised to add support for dozens of additional models in the coming weeks.

Multiple reinforcement learning techniques

A similar update is coming to Amazon SageMaker AI, a more advanced machine learning platform that lets companies design, develop, deploy and customize their own models. It’s now being enhanced with serverless customization capabilities that promise to accelerate that process dramatically.

Developers will be able to access reinforcement learning in Amazon SageMaker AI through an agentic experience, in which a dedicated AI agent guides them through the process, or via a self-guided approach that allows more extensive control over customization. “With the agentic experience, developers describe what they need in natural language and then the agent walks through the entire customization process, from generating synthetic data to evaluation,” the company said.

Whichever option developers choose, they’ll be able to access multiple reinforcement learning techniques, including learning from feedback, learning with verifiable rewards, supervised fine-tuning and direct preference optimization. At launch, the new features are compatible with Amazon’s Nova family of models, as well as Llama, Qwen, DeepSeek and GPT-OSS.
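
In practice, the main difference between those techniques is the kind of training signal each one consumes. The TypeScript sketch below shows, in general terms, the record shapes typically involved; the type names are illustrative and don’t correspond to SageMaker AI’s actual schemas.

  // Illustrative record shapes for the techniques listed above. The type names
  // are invented for this sketch and are not SageMaker AI's actual schemas.
  type SupervisedExample = {   // supervised fine-tuning: a prompt paired with the desired completion
    prompt: string;
    completion: string;
  };

  type PreferencePair = {      // direct preference optimization: a preferred vs. rejected response
    prompt: string;
    chosen: string;
    rejected: string;
  };

  type VerifiableTask = {      // learning with verifiable rewards: a programmatic check scores each response
    prompt: string;
    verify: (response: string) => number;
  };

  type FeedbackExample = {     // learning from feedback: a human or AI judge rates a response
    prompt: string;
    response: string;
    rating: number;
  };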

Simpler, more reliable AI training

In a related update, Amazon SageMaker HyperPod is gaining a new checkpointless training feature designed to make model training more reliable. Amazon SageMaker HyperPod is a service that automates the infrastructure requirements for AI model training, but when hardware or software failures occur, it’s often slow to recover, taking up to an hour in some cases.

With checkpointless training, Amazon SageMaker HyperPod can now recover automatically from infrastructure faults in a matter of minutes, with zero intervention required from the customer, Amazon said. It works by continuously preserving the model’s state across the entire compute cluster as training runs, so when a fault occurs, the job can quickly resume where it left off without starting over.
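
Conceptually, the difference from traditional checkpoint-based recovery can be sketched as below. This is a simplified illustration of the general idea rather than a description of HyperPod’s internals, and all names are hypothetical.

  // Simplified illustration of the general idea; not HyperPod's internals.
  interface TrainingState {
    step: number; // how far training has progressed
  }

  // Checkpoint-based recovery: reload the last snapshot from storage and
  // repeat every step completed since it was written.
  function recoverFromCheckpoint(lastCheckpoint: TrainingState, failedAtStep: number): TrainingState {
    const replayedSteps = failedAtStep - lastCheckpoint.step; // work that must be redone
    console.log(`Replaying ${replayedSteps} steps from checkpoint at step ${lastCheckpoint.step}`);
    return { ...lastCheckpoint };
  }

  // Checkpointless recovery: state is continuously preserved across the
  // cluster, so a healthy copy already reflects the latest step.
  function recoverCheckpointless(replicatedState: TrainingState): TrainingState {
    console.log(`Resuming at step ${replicatedState.step} with nothing to replay`);
    return { ...replicatedState };
  }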

In addition, Amazon announced that it’s bringing the open-source AI agent framework Strands Agents to the TypeScript programming language. TypeScript is a statically typed superset of JavaScript that catches many classes of errors before code runs. With this update, developers can use the Strands Agents framework to build their entire agentic stack in TypeScript, Amazon said.
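
For a rough sense of what that looks like, here is a minimal agent sketched in TypeScript. The package name, constructor options and invocation style are assumptions modeled on Strands Agents’ Python SDK, so the real TypeScript interfaces may differ; treat this as a sketch, not documentation.

  // Rough sketch only: the import path, options and method names below are
  // assumptions modeled on the Strands Agents Python SDK, not confirmed APIs.
  import { Agent } from "@strands-agents/sdk"; // assumed package name

  async function main(): Promise<void> {
    const agent = new Agent({
      systemPrompt: "You are a calendar assistant. Keep answers short.", // assumed option name
    });

    // Assumed invocation style: pass a prompt, await the agent's reply.
    const reply = await agent.invoke("What meetings do I have tomorrow?");
    console.log(reply);
  }

  main().catch(console.error);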

Image: SiliconANGLE/Dreamina
