AWS rolls out new Titan AI models for image and text generation
Amazon Web Services Inc. today announced two new artificial intelligence foundation models for Amazon’s Titan family: an image generator and a model capable of taking text or images and turning them into AI-readable vector embeddings.
The new models were announced today during the company’s annual re:Invent 2023 conference. Amazon also announced that text-generating models Titan Text Lite and Titan Text Express are now generally available on Amazon Bedrock, the company’s fully managed service for generative AI that provides access to multiple foundational models.
Amazon Titan models incorporate extremely large datasets and are designed to provide high-performing generative AI capabilities to customers, including text generation, image generation and vector search capabilities through fully managed application programming interfaces for the creation of AI applications. AWS pretrained these models to make them available for customers for numerous use cases that they can custom fine-tune for their purposes.
The newly announced Titan Image Generator, now in preview, allows customers to create and refine images using English language prompts for advertising, e-commerce, media and entertainment at scale. The foundation model is capable of generating realistic studio-quality images at scale and low cost. Users can also enhance images with prompts by telling the generator how they want the image modified.
For example, the model can be told to create a complex scene for an ad that has a backpack, including a room with other objects. Once the scene is generated the user can iterate on what is in the scene by continuing to converse with the model to change the color of the backpack, its contents, its material and more.
Users can edit parts of the image to remove or change portions of the image in areas that they define or replace entire backgrounds leaving the foreground subject untouched. The model itself can be fine-tuned with data from the company — such as logos, patented materials, trademarked colors and more.
“To build on our commitments we made at the White House earlier this year, all Titan generated images come with an invisible watermark by default,” said Swami Sivasubramanian, vice president of data and AI at AWS. “These watermarks are designed to help reduce the spread of misinformation by providing a discreet mechanism to identify AI-generated images.”
Sivasubramanian added that the watermarks are designed to be tamper-resistant. With this addition, Titan Image Generator joins Google LLC’s Imagen image-generating model and OpenAI DALL-E 3-powered Bing Image Creator in stamping “synthetic media” watermarks onto AI-generated images.
Using Titan Multimodal Embeddings, now generally available, companies can build more contextually relevant search and recommendation engines for users. Multimodal refers to the capability of processing more than one “modality,” or type of data, such as text and images. With the Titan Multimodal Embeddings foundation model, it is possible to submit text, images or a combination of both as inputs. Traditionally, developers would have to use more than one model to do the same work: one model to handle the text input and another to handle the image. That adds complexity and search latency, making the experience worse for end users.
The model accepts English text inputs up to 128 tokens long and images up to 25 megabytes in size, and converts them into vector embeddings, or numerical representations, which make it possible for AI to capture the semantic meaning of the input for search purposes. Users can also fine-tune the model with image-caption pairs to better fit their expectations.
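To illustrate how embeddings enable this kind of search, here is a minimal sketch in Python. It does not call Titan or Amazon Bedrock: the three-dimensional vectors, the catalog entries and the function names are made up for illustration, and a real embedding model would return vectors with hundreds or thousands of dimensions. The idea is simply that items whose vectors point in a similar direction are treated as semantically close.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, catalog, top_k=2):
    """Rank catalog items by similarity to the query vector, best first."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in catalog.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy "embeddings" (hypothetical values, not produced by any real model).
CATALOG = {
    "red summer dress": [0.9, 0.1, 0.0],
    "blue denim jacket": [0.1, 0.9, 0.1],
    "red silk scarf": [0.8, 0.2, 0.1],
}

# Pretend this vector came from embedding a shopper's photo or text query.
query = [1.0, 0.0, 0.0]
print(search(query, CATALOG))
```

In a production system the vectors would typically be stored in a vector database rather than a dictionary, but the ranking step follows the same principle.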
“Now you can quickly generate, store and retrieve embeddings to build more accurate and contextually relevant multimodal search,” said Sivasubramanian.
For example, a company could use this model to provide better product search on its website, such as when a customer uploads an image of a dress and asks an AI assistant what top would go with it. Similarly, a retailer could use the model to power searches for different types of shoes, hats and jackets by style, or any other item with distinct visual attributes. The same model could be used to combine text and images to describe company-specific manufacturing parts, making those parts easier to identify in searches.
New Amazon Titan Text models now in general availability
Titan Text Lite and Titan Text Express are large language models designed to handle a wide range of text generation tasks, including summarization, translation and conversational chatbot systems for question answering and acting as agents. These same models can also act as coding assistants and generate output in structured formats such as JSON and CSV.
Titan Text Lite is a lightweight model designed for English language tasks with a maximum context length of 4,096 tokens and is optimized for lower cost. At that context length, the model can handle roughly 3,000 words of English text, or several pages of a book. The model is designed to be highly customizable and aimed at tasks such as article summarization and copywriting, making it useful for generating advertisement copy or preparing marketing emails.
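As a rough check on what a 4,096-token context means in practice, the arithmetic can be sketched as follows. The ratio of 0.75 English words per token is a commonly used rule of thumb and an assumption here, not a figure published for Titan. Actual counts depend on the tokenizer and the text.

```python
# Assumed rule of thumb for English text; not a Titan-specific figure.
WORDS_PER_TOKEN = 0.75

def estimate_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many English words fit in a given number of tokens."""
    return int(tokens * words_per_token)

# Titan Text Lite's maximum context length, per the article.
print(estimate_words(4096))
```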
Titan Text Express has double the context window of Lite at 8,192 tokens and has a broad range of capabilities, such as open-ended text generation, brainstorming, code generation, rewriting and conversational chat. It also provides support for retrieval augmented generation workflows, which allow a company to draw in data from other sources, such as secure proprietary data, and supply it to the model at query time so that responses reflect current information and are more accurate.
To facilitate that, Amazon also announced the general availability of Knowledge Bases for Bedrock, which allows companies to connect internal data sources to foundation models to deliver relevant, up-to-date, context-specific information to models such as Titan. Knowledge Bases extends the capabilities of AI models, making them capable of responding to prompts with up-to-date, proprietary information about the business.
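The retrieval augmented generation pattern described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Knowledge Bases for Bedrock is implemented: the documents, the function names and the word-overlap scoring are all hypothetical, and a real system would typically retrieve passages by embedding similarity and then send the assembled prompt to a model.

```python
def retrieve(question, documents, top_k=1):
    """Pick the documents sharing the most words with the question (crude scoring)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Place the retrieved passages ahead of the question in the prompt text."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical internal documents standing in for a company's proprietary data.
DOCS = [
    "Returns are accepted within 30 days of purchase.",
    "Our warehouse ships orders on weekdays only.",
]

prompt = build_prompt("How many days do I have for returns?", DOCS)
print(prompt)
```

The model itself is unchanged in this pattern. Only the prompt is enriched with retrieved information, which is why the approach can reflect data that was never part of training.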
Image: AWS; photo: Robert Hof/SiliconANGLE