Writer announces Palmyra-Vision, a multimodal LLM capable of understanding images
Generative artificial intelligence startup Writer Inc. today announced the introduction of Palmyra-Vision, an AI large language model capable of text and visual understanding that can analyze images and produce text based on them.
Models that can understand more than one type of media are known as “multimodal” models, and Palmyra-Vision is Writer’s first multimodal AI model. According to the company, it can extract text from images, including handwritten text; classify objects and colors; and describe charts, graphs, infographics and flowcharts.
That means business users and everyday customers alike can put it to immediate use by handing the AI model images and asking it questions about them, or by directing it to analyze information or produce content.
“Palmyra-Vision was built with customers in mind to solve actual use cases across industries from retail to insurance to pharmaceuticals and more,” May Habib, Writer’s chief executive and co-founder, told SiliconANGLE. “Over the last several months, we listened to requests from our current customers and created a multimodal solution to offer generative AI capabilities while keeping a human in the loop and engaged throughout the process.”
Writer already produces a family of text-generating models, under the name Palmyra. The company says the largest of its models are capable of outperforming competing models from OpenAI, Cohere Inc., Anthropic PBC, Google LLC and other companies on Stanford HELM’s benchmarking tests.
Habib said that Writer tested the new model against VQAv2, a dataset of open-ended questions on over 265,000 images that focuses on understanding language, common sense and contextual language. The Palmyra-Vision model scored 84.4%, outperforming OpenAI’s GPT-4V multimodal model at 77.2% and Google’s Gemini Ultra 1.0 at 77.8%.
The new model has a wide range of use cases across retail, productivity, compliance, marketing, design and healthcare. For example, workers could use its vision capability to generate quick drafts of high-quality descriptions, based on images, for hundreds or thousands of products on an e-commerce site. This can shorten time to market and improve search engine optimization performance.
Many companies need to digitize records on the fly. For example, insurance companies and healthcare workers spend a great deal of time processing written reports and claims, and optical character recognition technology can struggle with handwritten text.
With the Palmyra-Vision model, these documents can be quickly ingested into the system, where the text is extracted and can be “chatted” with even when the handwriting quality is low, the company said. This could be quite a benefit, especially given that doctors aren’t known for the best penmanship.
For people who work regularly with charts and graphs, the new model provides a second-sight capability that interprets those images, summarizes them, and provides insights and key takeaways for users who otherwise wouldn’t have the immediate know-how. For example, financial advisers could use the model to quickly generate summaries of client portfolios and performance based on existing documentation, even if the underlying data isn’t available.
The new Vision model follows Writer’s most recent upgrade in January to support multilingual capabilities, which added 30 languages, including Spanish, French, Chinese, Hindi, Arabic and Russian alongside English.
“If a customer wants an accurate translation, they can translate the outputs from Palmyra-Vision in partnership with Palmyra-X, the model that has the multilingual capabilities,” Habib said. “It would be two steps — extract text with Vision and then translate (for high accuracy) with Palmyra-X.”
Image: Pixabay