UPDATED 09:00 EST / FEBRUARY 27 2024

Writer announces Palmyra-Vision, a multimodal LLM capable of understanding images

Generative artificial intelligence startup Writer Inc. today announced the introduction of Palmyra-Vision, a large language model that understands both text and images and can analyze images and generate text based on them.

Models that can understand more than one type of media are known as "multimodal" models, and Palmyra-Vision is Writer's first multimodal AI model. According to the company, it can extract text from images, including handwritten text; classify objects and colors; and describe charts, graphs, infographics and flowcharts.

That means business users and everyday customers alike can put it to immediate use by giving it images and asking questions about them, or by directing it to analyze the information they contain and produce content based on them.

"Palmyra-Vision was built with customers in mind to solve actual use cases across industries from retail to insurance to pharmaceuticals and more," May Habib, Writer's chief executive and co-founder, told SiliconANGLE. "Over the last several months, we listened to requests from our current customers and created a multimodal solution to offer generative AI capabilities while keeping a human in the loop and engaged throughout the process."

Writer already produces a family of text-generating models under the Palmyra name. The company says the largest of its models can outperform competing models from OpenAI, Cohere Inc., Anthropic PBC, Google LLC and other companies on Stanford HELM's benchmarking tests.

Habib said that Writer tested the new model against VQAv2, a dataset of open-ended questions about more than 265,000 images that requires an understanding of vision, language and common sense to answer. Palmyra-Vision scored 84.4%, outperforming OpenAI's GPT-4V multimodal model, at 77.2%, and Google's Gemini Ultra 1.0, at 77.8%.

The new model has a wide range of use cases across industries and functions, including retail, productivity, compliance, marketing, design and healthcare. For example, workers could use its vision capability to quickly draft high-quality descriptions for hundreds or thousands of products on an e-commerce site based on product images. This can help shorten time to market and improve search engine optimization performance.

Many companies need to digitize records on the fly. Optical character recognition technology can struggle with handwritten text. For example, insurance companies and healthcare workers spend a great deal of time processing written reports and claims.

With Palmyra-Vision, these documents can be quickly ingested into the system, their text extracted and even "chatted" with, even when the handwriting quality is poor, the company said. That could be quite a benefit, given that doctors aren't known for the best penmanship.

For users who work regularly with charts and graphs, the new model offers a kind of second sight: it can interpret those images and summarize them, surfacing insights and key takeaways for users who might not otherwise have the know-how to do so right away. For example, financial advisers could use the model to quickly generate summaries of client portfolios and performance from existing documentation, even when the underlying data isn't available.

The new Vision model follows Writer's most recent upgrade, in January, which added support for 30 languages, including Spanish, French, Chinese, Hindi, Arabic and Russian, alongside English.

“If a customer wants an accurate translation, they can translate the outputs from Palmyra-Vision in partnership with Palmyra-X, the model that has the multilingual capabilities,” Habib said. “It would be two steps — extract text with Vision and then translate (for high accuracy) with Palmyra-X.”

Image: Pixabay

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
11.4k+ theCUBE alumni — Connect with more than 11,400 tech and business leaders shaping the future through a unique trusted-based network.

About SiliconANGLE Media

SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.