UPDATED 20:16 EST / APRIL 14 2024

Elon Musk-backed xAI debuts its first multimodal model, Grok-1.5V

The Elon Musk-led artificial intelligence startup xAI Corp. unveiled its first multimodal model late Friday, adding to an AI arms race that never seems to end.

It’s called Grok-1.5 Vision, or Grok-1.5V, and it goes much further than the original Grok-1 large language model because it can understand not only text but also visual content, including documents, photographs, screenshots, charts and diagrams.

According to the company, Grok-1.5V can compete with existing multimodal models across various domains, and it specializes in what xAI calls “multidisciplinary reasoning.” It also incorporates spatial perception capabilities, which xAI describes as “real-world spatial understanding.” These give it the ability to reason over complex text, interpret scientific images and interact with visual content in a humanlike way.

The company offered several examples of how Grok-1.5V might be used in the real world. It can, for instance, turn a drawing into a children’s story, identify which object in a group is the largest, help drivers check whether there’s enough space to maneuver around an obstacle, convert a table into CSV format, or tell whether a wooden deck is rotting and needs to be replaced. It can even explain the context of internet memes that a user doesn’t understand.

xAI also published benchmark results, saying Grok-1.5V matches or surpasses the performance of industry peers such as GPT-4V, Claude 3 Sonnet, Claude 3 Opus and Gemini 1.5 Pro on several tests. The company noted that Grok-1.5V outperformed its competitors on a new benchmark called RealWorldQA, which xAI created specifically to measure real-world spatial understanding.

Holger Mueller of Constellation Research Inc. said multimodal AI is becoming a major battleground for AI companies, with xAI timing the launch of Grok-1.5V to try to steal the limelight from Google LLC, which only last week expanded access to its Gemini 1.5 Pro model. “Multimodal AI models matter, because the world itself is multimodal, and the industry wants to remove humans from the equation as the glue that links all of the different kinds of functional AI models,” he said. “Multimodal models like Grok-1.5V simplify AI, and will play a key role in driving AI-powered enterprise acceleration. They’re set to become the new standard for generative AI.”

The multimodal version of Grok comes less than a month after Musk’s company unveiled the standard Grok-1.5 LLM, which delivered stronger coding and math capabilities than its predecessor, Grok-1. Grok-1.5 can also process much longer contexts than the original, up to 128,000 tokens, meaning it can draw on more information at once to improve the accuracy of its responses.

According to xAI, Grok-1.5V will soon be made available to early testers, starting with subscribers to X’s Premium+ service, which provides users of the social media site formerly known as Twitter with additional benefits.

The startup has come a long way very fast since it launched in July 2023. At the time, Musk said he was launching the company in response to the “black box” approach of AI developers such as OpenAI and Google, which are very secretive about how their AI models work. Musk said the goal is to create AI that’s more transparent and accountable than its competitors’ work.

Image: xAI
