Elon Musk’s xAI releases Grok-1 architecture, while Apple advances multimodal AI research
The Elon Musk-run artificial intelligence startup xAI Corp. today released the weights and architecture of its Grok-1 large language model as open-source code, shortly after Apple Inc. published a paper describing its own work on multimodal LLMs.
Musk first said on March 11 that xAI would release Grok as open source, but today's release of the base model and weights, fundamental components of how the model works, makes this the company's first open-source release.
The release covers Grok's network architecture, the structural design that defines how the model's layers and nodes are arranged and interconnected to process data, along with its base model weights. Those weights are the parameters within the model's architecture that were adjusted during training; they encode the learned information and determine how input data is transformed into output.
Grok-1 is a 314-billion-parameter "Mixture-of-Experts" model trained from scratch by xAI. A Mixture-of-Experts model is a machine learning approach that routes each input to a set of specialized sub-models, known as experts, and combines their outputs into a final prediction, leveraging the strengths of each expert to handle diverse tasks or data subsets.
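To make the idea concrete, here is a minimal sketch of how an MoE layer routes tokens among experts and blends their outputs. It is a simplified illustration under assumed values: the expert count, top-k routing and layer sizes shown are hypothetical and are not drawn from xAI's actual Grok-1 implementation.

```python
# Minimal Mixture-of-Experts sketch (illustrative only; not xAI's Grok-1 code).
# Assumptions: 8 experts, top-2 routing, tiny dimensions for readability.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    def __init__(self, d_model, num_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        # Router: scores each token against every expert.
        self.router = rng.standard_normal((d_model, num_experts)) * 0.02
        # Each "expert" is reduced here to a single feed-forward weight matrix.
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(num_experts)]
        self.top_k = top_k

    def __call__(self, x):                      # x: (tokens, d_model)
        scores = softmax(x @ self.router)       # (tokens, num_experts)
        out = np.zeros_like(x)
        for t, token in enumerate(x):
            # Send each token only to its top-k experts, weighted by the router.
            top = np.argsort(scores[t])[-self.top_k:]
            weights = scores[t, top] / scores[t, top].sum()
            for w, e in zip(weights, top):
                out[t] += w * (token @ self.experts[e])
        return out

layer = MoELayer(d_model=16)
tokens = np.random.default_rng(1).standard_normal((4, 16))
print(layer(tokens).shape)   # (4, 16)
```

Because each token activates only a few experts, a model can hold far more parameters than it uses on any single forward pass, which is part of how a 314-billion-parameter model like Grok-1 keeps inference costs manageable.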
The release is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. According to the company, "this means that the model is not fine-tuned for any specific application, such as dialogue." The announcement, a brief blog post, offered no further details.
Musk revealed in July that he had founded xAI and that it would compete against AI services from companies such as Google LLC and OpenAI. The company said its first model, Grok, was modeled after Douglas Adams' classic book "The Hitchhiker's Guide to the Galaxy" and is "intended to answer almost anything and, far harder, even suggest what questions to ask!"
Meanwhile, at Apple, the company Steve Jobs built quietly published a paper Thursday describing its work on MM1, a set of multimodal LLMs for image captioning, visual question answering, and natural language inference.
Thurrott reported today that the paper describes MM1 as a family of multimodal models that support up to 30 billion parameters and "achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks." The researchers also claim that multimodal large language models have emerged as "the next frontier in foundation models" after traditional LLMs and that they "achieve superior capabilities."
A multimodal LLM is an AI system capable of understanding and generating responses across multiple types of data, such as text, images and audio, integrating diverse forms of information to perform complex tasks. The Apple researchers believe their work delivers a breakthrough that will help others scale such models to larger datasets with better performance and reliability.
Apple’s previous work on multimodal LLMs includes Ferret, a model that was quietly open-sourced in October before being noticed in December.
Image: DALL-E 3