UPDATED 18:52 EDT / DECEMBER 25 2023

Apple quietly launched an open-source multimodal LLM called Ferret

Artificial intelligence researchers from Apple Inc. and Cornell University quietly unveiled Ferret, an open-source multimodal large language model, in October. The model is said to accept parts of images as queries.

According to VentureBeat, the release of Ferret on GitHub in October went completely under the radar, with no announcement. It has since drawn considerable attention from AI researchers. Bart De Witte, who operates a non-profit focused on open-source AI in medicine, posted on X that the release of Ferret “solidifies Apple’s place as a leader in the multimodal AI space.”

Ferret examines a specific region of an image, determines which elements within it could be of use in response to a query, identifies those elements, and draws a bounding box around them. It can then use the identified elements as part of a query, which it answers in the usual way.
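
The idea of using a region of an image as part of a query can be illustrated with a small Python sketch. This is only an illustration of the general concept under stated assumptions, not Ferret's actual interface: the names BoundingBox, crop and build_region_query are hypothetical, the image is a plain list of pixel rows, and the prompt format that attaches coordinates to the question is made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def clamp(self, width: int, height: int) -> "BoundingBox":
        """Clip the box so it lies inside a width x height image."""
        return BoundingBox(
            max(0, min(self.x_min, width)),
            max(0, min(self.y_min, height)),
            max(0, min(self.x_max, width)),
            max(0, min(self.y_max, height)),
        )

    def area(self) -> int:
        """Number of pixels covered by the box (0 if the box is empty)."""
        return max(0, self.x_max - self.x_min) * max(0, self.y_max - self.y_min)


def crop(image: list[list[int]], box: BoundingBox) -> list[list[int]]:
    """Return the pixels inside `box` from a row-major list-of-rows image."""
    height = len(image)
    width = len(image[0]) if image else 0
    box = box.clamp(width, height)
    return [row[box.x_min:box.x_max] for row in image[box.y_min:box.y_max]]


def build_region_query(question: str, box: BoundingBox) -> str:
    """Attach region coordinates to a text question.

    The "[region: ...]" suffix is an invented format for illustration only.
    """
    return f"{question} [region: {box.x_min},{box.y_min},{box.x_max},{box.y_max}]"


# Toy 4x4 "image" whose pixel values are 0..15, row by row.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
highlight = BoundingBox(1, 1, 3, 3)  # the region a user might highlight

print(crop(image, highlight))  # [[5, 6], [9, 10]]
print(build_region_query("What animal is this?", highlight))
```

In this sketch the highlighted region and the text question travel together, which mirrors the article's description of a user highlighting part of an image and then asking about it. How a real model encodes the region internally is outside the scope of this example.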

For instance, if a user highlights an image of an animal within a larger image, then asks the LLM what the animal is, it will respond to that query by identifying what species the creature is. It can then use the context of other elements it detects within the image to provide further responses or provide context on what the animal is doing.

The open-source Ferret model is a system that can “refer and ground anything anywhere at any granularity,” said Apple AI research scientist Zhe Gan in an earlier post on X.

AI researchers claim the release of Ferret is important as it demonstrates a surprising openness from Apple, which is in direct contrast to the company’s usual secretive nature.

The open-source approach may suit Apple in the AI industry, however, as the company is struggling to compete with rivals such as Microsoft Corp. and Google LLC due to a lack of computing resources. According to tech blogger Ben Dickson, Apple’s infrastructure is not designed to serve LLMs at scale, which means the company cannot expect to compete with models such as ChatGPT. Apple therefore has to choose between partnering with a cloud hyperscaler on its AI efforts and sharing its work with the open-source community, similar to the approach taken by Meta Platforms Inc.

Photo: Pexels/Pixabay
