![](https://d15shllkswkct0.cloudfront.net/wp-content/blogs.dir/1/files/2025/01/Qwen2.5-max-banner.png)
Alibaba Cloud, the cloud computing arm of China’s Alibaba Group Ltd., has released its latest breakthrough artificial intelligence large language model just in time for the Chinese New Year: Qwen 2.5-Max, which it claims surpasses today’s most powerful AI models.
The Tuesday release of Qwen 2.5-Max is the second big LLM release out of China in the past two weeks, alongside DeepSeek's R1 reasoning model. DeepSeek, a Chinese AI research startup, made waves with claims that R1 could rival the performance of the most capable models built by U.S. companies while being trained at a small fraction of the cost.
“We developed Qwen 2.5-Max, a large-scale mixture of experts LLM model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning and Reinforcement Learning from Human Feedback methodologies,” the company said in a blog post.
Mixture of experts, or MoE, is an LLM architecture in which multiple specialized sub-models, each covering a particular area of expertise, work in concert to handle complex tasks more efficiently. It's essentially a team of AI models, each trained to excel in a specific subcategory of knowledge, combining their training to answer questions and complete tasks.
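The routing idea behind MoE can be illustrated with a toy sketch: a gating function scores each expert for a given input, and only the top-scoring experts actually run, with their outputs blended by softmax weight. This is purely illustrative; the expert names, scores and routing details below are invented, and Qwen 2.5-Max's actual internals are not public.

```python
import math

# Toy "experts": each is a stand-in for a specialized sub-network.
# (Hypothetical names and functions, for illustration only.)
EXPERTS = {
    "math":    lambda x: x * 2.0,
    "code":    lambda x: x + 10.0,
    "general": lambda x: x * 0.5,
    "science": lambda x: x - 1.0,
}

def route(scores: dict, x: float, top_k: int = 2) -> float:
    """Run only the top_k highest-scoring experts and blend their outputs."""
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    z = sum(math.exp(scores[name]) for name in top)
    weights = {name: math.exp(scores[name]) / z for name in top}  # softmax over top-k
    # Only top_k experts execute, so compute cost scales with k,
    # not with the total number of experts in the model.
    return sum(weights[name] * EXPERTS[name](x) for name in top)

# Pretend a learned gating network produced these scores for one input.
scores = {"math": 2.0, "code": 1.0, "general": 0.1, "science": -1.0}
result = route(scores, 3.0)
print(result)
```

In a real MoE transformer the gating scores come from a learned router and the experts are feed-forward blocks, but the efficiency argument is the same: a large total parameter count with only a small active slice per token.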
According to Alibaba, using this technique the new Qwen model outperformed DeepSeek-V3, the startup's latest non-reasoning model released in late December, on key benchmarks including Arena-Hard, LiveBench and MMLU-Pro. The company also claimed it outperformed Anthropic PBC's Claude 3.5 Sonnet, OpenAI's GPT-4o and Meta Platforms Inc.'s Llama 3.1-405B.
The MoE architecture also gives the model a smaller effective footprint: although it was pretrained on some 20 trillion tokens, only a subset of its expert networks is activated for any given input. That allows it to use fewer resources when deployed and run at higher efficiency.
“The scaling of data and model size not only showcases advancements in model intelligence but also reflects our unwavering commitment to pioneering research,” the company said. “We are dedicated to enhancing the thinking and reasoning capabilities of large language models through the innovative application of scaled reinforcement learning.”
Unlike the other Qwen models, which have been released as open source so that developers can experiment and extend them freely, Qwen 2.5-Max remains closed source. Alibaba has made the model available via an application programming interface through Alibaba Cloud that is compatible with OpenAI's API, making it easy for developers to integrate. It's also accessible through a ChatGPT-like chatbot interface on Qwen Chat.
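Because the API follows OpenAI's chat-completions format, a request body looks like any other OpenAI-style call; only the base URL and model name change. The endpoint and the `qwen-max` model identifier below are assumptions based on Alibaba Cloud's compatible-mode documentation, so check your account's docs before use. This sketch just builds the request payload rather than sending it.

```python
import json

# Assumed Alibaba Cloud OpenAI-compatible endpoint and model name --
# verify both against your own account's documentation.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
API_KEY = "YOUR_API_KEY"  # placeholder, not a real credential

# Standard OpenAI chat-completions request shape, reused as-is:
payload = {
    "model": "qwen-max",  # assumed identifier for Qwen 2.5-Max
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize mixture of experts in one sentence."},
    ],
}
body = json.dumps(payload, indent=2)
print(body)
```

In practice developers can point an existing OpenAI client library at the compatible base URL instead of constructing requests by hand, which is the integration convenience the compatibility buys.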
In August, Alibaba released Qwen2-VL, the company's vision-language model. The model is capable of advanced visual comprehension of video: it can ingest a high-quality video up to 20 minutes long and answer questions about its contents.