Published on 1/3/2025
Microsoft’s research suggests that OpenAI’s o1 Mini and GPT-4o Mini models consist of approximately 100 billion and 8 billion parameters, respectively. These numbers provide a rare glimpse into the architectures of models that are usually shrouded in secrecy. Additionally, Microsoft estimated that Claude 3.5 Sonnet features around 175 billion parameters, while the o1 Preview boasts a massive 300 billion parameters.
The disclosure has sparked excitement in the AI community, particularly around GPT-4o Mini. Despite its compact size, the model reportedly outperforms models such as GPT-4o and Claude 3.5 Haiku and is comparable to Meta’s Llama 3.3 70B, according to a quality index from Artificial Analysis.
One of the most intriguing aspects of the GPT-4o Mini is its potential for use on portable devices. With only 8 billion parameters, this model could operate locally without requiring extensive computational resources. This efficiency has led some experts, including Yuchen Jin, CTO of Hyperbolic Labs, to call on OpenAI to consider open-sourcing the GPT-4o Mini, arguing that it could revolutionize local AI applications.
However, there’s speculation that GPT-4o Mini’s architecture incorporates a “mixture of experts” (MoE) design. This approach activates only a subset of its total parameters for specific tasks, effectively combining efficiency with specialized problem-solving capabilities.
Oscar Le, CEO of SnapEdit, noted the model’s exceptional performance in factual knowledge retrieval, which he attributed to its MoE-based structure. “It’s likely that 4o Mini operates with around 40 billion total parameters but utilizes only 8 billion at any given time,” he said, emphasizing its speed and accuracy.
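Le’s figures imply simple MoE bookkeeping: the total count includes every expert’s weights, while the active count includes only the experts routed for a given token, plus any shared layers. The sketch below is purely illustrative; the expert count, expert size, and shared-weight figures are assumptions chosen to roughly match the speculated 40B-total/8B-active split, not known details of GPT-4o Mini.

```python
# Hypothetical MoE parameter accounting. All figures are illustrative
# assumptions, not confirmed details of GPT-4o Mini's architecture.

def moe_param_counts(n_experts: int, expert_params: float,
                     shared_params: float, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions.

    total  = shared layers + weights of ALL experts
    active = shared layers + weights of only the top-k experts routed per token
    """
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return total, active

# One configuration consistent with "~40B total, ~8B active":
# 8 experts of 4.5B each plus 4B of shared (attention/embedding) weights,
# with 1 expert routed per token.
total, active = moe_param_counts(n_experts=8, expert_params=4.5,
                                 shared_params=4.0, top_k=1)
print(f"total ~ {total}B, active ~ {active}B")
```

Under these assumed numbers, the model stores 40B parameters but touches only about 8.5B per token, which is how an MoE model can pair large capacity with small per-token compute.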
Microsoft’s research included these models as part of a broader effort to develop benchmarks for detecting and correcting medical errors in clinical notes. However, the company clarified that their parameter counts are estimates intended to contextualize model performance rather than precise figures.
Historically, OpenAI, Anthropic, and Google have withheld detailed technical reports about their latest models, citing proprietary concerns. The last technical report from OpenAI was for GPT-4, released in 2023. By contrast, Microsoft and other companies, such as Alibaba and DeepSeek, have embraced transparency by publishing comprehensive technical documentation for their AI models.
Over recent years, the trend in AI has shifted from building ever-larger models to optimizing efficiency and performance. While earlier advancements in AI involved exponential increases in parameter counts, recent innovations focus on scaling down while maintaining or improving model capabilities.
Between GPT-1 and GPT-3, parameter counts grew roughly 1,500-fold, from 117 million to 175 billion, and between GPT-3 and GPT-4 they increased roughly tenfold. However, current frontier models, such as GPT-4o and Claude 3.5 Sonnet, are significantly smaller than GPT-4: EpochAI estimates GPT-4o at around 200 billion parameters, compared to GPT-4’s reported 1.8 trillion.
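Those growth factors are easy to check. In the sketch below, the GPT-1 and GPT-3 counts come from OpenAI’s published papers, while the GPT-4 and GPT-4o figures are the third-party estimates cited above, not official numbers.

```python
# Rough parameter-count growth factors across GPT generations.
params = {
    "GPT-1": 0.117e9,   # 117M (published)
    "GPT-3": 175e9,     # 175B (published)
    "GPT-4": 1.8e12,    # ~1.8T (reported third-party estimate)
    "GPT-4o": 200e9,    # ~200B (EpochAI estimate)
}

print(f"GPT-1 -> GPT-3:  x{params['GPT-3'] / params['GPT-1']:.0f}")   # ~1500x
print(f"GPT-3 -> GPT-4:  x{params['GPT-4'] / params['GPT-3']:.1f}")   # ~10x
print(f"GPT-4 -> GPT-4o: x{params['GPT-4o'] / params['GPT-4']:.2f}")  # ~0.11x, i.e. ~9x smaller
```

The last line is the notable reversal: under these estimates, a current frontier model is nearly an order of magnitude smaller than its predecessor.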
Increasing parameter size no longer guarantees proportional improvements in performance. Factors like computational limitations and the scarcity of new datasets contribute to diminishing returns. Yann LeCun, a prominent AI researcher, highlighted this trend, noting that “a model with more parameters is not necessarily better” due to higher costs and resource requirements.
To address these challenges, researchers are employing innovative techniques at the architectural level. The MoE design, exemplified by GPT-4o and GPT-4o Mini, activates only the necessary modules for specific tasks, reducing computational overhead without compromising performance.
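In a typical MoE layer, a small learned gate scores every expert and only the top-k highest-scoring experts process a given token. The pure-Python sketch below illustrates that general routing mechanism; it is a minimal didactic example, not a description of GPT-4o’s actual implementation.

```python
import math
import random

# Minimal sketch of mixture-of-experts (MoE) top-k routing: a gate scores
# each expert, and only the top-k winners run for a given token, so only a
# fraction of the total parameters are active per token.

random.seed(0)
D, N_EXPERTS, TOP_K = 8, 4, 2

def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

gate = rand_matrix(N_EXPERTS, D)                    # router weights: one score per expert
experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]

def moe_forward(x: list[float]) -> list[float]:
    scores = matvec(gate, x)                        # score every expert
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    z = [math.exp(scores[i]) for i in top]
    weights = [v / sum(z) for v in z]               # softmax over the winners only
    out = [0.0] * D
    for w, i in zip(weights, top):                  # only TOP_K of N_EXPERTS run
        for j, val in enumerate(matvec(experts[i], x)):
            out[j] += w * val
    return out

y = moe_forward([random.gauss(0, 1) for _ in range(D)])
print(len(y))
```

Because only `TOP_K` of the `N_EXPERTS` weight matrices are multiplied per token, per-token compute scales with the active parameters, not the total, which is the efficiency argument the article describes.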
As 2024 concluded, AI researchers unveiled models that use smaller, curated datasets and advanced training techniques to outperform larger counterparts. Microsoft’s Phi-4 model, released in December, exemplifies this shift: by focusing on high-quality data and efficient design, Phi-4 outperformed much larger models, including GPT-4o, on several reasoning benchmarks.
In a remarkable development, DeepSeek released V3, an open-source MoE model that surpassed GPT-4o on several benchmarks despite being trained at a fraction of the cost: a reported $5.6 million, compared with an estimated $40 million for GPT-4.
The trend toward efficiency and innovation in AI is expected to continue in 2025. Researchers are increasingly turning to neurosymbolic approaches, test-time training, and symbolic tool use to address complex problems. As François Chollet, the creator of Keras, stated, “Bigger models are not all you need. You need better ideas.”
The coming year promises breakthroughs that prioritize smarter, more efficient designs over sheer size, paving the way for more accessible and cost-effective AI solutions.