Introduction
Large foundation models (LFMs) such as ChatGPT and GPT-4 have garnered significant attention for their impressive performance across a wide range of tasks, even achieving human-level results on professional exams such as the SAT, GRE, and USMLE. Microsoft's recent research focuses on transferring these capabilities to smaller models through imitation learning on the outputs generated by LFMs. However, existing methods struggle to replicate the reasoning and comprehension skills of the larger models. In response, Microsoft has developed ORCA, a 13-billion-parameter model that aims to imitate the reasoning process of LFMs.
Addressing the Challenges
One of the major challenges in instruction tuning is the limited diversity and complexity of the training data: existing methods often rely on simple instructions that lack the depth needed for comprehensive reasoning. Microsoft's ORCA model overcomes this challenge by learning from rich signals provided by GPT-4, including explanation traces, step-by-step thought processes, and complex instructions. This progressive learning approach taps into large-scale, diverse imitation data that is carefully sampled and selected to improve the model's reasoning capabilities.
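To make this concrete, here is a minimal, hypothetical sketch of how one explanation-rich training example might be assembled as a ⟨system, query, response⟩ triple. The instruction text, the query, the teacher response, and the file name are all illustrative assumptions, not taken from the paper.

```python
import json

# Illustrative sketch of assembling one explanation-rich training example,
# in the spirit of Orca's <system, query, response> triples.
system_instruction = (
    "You are a helpful assistant. Think step by step and justify your answer."
)
user_query = "If a train travels 120 km in 2 hours, what is its average speed?"

# In Orca, the response would come from GPT-4 prompted with the system
# instruction above, so that it contains the full explanation trace.
# This hard-coded text stands in for that teacher output.
teacher_response = (
    "Average speed is distance divided by time. "
    "120 km / 2 h = 60 km/h. The average speed is 60 km/h."
)

example = {
    "system": system_instruction,
    "query": user_query,
    "response": teacher_response,
}

# Append the example to a JSONL training file, one example per line.
with open("imitation_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```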
Outperforming State-of-the-Art Models
The ORCA model demonstrates significant advances over conventional instruction-tuned models such as Vicuna-13B. On complex zero-shot reasoning benchmarks like BigBench Hard (BBH), ORCA surpasses Vicuna-13B by more than 100%. Moreover, ORCA achieves parity with ChatGPT on the BBH benchmark and performs competitively on professional and academic exams such as the SAT, LSAT, GRE, and GMAT, in zero-shot settings without chain-of-thought (CoT) prompting. Although it still trails GPT-4, ORCA shows promise in narrowing the gap.
Learning from Step-by-Step Explanations
One of the key findings of Microsoft's research is that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills. ORCA augments ⟨query, response⟩ pairs with detailed responses from GPT-4, helping the model gain a deeper understanding of the reasoning process. By leveraging system instructions that encourage explanations, ORCA mimics the LFM's thought process, enabling more comprehensive and accurate responses.
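As an illustration of how such system instructions might be applied at scale, the sketch below pairs each query with an instruction drawn from a small pool. The Orca paper describes a collection of 16 system messages; the instructions here are paraphrases in that spirit, and the helper function is a hypothetical name.

```python
import random

# A few explanation-eliciting system instructions, paraphrased in the style
# the Orca paper describes (not the exact messages used by Microsoft).
SYSTEM_INSTRUCTIONS = [
    "Explain how you used the definitions to come up with the answer.",
    "You are an AI assistant. Provide a detailed answer so the user "
    "does not need to search elsewhere to understand it.",
    "Describe the task, then explain your answer step by step.",
]

def make_prompt(query: str) -> tuple[str, str]:
    """Pair a query with a randomly sampled system instruction."""
    return random.choice(SYSTEM_INSTRUCTIONS), query

# Example usage: the resulting (system, user) pair would be sent to the
# teacher model to collect an explanation-rich response.
system_msg, user_msg = make_prompt("Why does ice float on water?")
print(system_msg)
print(user_msg)
```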
Addressing Evaluation Challenges
Evaluation protocols for smaller LLMs have often been limited in their effectiveness. Previous studies relied on GPT-4 for auto-evaluation, which introduces biases and tends to overestimate the capabilities of smaller models, since they often learn to imitate the style but not the reasoning process of LFMs. Microsoft's research addresses these shortcomings with more rigorous evaluation: ORCA is assessed on a range of benchmarks, including auto-evaluation with GPT-4, academic benchmarks, professional and academic exams, and safety evaluations for toxic-language generation and hate-speech detection.
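As a rough illustration of GPT-4-based auto-evaluation, the sketch below builds a judge prompt that presents a question alongside two candidate answers and asks for scores. The template wording, criteria, and function name are assumptions for illustration, not the exact protocol from the paper.

```python
# Hypothetical sketch of a GPT-4-as-judge prompt: the judge model is shown
# a question and two candidate answers and asked to score each one.
JUDGE_TEMPLATE = """\
[Question]
{question}

[Assistant 1's Answer]
{answer_1}

[Assistant 2's Answer]
{answer_2}

Rate the helpfulness, relevance, accuracy, and level of detail of each
answer on a scale of 1 to 10. Output the two scores, then a short
explanation of your ratings.
"""

def build_judge_prompt(question: str, answer_1: str, answer_2: str) -> str:
    """Fill the judging template; the result would be sent to the judge model."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_1=answer_1, answer_2=answer_2
    )

# Example usage with illustrative answers.
print(build_judge_prompt(
    "What causes seasons on Earth?",
    "The tilt of Earth's axis relative to its orbital plane.",
    "Earth's varying distance from the Sun.",
))
```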
Conclusion
Microsoft's ORCA model represents a significant step forward in improving the reasoning and comprehension capabilities of smaller models through imitation learning. By learning from the rich signals provided by LFMs like GPT-4, ORCA achieves superior performance on complex reasoning benchmarks and competitive results on professional and academic exams. The integration of step-by-step explanations into the learning process demonstrates the potential of this approach for advancing model capabilities and skills. This research highlights the importance of addressing the challenges of instruction tuning and lays the groundwork for future advances in AI reasoning.
Read the original research paper, "Orca: Progressive Learning from Complex Explanation Traces of GPT-4," at https://arxiv.org/abs/2306.02707.