Image by Microsoft
Microsoft has unveiled a significant upgrade to Researcher, the deep research agent within Microsoft 365 Copilot, introducing multi-model capabilities named Critique and Council. These updates combine frontier models from OpenAI and Anthropic to improve report accuracy and analytical depth. The system is designed to handle complex research tasks in the flow of work, moving beyond the traditional single-model approach to AI research.
The new Critique architecture separates the drafting process from the review phase by using two distinct AI models. One model plans and drafts the research, while a second acts as an expert reviewer to validate claims and enforce strict grounding standards. This reviewer utilizes a rubric to assess source reliability and report completeness, ensuring that every key claim is precisely anchored to authoritative citations before a final report is delivered to the user.
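The drafter/reviewer split can be sketched as a simple orchestration loop. This is an illustrative pattern only, not Microsoft's implementation: `draft_model` and `review_model` are hypothetical stand-ins for calls to two distinct LLMs, and the review dict shape is an assumption.

```python
from typing import Callable

def critique_loop(
    question: str,
    draft_model: Callable[[str], str],
    review_model: Callable[[str], dict],
    max_rounds: int = 3,
) -> str:
    """Draft a report with one model, then have a second model review it
    against a rubric, revising until it is approved or the round budget
    runs out. (Hypothetical sketch of the Critique pattern.)"""
    report = draft_model(question)
    for _ in range(max_rounds):
        # Assumed review shape: {"approved": bool, "feedback": str}
        review = review_model(report)
        if review["approved"]:
            break
        # Feed the rubric-based feedback back into the drafter for revision.
        report = draft_model(
            f"{question}\n\nReviewer feedback: {review['feedback']}"
        )
    return report
```

The key design point is that drafting and reviewing use separate models, so the reviewer is not biased toward approving its own reasoning.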
For users requiring comparative insights, the Council feature runs OpenAI and Anthropic models simultaneously to produce side-by-side reports. Once both documents are generated, a dedicated judge model evaluates the outputs and creates a cover letter. This summary distills key findings and highlights exactly where the different models agree or diverge in their interpretations, allowing for more confident decision-making.
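A minimal version of that fan-out-then-judge flow might look like the following. All names and the judge prompt are illustrative assumptions; the model callables stand in for real API calls to the two providers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

def council(
    question: str,
    models: Dict[str, Callable[[str], str]],
    judge: Callable[[str], str],
) -> Dict[str, str]:
    """Run each model on the same question concurrently, then have a judge
    model write a cover letter comparing the side-by-side reports.
    (Hypothetical sketch of the Council pattern.)"""
    # Fan out: generate all reports in parallel.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, question) for name, fn in models.items()}
        reports = {name: f.result() for name, f in futures.items()}

    # Fan in: the judge distills agreements and divergences into a summary.
    side_by_side = "\n\n".join(f"## {name}\n{text}" for name, text in reports.items())
    reports["cover_letter"] = judge(
        f"Compare these reports on '{question}', highlighting where they "
        f"agree or diverge:\n\n{side_by_side}"
    )
    return reports
```

Running the models concurrently keeps the added latency close to that of a single report, with the judge pass as the only extra sequential step.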
Performance testing on the DRACO benchmark reveals that this multi-model approach delivers a substantial quality boost. The system achieved a 7.0 point improvement over single-model methods, outperforming competitors like Perplexity Deep Research by more than 13 percent. Evaluations showed the most significant gains in analytical breadth and presentation quality, suggesting that the reviewer model effectively identifies missing angles and coverage gaps.
The research tasks used for validation spanned 10 domains, including medicine, technology, and law. Results showed that Critique is particularly adept at challenging weak claims and enforcing high-precision grounding. This collaborative model interaction helps mitigate the common problem of AI hallucinations by creating a rigorous internal feedback loop that mimics academic peer review.
These features are currently available to members of the Frontier program as part of Microsoft's push toward agentic workflows. Organizations in the EU may need to review their subprocessor agreements as Anthropic becomes a central part of the Copilot ecosystem alongside OpenAI. While the multi-model execution increases computational load, the resulting accuracy provides a new benchmark for deep research in the workplace.
This move marks a definitive shift in the AI industry from simple pipeline execution to structured deliberation. By having models challenge and review one another, Microsoft is addressing the reliability gap that has historically hampered enterprise AI adoption. While competitors have focused on speed and retrieval, this architecture prioritizes the structure of reasoning, where the value lies in the interaction between different model families rather than the raw power of any single one. As AI agents move toward persistent memory and long-term reasoning, multi-model frameworks like these will likely become the standard for high-stakes professional research.