Microsoft 365 Copilot Researcher Gains Multi-Model Intelligence via New Critique and Council Features

Microsoft is evolving its deep research agent by integrating models from OpenAI and Anthropic to perform collaborative analysis and rigorous self-evaluation.


Decoded

Published Mar 30, 2026

4 min read

Image by Microsoft

Microsoft has unveiled a significant upgrade to Researcher, the deep research agent within Microsoft 365 Copilot, introducing multi-model capabilities named Critique and Council. These updates combine frontier models from OpenAI and Anthropic to improve report accuracy and analytical depth. The system is designed to tackle complex research tasks in the flow of work by moving beyond traditional single-model AI research pipelines.

The new Critique architecture separates the drafting process from the review phase by using two distinct AI models. One model plans and drafts the research, while a second acts as an expert reviewer to validate claims and enforce strict grounding standards. This reviewer utilizes a rubric to assess source reliability and report completeness, ensuring that every key claim is precisely anchored to authoritative citations before a final report is delivered to the user.
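The drafter/reviewer separation described above can be sketched as a simple loop. This is a minimal illustration, not Microsoft's implementation: `draft_model`, `review_model`, and the rubric checks are hypothetical stubs standing in for the two distinct frontier models.

```python
# Sketch of a Critique-style draft/review loop (illustrative stubs only).
# A rubric of checks stands in for the reviewer's grounding standards.
RUBRIC = {
    "grounded": lambda claim: claim.get("citation") is not None,
    "reliable": lambda claim: claim.get("source_tier", 0) >= 2,
}

def draft_model(question):
    # Stub drafter: returns a report as a list of claims with citations.
    return [
        {"text": "Claim A", "citation": "doi:10/xyz", "source_tier": 3},
        {"text": "Claim B", "citation": None, "source_tier": 1},
    ]

def review_model(claims):
    # Stub reviewer: flags every claim that fails any rubric check.
    return [c for c in claims if not all(chk(c) for chk in RUBRIC.values())]

def critique_pipeline(question, max_rounds=3):
    claims = draft_model(question)
    for _ in range(max_rounds):
        flagged = review_model(claims)
        if not flagged:
            break
        # A real drafter would re-research flagged claims; the sketch
        # simply drops them so the loop stays self-contained.
        claims = [c for c in claims if c not in flagged]
    return claims

report = critique_pipeline("example research question")
```

The key design point the sketch captures is that no claim reaches the final report until it passes every rubric check, which is what enforces the "precisely anchored to authoritative citations" guarantee.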

For users requiring comparative insights, the Council feature runs OpenAI and Anthropic models simultaneously to produce side-by-side reports. Once both documents are generated, a dedicated judge model evaluates the outputs and creates a cover letter. This summary distills key findings and highlights exactly where the different models agree or diverge in their interpretations, allowing for more confident decision-making.
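The Council flow can likewise be sketched as two parallel report runs followed by a judge pass. The function names (`model_a`, `model_b`, `judge`) are illustrative assumptions, not the product's API; in the real system the two reports would come from OpenAI and Anthropic models run concurrently.

```python
# Hedged sketch of a Council-style run: two stub "models" produce reports,
# and a judge distills where they agree or diverge into a cover letter.
def model_a(question):
    return {"finding_1": "X rises", "finding_2": "Y falls"}

def model_b(question):
    return {"finding_1": "X rises", "finding_2": "Y is flat"}

def judge(report_a, report_b):
    # Compare findings key by key to build the cover-letter summary.
    agree = {k for k in report_a if report_a[k] == report_b.get(k)}
    diverge = {k for k in report_a
               if k in report_b and report_a[k] != report_b[k]}
    return {"agreements": sorted(agree), "divergences": sorted(diverge)}

def council(question):
    a, b = model_a(question), model_b(question)  # run concurrently in practice
    return a, b, judge(a, b)

_, _, cover_letter = council("example question")
```

Surfacing divergences explicitly, rather than silently merging the two reports, is what lets a reader weigh the disagreement before making a decision.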

Performance testing on the DRACO benchmark reveals that this multi-model approach delivers a substantial quality boost. The system achieved a 7.0 point improvement over single-model methods, outperforming competitors like Perplexity Deep Research by more than 13 percent. Evaluations showed the most significant gains in analytical breadth and presentation quality, suggesting that the reviewer model effectively identifies missing angles and coverage gaps.

The research tasks used for validation spanned 10 domains including medicine, technology, and law. Results showed that Critique is particularly adept at challenging weak claims and enforcing high-precision grounding. This collaborative model interaction helps solve the common problem of AI hallucinations by creating a rigorous internal feedback loop that mimics professional academic peer review.

These features are currently available to members of the Frontier program as part of Microsoft's push toward agentic workflows. Organizations in the EU may need to review their subprocessor agreements as Anthropic becomes a central part of the Copilot ecosystem alongside OpenAI. While the multi-model execution increases computational load, the resulting accuracy provides a new benchmark for deep research in the workplace.


Decoded Take


This move marks a definitive shift in the AI industry from simple pipeline execution to structured deliberation. By requiring models to review and challenge one another, Microsoft is addressing the reliability gap that has historically hampered enterprise AI adoption. While competitors have focused on speed and retrieval, this architecture prioritizes the system of thought, where the value lies in the interaction between different model families rather than the raw power of one. As AI agents move toward persistent memory and long-term reasoning, these multi-model frameworks will likely become the standard for any task requiring high-stakes professional research.
