Many organizations face a fundamental barrier to AI adoption: their data cannot leave the network. For companies operating under GDPR, handling confidential code, or managing sensitive operational data, sending prompts to Claude or GPT is simply not an option. This creates a gap between the promise of AI-powered automation and the reality of compliance requirements.
A recent LogRocket blog post outlines an approach to building agentic systems from multiple specialized small language models (SLMs), each assigned a distinct task based on its measured capabilities. The design draws on ThinkSLM research presented at EMNLP 2025, which evaluated 72 small language models across 17 reasoning benchmarks and found that models in the 1-3B parameter range, particularly from the Phi family, achieve strong multi-step reasoning relative to their size.
The architecture separates reasoning, retrieval, and expression into distinct components. Sub-1B models handle intent detection and safety filtering, where classification matters more than reasoning depth. Models in the 1-3B range manage planning and tool execution, while a local vector database stores private documents for retrieval-augmented generation. Crucially, a cloud LLM is invoked only as an optional final step, purely for stylistic refinement, after all sensitive context has been stripped from the output.
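To make the routing concrete, here is a minimal Python sketch of that tiered pipeline. It is an illustration under stated assumptions, not code from the post: `classify_intent`, `plan_and_execute`, and `refine_style` are hypothetical stand-ins for calls to locally hosted SLMs and an optional cloud endpoint.

```python
from dataclasses import dataclass

@dataclass
class RoutedRequest:
    text: str
    intent: str
    safe: bool

def classify_intent(text: str) -> RoutedRequest:
    # A sub-1B model in the real system; a trivial keyword stub here.
    intent = "triage" if "incident" in text.lower() else "query"
    safe = "credential" not in text.lower()
    return RoutedRequest(text, intent, safe)

def plan_and_execute(req: RoutedRequest, retrieve) -> str:
    # A 1-3B model would plan and call tools; its context comes from a
    # local vector database, so private data never leaves the network.
    context = retrieve(req.text)
    return f"[{req.intent}] answer grounded in {len(context)} local chunks"

def refine_style(draft: str, use_cloud: bool = False) -> str:
    # Optional cloud step, applied only after sensitive context is stripped.
    return draft  # placeholder: this sketch makes no external call

def handle(text: str, retrieve) -> str:
    req = classify_intent(text)
    if not req.safe:
        return "Blocked by the local safety filter."
    return refine_style(plan_and_execute(req, retrieve))

print(handle("Summarize incident 4821", lambda q: ["chunk-a", "chunk-b"]))
```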
A key insight from the research is that test-time scaling techniques, such as sampling multiple generations and taking a majority vote, can close the performance gap with much larger models. This makes smaller models viable for complex reasoning tasks when paired with proper orchestration. An Agent Manager coordinates the specialized models, tracks confidence scores, and applies these inference-time techniques to improve reliability without sending data externally.
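Majority voting of this kind is straightforward to express. The sketch below assumes a `generate` callable that wraps a locally hosted SLM with sampling enabled (temperature above zero), so repeated calls can disagree; the agreement ratio doubles as a rough confidence score an Agent Manager could act on.

```python
from collections import Counter

def majority_vote(generate, prompt: str, n: int = 5) -> tuple[str, float]:
    # Sample n answers from the same small model and keep the most common one.
    answers = [generate(prompt) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n

answer, confidence = majority_vote(lambda p: "42", "What is 6 * 7?")
if confidence < 0.6:
    pass  # e.g. escalate to a larger local model or resample with higher n
```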
The proposed system addresses scenarios where internal documentation, incident logs, and source code must remain within corporate boundaries. Teams can query private knowledge bases, triage operational issues, and generate structured remediation steps entirely on-premises. Most requests never reach cloud APIs, reducing costs while maintaining compliance with data locality requirements.
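One way such a private knowledge-base query could look is sketched below, using the open-source sentence-transformers library for local embeddings. The document snippets and the `retrieve` helper are illustrative, not taken from the post.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Embeddings are computed locally; no text leaves the machine.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Runbook: restart the VPN gateway after certificate rotation.",
    "Incident 4821: auth outage traced to an expired token cache.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q_vec = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity on unit-normalized vectors
    return [docs[i] for i in np.argsort(-scores)[:k]]

print(retrieve("Why did authentication fail?", k=1))
```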
The architecture works because it aligns model capabilities with actual task requirements. Classification doesn't need generative power. Retrieval over constrained documents doesn't benefit from massive parameter counts. And most business workflows need reliable structured outputs, not creative prose. By running inference on commodity GPUs with quantized models, organizations gain both privacy and cost efficiency.
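Quantization is what makes commodity-GPU inference practical here. One plausible setup, using Hugging Face transformers with 4-bit weights via bitsandbytes, might look like the following; the model ID names a real 2.7B Phi-family model, but any similarly sized model would do.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/phi-2"  # example: a 2.7B-parameter Phi-family model
quant = BitsAndBytesConfig(load_in_4bit=True,
                           bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

# A classification-style prompt: structured output, no creative prose needed.
inputs = tokenizer("Classify this ticket: 'VPN drops every hour.' Category:",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```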
This architectural pattern signals a broader shift in how enterprises will deploy AI going forward. Rather than waiting for vendors to solve privacy concerns or hoping regulations will loosen, organizations are discovering they can build capable systems with fundamentally different designs. The move toward specialized, locally run models mirrors earlier enterprise software trends where monolithic solutions gave way to microservices. As more research like ThinkSLM provides empirical guidance on what small models can reliably handle, the "just use GPT-4" default becomes less automatic. For vendors selling hosted AI services, this represents a potential unbundling of capabilities they've positioned as inseparable. The real competition may not be between model providers, but between centralized and distributed architectural approaches.