dbt Labs Unveils ADE-Bench: First Open Benchmark for AI Analytics Engineering

dbt Labs introduces ADE-Bench, the first comprehensive open-source benchmark for evaluating AI agents on real-world analytics and data engineering tasks, featuring complete dbt project environments and database testing.

Published Dec 2, 2025

3 min read

dbt Labs has launched ADE-Bench, an open-source framework designed to evaluate AI agents on analytics and data engineering tasks in realistic dbt project environments. The benchmark, created by Benn Stancil (founder of Mode), will be presented in a live webinar on December 9, 2025.

ADE-Bench tests AI agents on complete data projects rather than isolated coding tasks. The framework evaluates agents across multiple dimensions, including their ability to work with dbt projects, interact with databases such as DuckDB and Snowflake, and solve problems that mirror real analytics workflows. Each task runs in an isolated Docker container, giving the agent a sandboxed environment that can be modified, deliberately broken, and tested without affecting other tasks.
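The repository documents the actual harness; purely as a hedged sketch of what an isolated task run might look like, the snippet below drives the Docker CLI from Python to build and test a mounted dbt project inside a throwaway container. The image name, task path, and helper function are hypothetical and are not part of ADE-Bench.

```python
import subprocess

# Hypothetical sketch of a sandboxed task run: each task gets its own throwaway
# container so the dbt project and database can be modified or broken without
# affecting other tasks. Image name and paths are illustrative assumptions.
def run_task_in_sandbox(task_dir: str, image: str = "example/dbt-task-env:latest") -> int:
    result = subprocess.run(
        [
            "docker", "run", "--rm",          # discard the container when done
            "-v", f"{task_dir}:/workspace",   # mount the task's dbt project
            "-w", "/workspace",
            image,
            "dbt", "build",                   # build models and run dbt tests
        ],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    return result.returncode  # non-zero means the attempted solution failed


if __name__ == "__main__":
    exit_code = run_task_in_sandbox("./tasks/example_task")
    print("task passed" if exit_code == 0 else "task failed")
```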

The benchmark currently supports major AI agents including Claude Code, OpenAI Codex, Google Gemini, and Macro. It includes automatic test generation, solution validation, and detailed performance tracking. The framework evaluates tasks through dbt tests, comparing agent output against expected results with support for approximate equality testing to handle database-specific quirks.
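The announcement doesn't spell out how approximate equality is implemented; as one common way to compare result sets with numeric tolerance, the pandas-based sketch below is an illustrative assumption, not ADE-Bench's actual code.

```python
import pandas as pd

# Illustrative sketch (not ADE-Bench's code): compare an agent-produced result
# set against an expected one with approximate numeric equality, so database-
# specific quirks (float precision, row/column order, dtypes) don't fail an
# otherwise correct solution.
def results_match(actual: pd.DataFrame, expected: pd.DataFrame, tol: float = 1e-6) -> bool:
    if set(actual.columns) != set(expected.columns):
        return False
    cols = sorted(expected.columns)
    # Normalize column order and row order before comparing.
    actual = actual[cols].sort_values(by=cols).reset_index(drop=True)
    expected = expected[cols].sort_values(by=cols).reset_index(drop=True)
    try:
        pd.testing.assert_frame_equal(
            actual,
            expected,
            check_exact=False,   # approximate equality for floating-point columns
            rtol=tol,
            atol=tol,
            check_dtype=False,   # e.g. DuckDB DOUBLE vs Snowflake NUMBER
        )
        return True
    except AssertionError:
        return False
```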

With 32 GitHub stars and active development since its October 2025 launch, ADE-Bench builds on prior work from Terminal-Bench and Spider 2.0. The framework features sophisticated migration capabilities for cross-database testing and supports the Model Context Protocol (MCP) for enhanced agent interactions. The December 9 webinar will feature Stancil alongside Jason Ganz, Senior Manager of AI Strategy at dbt Labs, demonstrating the benchmark workflow and sharing early findings.
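The migration internals aren't covered in the announcement; as a rough illustration of cross-database testing in general, one approach is to transpile SQL between dialects with a library such as sqlglot. This is an assumption for illustration, not something ADE-Bench is documented to use.

```python
import sqlglot

# Illustration of the general idea behind cross-database task migration (not
# ADE-Bench's implementation): transpile a DuckDB query into Snowflake SQL so
# the same task can run against either engine. Table and column names are made up.
duckdb_sql = "SELECT order_id, strftime(ordered_at, '%Y-%m') AS order_month FROM orders"

snowflake_sql = sqlglot.transpile(duckdb_sql, read="duckdb", write="snowflake")[0]
print(snowflake_sql)
```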

The framework addresses a critical gap in AI evaluation. According to dbt Labs' 2025 State of Analytics Engineering Report, 70% of analytics professionals already use AI for code development, yet tools to measure AI performance on complete data engineering workflows have been limited. ADE-Bench gives teams a credible, rerunnable benchmark they can extend and customize for their specific use cases.


Decoded Take

ADE-Bench represents a strategic shift in how the data industry evaluates AI capabilities. While other benchmarks like DA-bench focus on isolated query tasks, ADE-Bench tests agents on complete project environments with dependencies, configurations, and real-world complexity.

This timing aligns with dbt Labs' broader AI push, including the recently announced dbt Agents and Fusion engine, suggesting the company is positioning itself at the intersection of analytics engineering and AI tooling. The benchmark's open-source nature and support for multiple agents also signal an industry-wide push toward standardized AI evaluation rather than proprietary scoring systems.

For data teams already integrating AI into workflows, ADE-Bench provides a much-needed framework to quantify improvements and justify investments, potentially accelerating AI adoption across the analytics engineering profession.
