OpenAI and Paradigm Launch EVMbench to Secure Smart Contracts with Advanced AI Agents

OpenAI has introduced EVMbench, a rigorous new framework designed to measure and improve the ability of AI agents to secure the global blockchain ecosystem.

NewDecoded

Published Feb 24, 2026

Image by OpenAI

OpenAI and crypto investment firm Paradigm have officially launched EVMbench, a specialized benchmark for evaluating AI performance in smart contract security. The framework tests whether AI agents can effectively identify, repair, or exploit high-severity vulnerabilities within Ethereum-based environments. This release comes at a time when smart contracts secure more than 100 billion dollars in digital assets worldwide.

The tool represents a significant shift from general-purpose coding tests toward specialized safety evaluations. By focusing on the Ethereum Virtual Machine, researchers can observe how models handle immutable code and direct value transfers. Because a single error in this domain can be catastrophic, it provides a rigorous testing ground for the dangerous capabilities of frontier AI models.

Performance results indicate that the offensive capabilities of AI models are improving quickly. GPT-5.3-Codex reached a 72.2 percent success rate on exploit tasks, up sharply from the 31.9 percent scored by GPT-5, which was released six months earlier. These gains highlight the urgent need for robust defensive tools to counter emerging cyber risks.
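To put the two reported scores in perspective, the short Python sketch below computes the gain in percentage points and as a ratio. Only the two figures (31.9 and 72.2) come from the article; the function name `score_gain` is a hypothetical helper introduced for illustration.

```python
def score_gain(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, new score as a multiple of the old)."""
    points = round(new_pct - old_pct, 1)
    ratio = round(new_pct / old_pct, 2)
    return points, ratio


# Exploit-task success rates reported in the article.
points, ratio = score_gain(31.9, 72.2)
print(f"{points} percentage points, {ratio}x the earlier score")
# prints "40.3 percentage points, 2.26x the earlier score"
```

In other words, the reported exploit success rate slightly more than doubled between the two models.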

EVMbench operates through three distinct modes: detect, patch, and exploit. Success is measured using a Rust-based harness that runs in an isolated sandbox to ensure safety and reproducibility. While agents are increasingly proficient at draining funds, they still face challenges when attempting to fix vulnerabilities without altering the original purpose of the code.
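The difficulty of the patch mode can be illustrated with a minimal sketch. It assumes, based only on the description above, that a patch counts as successful when the exploit no longer works and the contract's original behavior is preserved. The names `PatchResult`, `patch_succeeds`, and `success_rate`, and the sample attempts, are hypothetical and are not taken from the actual Rust harness.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Hypothetical outcome of one patch attempt (not the real harness schema)."""
    exploit_still_works: bool    # does the known exploit still drain funds?
    functional_tests_pass: bool  # does the contract still do its original job?


def patch_succeeds(result: PatchResult) -> bool:
    # Assumed criterion: the vulnerability is closed AND behavior is unchanged.
    return (not result.exploit_still_works) and result.functional_tests_pass


def success_rate(outcomes: list[bool]) -> float:
    """Percentage of successful attempts; 0.0 for an empty list."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


# Illustrative, made-up attempts: only the first one both blocks the exploit
# and keeps the original functionality intact.
attempts = [
    PatchResult(exploit_still_works=False, functional_tests_pass=True),
    PatchResult(exploit_still_works=False, functional_tests_pass=False),
    PatchResult(exploit_still_works=True, functional_tests_pass=True),
]
rate = success_rate([patch_succeeds(a) for a in attempts])
```

Under this assumed criterion, a patch that blocks the exploit by breaking the contract's intended behavior fails just as a patch that leaves the vulnerability open does, which is one way to read why fixing is harder than exploiting.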

To help researchers stay ahead of potential threats, OpenAI is expanding its Cybersecurity Grant Program with 10 million dollars in API credits. This funding supports organizations that are protecting open-source software and critical infrastructure. The company is also testing Aardvark, a dedicated security research agent designed specifically for defenders.

The benchmark incorporates real-world data from the Tempo blockchain, a project designed for stablecoin payments. Including these scenarios ensures that the benchmark remains grounded in practical, high-throughput financial environments. Developers can access the full paper and tooling via the official OpenAI research portal.

Decoded Take

The launch of EVMbench marks a pivotal transition: AI capability is increasingly measured not by general logic but by its impact on high-stakes, immutable financial code. By moving past standard evaluations like SWE-bench, OpenAI is acknowledging that the dual-use nature of frontier models requires domain-specific guardrails. The rapid jump in exploit proficiency points to a narrowing window for human-only auditing, pushing the industry toward an AI-augmented defensive standard. This strategy, combined with 10 million dollars in API credits and the use of Tempo blockchain scenarios, suggests that the future of blockchain security will be defined by an arms race between AI-driven attackers and the defensive safety harnesses being built today.
