News
Feb 24, 2026
NewDecoded

Image by OpenAI
OpenAI and crypto investment firm Paradigm have officially launched EVMbench, a specialized benchmark for evaluating AI performance in smart contract security. The framework tests whether AI agents can effectively identify, repair, or exploit high-severity vulnerabilities within Ethereum-based environments. This release comes at a time when smart contracts secure more than 100 billion dollars in digital assets worldwide.
The tool represents a significant shift from general-purpose coding tests toward specialized safety evaluations. By focusing on the Ethereum Virtual Machine, researchers can observe how models handle immutable code and direct value transfers. This domain, where a single error can be catastrophic, provides a rigorous testing ground for the dangerous capabilities of frontier AI models.
Performance results indicate that AI offensive skills are improving rapidly. The newly referenced GPT-5.3-Codex reached a 72.2 percent success rate in exploit tasks, a dramatic increase from the 31.9 percent scored by GPT-5, which was released just six months earlier. These gains highlight the urgent need for robust defensive tools to counter emerging cyber risks.
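To put the two exploit scores in perspective, the short calculation below works out the gap both in percentage points and as a ratio. It uses only the two figures quoted above; the variable names are illustrative and not taken from the benchmark.

```python
# Exploit-task success rates quoted in the article, in percent.
gpt5_score = 31.9
codex_score = 72.2

# Absolute gain, measured in percentage points.
point_gain = codex_score - gpt5_score

# Relative gain: how many times higher the newer score is.
relative_gain = codex_score / gpt5_score

print(f"Gain: {point_gain:.1f} percentage points")
print(f"Ratio: {relative_gain:.2f}x")
```

In other words, the newer model's score is about 40 percentage points higher, or a little more than double the earlier result.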
EVMbench operates through three distinct modes: detect, patch, and exploit. Success is measured using a Rust-based harness that runs in an isolated sandbox to ensure safety and reproducibility. While agents are increasingly proficient at draining funds, they still face challenges when attempting to fix vulnerabilities without altering the original purpose of the code.
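The three modes can be pictured as three different success criteria applied to the same kind of task. The sketch below is purely illustrative: the `TaskResult` fields and the `is_success` rules are assumptions made for this example, not the actual grading logic of the EVMbench harness, which the article describes only at a high level.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass
class TaskResult:
    """Hypothetical outcome record; field names are assumptions."""
    mode: Mode
    vulnerability_found: bool = False   # detect: agent flagged the known flaw
    exploit_blocked: bool = False       # patch: the known attack no longer works
    original_tests_pass: bool = False   # patch: intended behavior is preserved
    funds_drained: bool = False         # exploit: attack succeeded in the sandbox


def is_success(result: TaskResult) -> bool:
    """Apply an assumed per-mode success rule to one result."""
    if result.mode is Mode.DETECT:
        return result.vulnerability_found
    if result.mode is Mode.PATCH:
        # A patch must stop the attack without changing what the code is for.
        return result.exploit_blocked and result.original_tests_pass
    return result.funds_drained


# A patch that blocks the attack but breaks intended behavior does not count.
broken_patch = TaskResult(Mode.PATCH, exploit_blocked=True, original_tests_pass=False)
print(is_success(broken_patch))
```

The two-part rule for patch mode mirrors the difficulty noted above: stopping an attack is not enough if the fix changes what the contract was meant to do.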
To help researchers stay ahead of potential threats, OpenAI is expanding its Cybersecurity Grant Program with 10 million dollars in API credits. This funding supports organizations that are protecting open-source software and critical infrastructure. The company is also testing Aardvark, a dedicated security research agent designed specifically for defenders.
The benchmark incorporates real-world data from the Tempo blockchain, a project designed for stablecoin payments. Including these scenarios ensures that the benchmark remains grounded in practical, high-throughput financial environments. Developers can access the full paper and tooling via the official OpenAI research portal.
The launch of EVMbench marks a pivotal transition in which AI capability is measured not only by general reasoning but also by its impact on high-stakes, immutable financial code. By moving past standard evaluations like SWE-bench, OpenAI is acknowledging that the dual-use nature of frontier models requires domain-specific guardrails. The rapid jump in exploit proficiency highlights a narrowing window for human-only auditing, pushing the industry toward an AI-augmented defensive standard. This strategy, combined with 10 million dollars in API credits and the use of Tempo blockchain scenarios, suggests that the future of blockchain security will be defined by an arms race between AI-driven attackers and the defensive safety harnesses being built today.