Feb 19, 2026
Inferact, a new startup launched by the core maintainers of the vLLM project, has secured $150 million in seed funding at an $800 million valuation. The round will fund an effort to scale the most popular open-source inference engine into a universal standard for AI model serving. Lead investors include Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital and Databricks Ventures.
At the heart of the company is vLLM, a high-performance library for serving large language models efficiently. It is best known for introducing PagedAttention, an algorithm that splits the attention key-value cache into fixed-size blocks and maps each sequence's logical tokens to physical blocks on demand, much as an operating system pages virtual memory, maximizing throughput and minimizing wasted GPU memory. Today, vLLM powers over 400,000 GPUs globally for major organizations such as Meta, Google, and Amazon.
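To make the operating-system analogy concrete, here is a minimal sketch of the block-table idea behind PagedAttention. This is not vLLM's actual code; the names BlockAllocator and Sequence and the block size are illustrative.

```python
# Minimal sketch (not vLLM's implementation) of paged KV-cache management:
# the cache is carved into fixed-size physical blocks, and each sequence
# keeps a "block table" mapping its logical positions to physical blocks,
# much like an OS page table.

BLOCK_SIZE = 16  # tokens per block; illustrative value


class BlockAllocator:
    def __init__(self, num_blocks: int):
        # All physical blocks start out free.
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted")
        return self.free_blocks.pop()

    def free(self, block: int) -> None:
        self.free_blocks.append(block)


class Sequence:
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical index -> physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        # Grab a new physical block only when the current one is full,
        # so a sequence never wastes more than one partially filled block.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1
```

Because each sequence wastes at most one partially filled block, far more concurrent requests fit in the same GPU memory than with contiguous preallocation, which is where the throughput gains come from.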
The founding team features CEO Simon Mo and co-founder Woosuk Kwon, both PhD candidates at UC Berkeley who have steered vLLM since its first commit. They are joined by industry veterans Ion Stoica and Joseph Gonzalez, who previously helped launch Databricks and Anyscale. That depth of experience in distributed systems gives Inferact a rare advantage in tackling the growing bottlenecks of deploying AI at global scale.
The startup enters the market as the industry's focus shifts from training models to the massive compute demands of running them. As architectures such as Mixture of Experts and multimodal agents grow more complex, the engineering required to serve them efficiently has skyrocketed. Inferact intends to close this gap by making model deployment as simple as spinning up a serverless database, absorbing the underlying infrastructure complexity.
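That philosophy is already visible in vLLM's open-source offline inference API, where serving a model locally takes a few lines of Python. The model name and decoding settings below are only examples; any supported checkpoint would work.

```python
from vllm import LLM, SamplingParams

# Example prompt and sampling settings; values are illustrative.
prompts = ["The future of AI inference is"]
params = SamplingParams(temperature=0.8, max_tokens=64)

# Loading the model spins up the engine, including PagedAttention's
# block-based KV-cache management, with no manual memory tuning.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```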
While the team remains committed to keeping vLLM open and free, the funding will support a commercial enterprise layer on top of it. The platform will give companies the software infrastructure to run private or serverless inference clouds without being locked into proprietary ecosystems, acting as a neutral layer across all major hardware, including NVIDIA silicon, AMD accelerators, and custom cloud chips.
The company is currently hiring engineers and researchers to work at the intersection of models and hardware. By improving vLLM's performance and deepening support for emerging architectures, Inferact plans to ensure that the most capable models are accessible to everyone, not just organizations with custom infrastructure.
The massive valuation for a seed-stage company signals that the AI market has hit the "Inference Wall," the point where the cost of serving models begins to outweigh the cost of training them. By positioning itself as a neutral infrastructure provider, Inferact is attempting to become the Databricks of AI inference. If it succeeds, model intelligence will be decoupled from specific hardware, letting enterprises switch between GPU and accelerator providers seamlessly. That would break today's proprietary lock-in and accelerate the shift toward ubiquitous agentic AI applications.