In a significant shift for AI-assisted software development, Windsurf has unveiled its Wave 9 update featuring SWE-1 (Software Engineer 1), the company's first family of AI models built for the full range of software engineering tasks rather than code generation alone. Released on May 15, 2025, the update marks Windsurf's move from integrating third-party large language models (LLMs) to developing its own specialized "frontier" models.
The SWE-1 Family: Purpose-Built for Engineering Tasks
The new model family consists of three specialized variants, each targeting different user needs and deployment scenarios:
- SWE-1: The flagship model, comparable to Claude 3.5 Sonnet in tool-call reasoning capabilities while being more cost-effective to deploy. It is currently available to all paid users at no additional credit cost as part of a promotional period.
- SWE-1-lite: A medium-sized model that replaces the previous Cascade Base while delivering higher quality results. It is available for unlimited use to all Windsurf users, including those on free plans.
- SWE-1-mini: An ultra-fast, lightweight model powering Windsurf Tab's passive predictive features, designed for sub-second latency to support inline code completions without disrupting developer workflows. Like SWE-1-lite, it's available to all users without restrictions.
Beyond Code Generation: Engineering-Native AI
While most existing AI coding assistants adapt general-purpose LLMs for programming tasks, Windsurf's approach recognizes that actual software engineering encompasses far more than writing syntactically correct code. Real-world development involves navigating terminal commands, managing testing workflows, interpreting ambiguous user requirements, and handling long-running projects with incomplete states.
"Simply put, our goal is to accelerate software development by 99%. Writing code is only a fraction of what you do. A 'coding-capable' model won't cut it," Windsurf states in its announcement.
Traditional LLMs excel at isolated, tactical tasks—generating a function or passing a unit test—but struggle with the strategic aspects of development that require maintaining context across multiple states and interfaces. SWE-1 aims to bridge this gap by supporting the entire engineering process.
Flow Awareness: The Core Innovation
The centerpiece of SWE-1's architecture is what Windsurf calls "flow awareness"—a novel data model based on a unified "shared timeline" of actions across multiple development interfaces. This approach allows the models to track and understand the interplay between:
- Code edits in the text editor
- Terminal commands and error outputs
- Browser previews showing rendered UIs or frontend errors
- User clipboard contents
- In-IDE search queries
- Current Cascade conversations
By consolidating these signals, SWE-1 can reason over partial states, understanding, for example, that a test failure resulted from a recent code change, and then propose targeted solutions. This continuous feedback loop not only improves suggestion quality but also informs future model training, creating what Windsurf describes as a "flywheel" of capability enhancement.
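Windsurf has not published the internal format of its shared timeline, so the following is only a minimal Python sketch of the idea under stated assumptions. The `TimelineEvent` type, its fields, the surface names, and the "most recent edit before the failure" heuristic are all hypothetical and chosen for illustration. The sketch merges events from two surfaces into one time-ordered list and then looks up the last editor change that preceded a test failure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TimelineEvent:
    """One action on the shared timeline (hypothetical schema)."""
    timestamp: float  # seconds since the session started
    surface: str      # e.g. "editor", "terminal", "browser"
    kind: str         # e.g. "edit", "command", "test_failure"
    detail: str       # free-form description of what happened


def merge_timelines(*streams: Iterable[TimelineEvent]) -> List[TimelineEvent]:
    """Combine per-surface event streams into one list ordered by time."""
    return sorted(
        (event for stream in streams for event in stream),
        key=lambda event: event.timestamp,
    )


def last_edit_before(
    timeline: List[TimelineEvent], failure: TimelineEvent
) -> Optional[TimelineEvent]:
    """Return the most recent editor edit that happened before the failure."""
    edits = [
        e for e in timeline
        if e.surface == "editor"
        and e.kind == "edit"
        and e.timestamp < failure.timestamp
    ]
    # The timeline is sorted, so the last matching edit is the most recent one.
    return edits[-1] if edits else None


# Illustrative events only; none of this is real Windsurf data.
editor_events = [
    TimelineEvent(1.0, "editor", "edit", "utils.py: rename parse() to parse_config()"),
    TimelineEvent(5.0, "editor", "edit", "app.py: change default timeout"),
]
terminal_events = [
    TimelineEvent(3.0, "terminal", "command", "pytest"),
    TimelineEvent(6.5, "terminal", "test_failure", "test_timeout failed"),
]

timeline = merge_timelines(editor_events, terminal_events)
failure = next(e for e in timeline if e.kind == "test_failure")
suspect = last_edit_before(timeline, failure)
print(suspect.detail if suspect else "no preceding edit")
```

A real system would presumably weigh far more signals than recency, but even this toy version shows why a single ordered timeline makes cross-surface reasoning easier than separate per-tool logs.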
"We built the Windsurf Editor in order to build a seamless intertwining between the comprehensive states of user and the AI," the company explains. "Anything that the AI does, the human should be able to observe and action on, and anything the human does, the AI should be able to observe and action on."
Performance Benchmarks: How SWE-1 Measures Up
According to Windsurf's evaluations, the SWE-1 family performs competitively against industry-leading models. The company conducted both offline evaluations and blind production experiments to measure effectiveness.
Offline Evaluation Results
SWE-1 was measured against Anthropic's Claude models and open-weight alternatives like DeepSeek V3 and Qwen on two benchmarks:
- Conversational SWE Task Benchmark: Starting mid-conversation with a half-finished task, how well does the model address the next user query? Scoring factors included helpfulness, efficiency, correctness, and accuracy of file edits.
- End-to-End SWE Task Benchmark: Beginning from scratch, how well does the model solve a problem by passing selected unit tests? This measures the model's ability to operate independently.
On both benchmarks, Windsurf reports that SWE-1 approaches the performance level of frontier models like Claude 3.5 Sonnet while outperforming all mid-sized and open-weight alternatives.
Production Experiments
Using blind A/B testing with real users, Windsurf measured two key metrics:
- Daily Lines Contributed per User: The average number of lines written by Cascade that were actively accepted and retained by users over a fixed time period—a metric reflecting both model helpfulness and user trust.
- Cascade Contribution Rate: For files edited at least once by Cascade, the percentage of changes to those files that came from Cascade rather than direct user edits.
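Windsurf does not spell out exactly how these two metrics are computed, so the following Python sketch is one plausible reading rather than the company's method. The `Change` record and its fields are hypothetical, changes are measured in lines, only retained changes count toward the contribution rate, and the daily figure is averaged over every (user, day) pair that appears in the log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    """One recorded change to a file (hypothetical schema)."""
    user: str
    day: str        # e.g. "2025-05-15"
    file: str
    author: str     # "cascade" or "user"
    lines: int      # number of lines in the change
    retained: bool  # True if the change was accepted and kept


def daily_lines_contributed_per_user(changes: List[Change]) -> float:
    """Average retained Cascade-written lines per (user, day) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for c in changes:
        counted = c.lines if (c.author == "cascade" and c.retained) else 0
        totals[(c.user, c.day)] += counted
    return sum(totals.values()) / len(totals) if totals else 0.0


def cascade_contribution_rate(changes: List[Change]) -> float:
    """Percentage of retained lines from Cascade, in files Cascade edited."""
    cascade_files = {c.file for c in changes if c.author == "cascade"}
    relevant = [c for c in changes if c.file in cascade_files and c.retained]
    total = sum(c.lines for c in relevant)
    from_cascade = sum(c.lines for c in relevant if c.author == "cascade")
    return 100.0 * from_cascade / total if total else 0.0


# Illustrative data only; not real usage figures.
changes = [
    Change("ana", "2025-05-15", "a.py", "cascade", 30, True),
    Change("ana", "2025-05-15", "a.py", "user", 10, True),
    Change("ana", "2025-05-15", "b.py", "user", 50, True),   # file never touched by Cascade
    Change("ben", "2025-05-15", "c.py", "cascade", 20, False),  # rejected suggestion
    Change("ben", "2025-05-15", "c.py", "cascade", 10, True),
]

print(daily_lines_contributed_per_user(changes))  # 20.0
print(cascade_contribution_rate(changes))         # 80.0
```

In this toy log, b.py is excluded from the contribution rate because Cascade never edited it, and the rejected 20-line suggestion counts toward neither metric.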
According to Windsurf's data, SWE-1 outperformed Claude models on both metrics, a result the company attributes to SWE-1 being trained specifically on the interaction patterns of its user base.
Strategic Implications for Windsurf and the Industry
The release of Wave 9 represents a significant strategic pivot for Windsurf. Rather than positioning itself solely as a tooling layer atop third-party LLMs, the company is now establishing itself as a model developer focused on specialized, domain-specific AI.
The move comes amid rumors that OpenAI is in talks to acquire Windsurf in a deal valued at up to $3 billion, though no deal has been officially confirmed. The development of proprietary, high-performing models could significantly affect Windsurf's market position and valuation.
For the broader software development industry, SWE-1 suggests a trend toward more specialized AI systems that understand not just programming languages but entire workflows. This specialization could eventually redefine enterprise software development, with AI evolving from an autocomplete tool to a strategic partner across all phases of engineering.
The Road Ahead
Windsurf emphasizes that SWE-1 is just the beginning of its model development roadmap. "In the end, within the domain of software engineering, our goal is not to match frontier model performance of any research lab, but to exceed all of them," the company states.
Future updates will likely expand flow awareness to additional development surfaces such as CI/CD pipelines and code review platforms, while refining the models' ability to handle complex contexts and incomplete states.
For software developers and technical decision-makers, Wave 9's release suggests that AI-assisted software engineering is entering a new phase—one where purpose-built models, integrated deeply into development environments and trained on realistic engineering workflows, could dramatically accelerate productivity while handling increasingly complex tasks with less human intervention.
As Windsurf concludes in its announcement: "You will keep on hearing about improvements to the SWE family of models moving forwards. We will invest even more in this going forward to bring our users the best performance, at simultaneously the lowest cost, so that you can keep using Windsurf to build bigger and better things."
