The human drives the vision; AI structures it into a PRD and surfaces what you haven't thought of yet.
The New Agile
Delivery, rebuilt for the age of AI.
The Big Picture
One idea above all: specs & context are the product.
Make the spec and context the product, keep the pipeline portable across every tool, and shift the engineer's job from writing code to conducting it.
For two decades, Agile gave software teams a shared rhythm: sprints, standups, story points.
Then AI changed who writes the code, and how fast. The old ceremonies started measuring the wrong things. The real question isn't whether to use AI. It's how to run delivery when AI does the building, whether you're starting from zero or inheriting an existing codebase.
- 01
Own your inputs
Why specs and context, not code, are the source of truth.
- 02
Run a portable pipeline
Modes, the delivery flow, token economics, and the tool landscape.
- 03
Become the maestro
The new engineer's job, plus checklists you can run today.
Everything below is the how, across 11 sections.
Now, the tech details.
The 8 Principles
These replace the Agile Manifesto for AI-native delivery.
Not code. Not comments. Not Slack. The spec travels between humans, AI sessions, and tools. Code is a derivative of the spec.
The engineer explicitly owns and manages context. Every AI session starts with a deliberate context load. Context decay is a first-class risk.
The process lives in the spec, not in Lovable, v0, or any builder. Tools are interchangeable execution engines. The workflow is portable by design.
Build in modules with isolated specs. Each can be understood, rebuilt, or replaced independently. A spec pasted cold into any AI tool should produce a working first draft.
Token budgets, 5-hour resets, and weekly quotas are first-class constraints. Design your workflow around token economics, not a pretend infinite budget.
Before every context boundary (token limit, handoff, end of day), write a checkpoint. Checkpoints load future AI and onboard future humans.
The engineer doesn't write code; they conduct AI. They select the right model, sequence the pipeline, review outputs, and hold the context thread.
Docs split by concern: module spec, API contract, checkpoint log, decision log, onboarding brief. Each has one purpose and one consumer, human or AI.
Story points don't go away in AI-Native Agile. They get redefined. A point is no longer a measure of human effort hours; it's a measure of complexity, context weight, and token budget required. A 1-point task is one where a single coding agent can ship a complete module in a single session from a clean spec. A 5-point task requires multiple sessions, context resets, and cross-module integration. An 8-point task likely needs gear 2 or 3, a builder framework, multiple models, and an explicit handoff plan. Estimate by asking: how many context boundaries will this work cross?
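To make "estimate by context boundaries" concrete, here is a minimal sketch that maps a task's shape onto the three anchor sizes described above. The `TaskShape` fields, the function name, and the exact cut-offs are assumptions for illustration; the framework itself only describes what 1-, 5-, and 8-point tasks look like.

```python
from dataclasses import dataclass


@dataclass
class TaskShape:
    """Hypothetical description of a task; field names are illustrative."""
    sessions: int          # AI sessions expected (each reset is a context boundary)
    modules_touched: int   # modules the work integrates across
    needs_handoff: bool    # requires an explicit handoff plan
    models: int = 1        # distinct models or builders involved


def estimate_points(task: TaskShape) -> int:
    """Map a task to the 1/5/8 anchors by the context boundaries it crosses."""
    # 8 points: more than one model or builder, or an explicit handoff plan.
    if task.needs_handoff or task.models > 1:
        return 8
    # 5 points: multiple sessions (context resets) or cross-module integration.
    if task.sessions > 1 or task.modules_touched > 1:
        return 5
    # 1 point: one agent, one session, one module, from a clean spec.
    return 1
```

A team would likely tune the cut-offs after a few sprints; the point is that the inputs are context boundaries, not hours.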
Behind The Modes
Not every project needs the same setup. Match the approach to the complexity. The pipeline is the same; what changes is how many tools you bring and how much orchestration you need.
Well-scoped module, clear requirements, no heavy UI. You write the spec, a coding agent builds it. One tool, one context loop, fast feedback. Ship it.
Larger scope, multiple modules, team involved, or UI-heavy work. You need structured agent roles, a proper spec pipeline, and builder tooling to keep things coherent across sessions.
Complex, ambiguous, or high-stakes. No single tool wins. You move fluidly between models, builders, and frameworks, whatever gets the job done. The spec and checkpoints are your only constant. The Maestro holds the thread.
Every gear still runs the same 7 phases: Discovery → UX/UI → Spec → Builder → Build Loop → Review → Checkpoint. What changes is how many tools you're juggling and how formally you structure the orchestration. Start in first gear. Shift up only when you feel resistance.
The Delivery Pipeline
Two starting points. Same discipline. Whether you're starting from zero or inheriting 50,000 lines of code, the spec always precedes the build.
PRD → design brief → HTML scaffold. Each step feeds the next. Skip entirely for non-UI modules.
The most important phase. Portable, tool-agnostic Module Specs that any AI builder can consume cold. For UI modules, the spec references the HTML scaffold from Phase 2, so the visual layer is already solved.
Pick the delivery methodology that matches your gear. These are not execution tools; they are the orchestration layer that tells you how to build, not what to build with. Log the choice in the Decision Log.
AI generates; the human maestro steers, reviews each loop, holds the context thread, and calls when to reset. Never set-and-forget.
Human reviews AI output like a senior reviewing a junior's PR, with a fresh AI session as a second pair of eyes. Never merge without the human review step.
Before any handoff (to another session, another engineer, or future-you), write the checkpoint. Non-negotiable.
Index the past. Automate the future.
An AI agent can't build on what it can't see. So before a single line is written, every existing artifact, including code, docs, decisions, and schemas, becomes part of a shared source of truth.
Every source feeds into a single graph-powered hub. The graph is the foundation everything else runs on.
A deterministic engine maps every function, call and dependency. Only then does AI step in, reading the graph and translating code logic into plain language. The result is documentation that is both exhaustive and human-readable.
- Parse every file: Tree-sitter builds a concrete syntax tree covering every symbol and every token, with nothing skipped.
- Query & map: Functions, imports, call sites and dependency edges are extracted into a structured Code Graph.
- Trace call flows: AI reads the graph and follows execution paths: what calls what, and what breaks if this changes.
- Write in plain language: Code logic is translated into business-readable docs. Written for people, not parsers.
Why sequence matters: when AI discovers and analyses at the same time, it invents things: hallucinated APIs, phantom call paths. The deterministic engine eliminates that risk. By the time AI touches the codebase, there's nothing left to guess.
Why a graph? Flat documents answer "what does this file say?" A graph answers "what does this change break?" Relationships between modules, specs and decisions are queryable, not just searchable. Every agent in every phase queries the same graph.
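The "what does this change break?" query can be sketched in a few lines. The example below swaps Tree-sitter for Python's built-in `ast` module so it runs with no dependencies; that substitution, the `SOURCE` sample, and the function names `build_call_graph` and `impacted_by` are assumptions of this sketch, not the framework's actual engine. It only tracks direct calls by name between top-level functions in one file.

```python
import ast
from collections import defaultdict

# Hypothetical sample code to index.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    return len(parse(path))
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Return {caller: {callee, ...}} for top-level functions in `source`."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Deterministic step: record only direct calls to known functions.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    graph[func.name].add(node.func.id)
    return graph


def impacted_by(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Everything that transitively calls `changed`, i.e. what might break."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    seen, stack = set(), [changed]
    while stack:
        for caller in callers[stack.pop()]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen
```

Here `impacted_by(build_call_graph(SOURCE), "load")` walks the reversed edges and reports both `parse` and `report`. A flat text search for "load" would find the call site but not the transitive caller.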
With the Knowledge Hub established, the pipeline follows. Each phase draws from the same graph. No context lost, no handoff gaps.
- Phase 1: Knowledge Ingestion & Sync. Build the living knowledge foundation.
- Phase 2: Spec Generation. Knowledge into actionable specifications.
- Phase 3: Decomposition & Planning. Research, analyze and plan.
- Phase 4: Implementation. Agent-driven code generation.
- Phase 5: QC Testing & App Spec. Context-aware test generation.
Collect everything that defines the system: code, docs, endpoints, schemas, business context, and synchronise it continuously into the Knowledge Hub.
- Code, docs, specs & business MDs
- Endpoints, schema, architecture data
- Sync continuously with codebase
- Stored & organised in Knowledge Hub
The In-House Spec Agent analyses accumulated knowledge and business context to generate detailed, validated technical and business specifications.
- Analyse knowledge & business context from the Hub
- Generate detailed technical & business spec MDs
- Validate & enrich specs with PM & domain input
Uses the Knowledge Graph and Code Graph together to deeply understand the task before decomposing it into a dependency-aware execution plan.
- Understand task using Knowledge Graph & Code Graph
- Research, analyse and plan the approach for each task
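A dependency-aware execution plan is, at its core, a topological ordering of tasks. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the task names and the `execution_batches` helper are hypothetical, and in practice the dependency map would come from the Knowledge Graph and Code Graph rather than a hand-written dict.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key depends on the tasks in its set.
TASK_DEPENDENCIES = {
    "db-schema": set(),
    "auth-module": {"db-schema"},
    "billing-module": {"db-schema"},
    "checkout-ui": {"auth-module", "billing-module"},
}


def execution_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into ordered batches; a batch has no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable plan
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Tasks in the same batch do not depend on each other, so separate agent sessions could pick them up in parallel; each batch boundary is a natural place for a checkpoint.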
Development Agents execute tasks following the plan. Automation Housekeeping keeps repos, docs and integrations continuously synchronised in the background.
- Execute tasks via agent workflows
- Code generation, validation & testing
- Deliver implemented code & artifacts
- Repository housekeeping
- Docs & Swagger sync
- Jira sync & Slack alignment
- Scan PRs & security alerts
- Bump libraries & dependencies
The In-House QC App Agent leverages the Knowledge Hub for full traceability, generating test cases, scenarios and QA reports directly traceable to specs, code and requirements.
- Generate test cases using Knowledge Hub context
- Leverage knowledge for context-aware test case generation
- Generate test scenarios, edge cases & data sets
- Deliver test suites & QA reports
- Codebase & architecture
- Business & technical docs
- Specs & schemas
- Decisions & context
- Endpoints, APIs, DB schema
- Always updated
- Traceable & versioned
- Accessible to all agents
- Drives all phases
- Graph-powered intelligence
Every change in code, docs, specs or processes is automatically captured, processed and reflected across the Knowledge Hub.
- Repository housekeeping
- Docs & Swagger sync
- Jira sync & Slack alignment
- Scan PRs & security alerts
- Bump libraries & dependencies
- And all the little things that keep everything in sync
Owning Context
Context is the most important engineering artifact you produce. Lose it and you lose velocity, quality, and continuity.
Context Load Template
Paste this at the start of every new AI session. Never assume the AI remembers anything.
# CONTEXT LOAD
PROJECT: {{project_name}}
MODULE: {{module_name}}
SPEC VERSION: {{spec_version}}
LAST CHECKPOINT: {{date + summary}}
WHAT WAS BUILT: {{brief checkpoint summary}}
CURRENT TASK: {{what you're building this session}}
CONSTRAINTS: {{non-negotiables, tech stack, conventions}}
DO NOT: {{explicitly excluded patterns or approaches}}
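If you would rather generate the context load than fill it in by hand, a small renderer can refuse to produce a half-empty one. This is a sketch under assumptions: the placeholder names are simplified to plain identifiers (for example `last_checkpoint` instead of `date + summary`), and `render_context_load` is a hypothetical helper, not part of any tool named in this framework.

```python
import re

# Same fields as the template above, with placeholder names simplified
# to identifiers for this sketch.
CONTEXT_LOAD_TEMPLATE = """\
# CONTEXT LOAD
PROJECT: {{project_name}}
MODULE: {{module_name}}
SPEC VERSION: {{spec_version}}
LAST CHECKPOINT: {{last_checkpoint}}
WHAT WAS BUILT: {{what_was_built}}
CURRENT TASK: {{current_task}}
CONSTRAINTS: {{constraints}}
DO NOT: {{do_not}}
"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_context_load(fields: dict[str, str]) -> str:
    """Fill the template; refuse to render if any field is missing or blank."""
    required = PLACEHOLDER.findall(CONTEXT_LOAD_TEMPLATE)
    missing = [name for name in required if not fields.get(name, "").strip()]
    if missing:
        raise ValueError(f"context load is incomplete, missing: {missing}")
    return PLACEHOLDER.sub(lambda m: fields[m.group(1)], CONTEXT_LOAD_TEMPLATE)
```

Failing loudly on a blank field is a deliberate choice: a session started with an empty DO NOT line is exactly how rejected patterns creep back in.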
The Sharded Doc System
No single document owns all context. Each has one purpose and one consumer.
Signs of context decay:
- AI ignores constraints you defined earlier
- AI re-introduces patterns you rejected
- Responses become generic
- You're correcting more than reviewing
- You're past 60% of your observed good-performance zone
Protocol: stop, write a checkpoint, start a fresh session.
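The stop rule is simple enough to encode. A minimal sketch, assuming you measure your "good-performance zone" in tokens and count the behavioral warning signs by hand; `should_checkpoint` and its parameter names are hypothetical.

```python
def should_checkpoint(tokens_used: int, good_zone_tokens: int,
                      decay_signals: int = 0) -> bool:
    """True when it is time to stop, write a checkpoint, and start fresh.

    good_zone_tokens: session size at which you have observed the model
    still performing well (an assumption: the zone is measured in tokens).
    decay_signals: how many of the behavioral warning signs you have seen.
    """
    if good_zone_tokens <= 0:
        raise ValueError("good_zone_tokens must be positive")
    past_threshold = tokens_used / good_zone_tokens > 0.60
    # Any observed warning sign is enough, regardless of token usage.
    return past_threshold or decay_signals > 0
```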
Tool Strategy & Token Economics
Most AI tools meter you on a shared per-vendor pool, with rate limits, refresh windows, and weekly quotas. These are real constraints; engineer around them instead of pretending the budget is infinite.
Architecture, spec drafting, complex reasoning, reviewing output from other tools. High-value, deliberate use. Examples: Claude.ai, ChatGPT, Gemini, the chat surface of any frontier model.
Codebase-aware. Reads the whole repo. Best for logic, refactoring, backend, debugging with real context. Examples: Claude Code, Cursor, Codex, Aider.
Rapid UI generation from specs. Lives on its own token pool, so you can offload pixel-pushing without burning your reasoning budget. Examples: v0, Lovable, Bolt, Figma Make. Paste spec in, extract output, own the code.
Most critical work first while quota is fresh. Boilerplate and UI scaffolding go to dedicated builder tools on separate pools. Treat each refresh window as a sprint boundary. Never burn premium reasoning tokens on what a UI builder can do in 30 seconds.
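One way to picture the routing rule: sort the window's work by criticality, then send each task to the pool that should pay for it. The pool labels, the `Task` fields, and the `ROUTING` table below are assumptions made for this sketch; adjust them to the tools and quotas you actually have.

```python
from dataclasses import dataclass

# Hypothetical labels for the three tool categories described above.
REASONING, CODING_AGENT, UI_BUILDER = "reasoning-llm", "coding-agent", "ui-builder"


@dataclass
class Task:
    name: str
    kind: str       # one of the keys in ROUTING
    priority: int   # 1 = most critical


# Assumed routing table: which pool each kind of work draws from.
ROUTING = {
    "architecture": REASONING,
    "review": REASONING,
    "logic": CODING_AGENT,
    "refactor": CODING_AGENT,
    "ui": UI_BUILDER,
    "boilerplate": UI_BUILDER,
}


def plan_window(tasks: list[Task]) -> dict[str, list[str]]:
    """Route tasks to pools, most critical first within each pool."""
    plan: dict[str, list[str]] = {REASONING: [], CODING_AGENT: [], UI_BUILDER: []}
    for task in sorted(tasks, key=lambda t: t.priority):
        plan[ROUTING[task.kind]].append(task.name)
    return plan
```

Run it at the start of each refresh window: the head of each list is what gets the fresh quota, and nothing tagged `ui` or `boilerplate` ever lands on the reasoning pool.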
The AI Builder Ecosystem
You don't need to pick one. The framework is tool-agnostic; these are the existing methodologies and builders worth knowing. Each solves a real problem. None of them should own your process.
These tools inspired parts of AI-Native Agile. BMAD and GSD influenced the spec-first approach. spec-kit and OpenSpec informed the module spec anatomy. Superpowers shaped the maestro's multi-model philosophy. Use whichever fits the job; just keep your process, specs, and checkpoints portable.
BMAD: Breakthrough Method for Agile AI-Driven Development. A structured methodology for AI-assisted projects using specialized agent roles (Analyst, Architect, Developer). Strong on upfront planning and role-based prompting.
spec-kit: GitHub's toolkit for writing structured software specs. Provides templates and conventions for defining what to build before you build it. Directly influenced the Module Spec anatomy in this framework.
OpenSpec: An open specification format for AI-driven development. Focuses on making specs machine-readable and AI-consumable, the same goal as portable Module Specs in this framework.
Superpowers: A framework for augmenting engineers with AI capabilities across the full dev lifecycle. Strong on the "engineer + AI" collaboration model; the Maestro concept draws directly from this philosophy.
GSD: A pragmatic, no-ceremony AI build methodology. Prioritizes speed and iteration over process overhead. Useful when you need to move fast on a well-understood module. Pairs well with a pre-written spec.
Your spec travels. Your checkpoints travel. Your process travels. The tool you used to generate the output does not matter. Own the output, not the tool.
The Module Spec
The Module Spec is the atomic unit of AI-Native Agile. Specs only, no code, no history. Copy-paste into any tool. Get consistent output.
Name, version, owner, last updated. One-line description. Example: auth-module v1.2 | User auth + session management
Everything the AI needs to know about the broader project to work on this module. Tech stack, conventions, architectural decisions. Reusable across sessions.
What data enters this module and what it produces. Explicit API contract boundary. Examples of valid and invalid inputs. Other modules depend on this, so be precise.
Numbered list of what the module MUST do. Written as behaviors: MUST validate email before submit. Not implementation instructions: the AI chooses how.
Explicit list of what this module does NOT do. Prevents AI scope creep and over-engineering. Hard constraints: MUST NOT store passwords in plain text.
Specific, verifiable statements. If these pass, the module is done. No vague criteria: each must be independently verifiable by a human or AI.
What other modules this touches, reads from, writes to. Shared state or event contracts. How modules stay decoupled while cooperating.
Unresolved decisions needing human input. Signals to the AI where judgment calls are needed. Never let an AI silently resolve open questions.
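The eight sections above translate naturally into a typed record with a readiness check. This is a minimal sketch, not a canonical schema: the field names are this example's own labels for the sections, and `spec_problems` checks only the rules the text states outright (requirements phrased as MUST behaviors, acceptance criteria present).

```python
from dataclasses import dataclass, field


@dataclass
class ModuleSpec:
    """The eight sections above; field names are this sketch's own labels."""
    identity: str                 # e.g. "auth-module v1.2 | User auth + session management"
    project_context: str          # tech stack, conventions, architectural decisions
    inputs_outputs: str           # API contract boundary, valid and invalid examples
    requirements: list[str]       # behaviors, each phrased with MUST
    out_of_scope: list[str]       # what the module does NOT do
    acceptance_criteria: list[str]
    integration_points: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def spec_problems(spec: ModuleSpec) -> list[str]:
    """Return reasons the spec is not ready to paste into a builder."""
    problems = []
    if not spec.identity.strip():
        problems.append("identity is empty")
    if not spec.requirements:
        problems.append("no requirements")
    for i, req in enumerate(spec.requirements, start=1):
        # Requirements describe behavior, so each should carry a MUST.
        if "MUST" not in req:
            problems.append(f"requirement {i} is not phrased as a MUST behavior")
    if not spec.acceptance_criteria:
        problems.append("no acceptance criteria")
    return problems
```

Open questions are deliberately not treated as problems here: they are meant to travel with the spec so the AI knows where to stop and ask.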
The Checkpoint Protocol
Checkpoints are the connective tissue between sessions. They load future AI and onboard future humans with zero loss.
Before touching any code, write what you intend to do, which module, and what the success condition is. Primes your thinking and creates a reference if you drift.
Quick capture: what's done, what's left, decisions made. Your safety net if you hit quota unexpectedly. Moment to ask: is the current approach still the right one?
After each module or significant feature: what was built, spec version used, known issues, exact codebase state. Future sessions start here, not from memory.
State of all active modules, open decisions, blockers, and the exact next action. Must be complete enough that a stranger, human or AI, can pick up with zero clarification needed.
Full decision log summary, architectural decisions made, what was tried and rejected, and the onboarding brief link. This is what makes you a great collaborator.
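A checkpoint is easier to write consistently when it has a fixed shape. The sketch below models the end-of-session checkpoint described above (active modules, open decisions, blockers, exact next action) and renders it as plain text for the checkpoint log. The `Checkpoint` class, its field names, and the output layout are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Checkpoint:
    """An end-of-session checkpoint; field names are illustrative."""
    day: date
    spec_version: str
    modules: dict[str, str]       # module name -> current state
    next_action: str
    open_decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render as plain text; refuse if the next action is missing."""
        if not self.next_action.strip():
            raise ValueError("a checkpoint must name the exact next action")
        lines = [f"CHECKPOINT {self.day.isoformat()} (spec {self.spec_version})"]
        lines += [f"MODULE {name}: {state}" for name, state in self.modules.items()]
        lines += [f"OPEN DECISION: {d}" for d in self.open_decisions]
        lines += [f"BLOCKER: {b}" for b in self.blockers]
        lines.append(f"NEXT ACTION: {self.next_action}")
        return "\n".join(lines)
```

The rendered text can be pasted straight into the LAST CHECKPOINT and WHAT WAS BUILT fields of the next session's context load.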
The Maestro Engineer
The old engineer wrote code. The new engineer conducts AI. The maestro knows which model to pick, when to switch tools, and when to take the wheel themselves.
Knows which model to use for which task. Fast models for boilerplate. Frontier reasoning models for architecture. A competing model for second opinions. A long-context model when the surface is huge. Never defaults, always chooses.
Moves from coding agent to UI builder to design tool to conversational LLM without losing the thread. The spec is the invariant, not the tool. Every switch is deliberate and logged.
Actively manages context across sessions. Writes checkpoints before they're needed. Spots context decay early. The most expensive mistake is letting AI work in degraded context.
Spends 40% of time on specs before touching any builder. A great spec produces great output from any tool. A bad spec produces consistent garbage. The spec is the engineering work.
Reviews AI output like a senior reviews a junior's PR. Checks against the spec, not just "does it run?" Uses a separate session to review AI output; fresh context catches different issues.
Keeps modules decoupled. Rejects AI output that violates module boundaries even if it works. The AI won't remember why a boundary existed; the maestro does.
Run the Checklists
Use these before every project and every contribution.
Enablement Tools
The framework runs on discipline when it's just you. At team scale, you need infrastructure that keeps docs alive, context fresh, and everyone in sync, without burning human attention on maintenance.
A centralized markdown repository that serves as the project's single source of truth, shared across all roles. Engineers, PMs, and QC all read from and write into the same place. No more context locked in inboxes or heads.
The MDR, this shared markdown repository, hosts every artifact this framework produces: module specs, checkpoint logs, decision logs, onboarding briefs, API contracts, PRDs, and tech plans. It's what makes the sharded doc system operational at team scale.
Tesla is the agent that makes the MDR sustainable. Engineers shouldn't spend attention on maintenance; Tesla does it. It runs continuously in the background, keeping docs organized, Jira in sync, and the team unblocked.
The core philosophy: bureaucracy is necessary; spending human time on bureaucracy is not. Tesla absorbs the maintenance overhead so contributors can stay in the codebase.
Before building Tesla, validate the MDR manually. Get the team writing into a shared markdown repo for two weeks with zero automation. See what actually gets written, what gets skipped, and where the real friction lives. Build Tesla around the actual failure modes, not the imagined ones.
"The engineer who masters context masters AI.
The engineer who masters specs masters any tool.
The engineer who masters both is the maestro."
AI-Native Agile v2.0, A Living Framework