A living framework, version 2.0

The New Agile

A delivery framework for the age of AI, where context is the product, specs are portable, and the engineer becomes a maestro conducting multiple models.

Specs over scaffolding · No tool lock-in · Context ownership · The Maestro Engineer · Sharded docs · Token economics
01 Foundation

The 8 Principles

These replace the Agile Manifesto for AI-native delivery.

P-01
Specs are the Source of Truth

Not code. Not comments. Not Slack. The spec travels between humans, AI sessions, and tools. Code is a derivative of the spec.

P-02
Context is Owned, Not Assumed

The engineer explicitly owns and manages context. Every AI session starts with a deliberate context load. Context decay is a first-class risk.

P-03
No Tool Lock-In

The process lives in the spec, not in Lovable, v0, or any builder. Tools are interchangeable execution engines. The workflow is portable by design.

P-04
Modules, Not Monoliths

Build in modules with isolated specs. Each can be understood, rebuilt, or replaced independently. A spec pasted cold into any AI tool should produce a working first draft.

P-05
Token Awareness is Engineering

Token budgets, 5-hour resets, and weekly quotas are first-class constraints. Design your workflow around token economics, not a pretend infinite budget.

P-06
Checkpoints are Sacred

Before every context boundary (token limit, handoff, end of day), write a checkpoint. Checkpoints load future AI sessions and onboard future humans.

P-07
The Engineer is the Maestro

The engineer doesn't write code; they conduct AI. They select the right model, sequence the pipeline, review outputs, and hold the context thread.

P-08
Sharded Documentation

Docs split by concern: module spec, API contract, checkpoint log, decision log, onboarding brief. Each has one purpose and one consumer, human or AI.

Story Points still exist; they just measure different things now

Story points don't go away in AI-Native Agile; they get redefined. A point is no longer a measure of human effort hours. It measures complexity, context weight, and the token budget required. A 1-point task is one where a single coding agent can ship a complete module in a single session from a clean spec. A 5-point task requires multiple sessions, context resets, and cross-module integration. An 8-point task likely needs gear 2 or 3: a builder framework, multiple models, and an explicit handoff plan. Estimate by asking: how many context boundaries will this work cross?
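One way to make that question concrete is to count the boundaries first and map the count to points. The sketch below is illustrative only: the function name `estimate_points` and its thresholds are assumptions chosen to agree with the 1-, 5-, and 8-point descriptions above, not a prescribed formula.

```python
def estimate_points(sessions: int, modules_touched: int, models: int) -> int:
    """Map counts of context boundaries to story points (hypothetical thresholds)."""
    if min(sessions, modules_touched, models) < 1:
        raise ValueError("sessions, modules_touched and models must each be >= 1")
    # Each extra session, module, or model is one more boundary the context must cross.
    boundaries = (sessions - 1) + (modules_touched - 1) + (models - 1)
    if boundaries == 0:
        return 1  # one agent, one session, one clean spec
    if models == 1:
        return 5  # several sessions or cross-module work, still a single model
    return 8      # multiple models: write an explicit handoff plan


print(estimate_points(sessions=1, modules_touched=1, models=1))  # 1
print(estimate_points(sessions=3, modules_touched=2, models=1))  # 5
print(estimate_points(sessions=4, modules_touched=3, models=2))  # 8
```

Treat the result as a conversation starter for estimation, then adjust the thresholds to what your team actually observes.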

02 Project Gear

Behind The Modes

Not every project needs the same setup. Match the approach to the complexity. The pipeline is the same; what changes is how many tools you bring and how much orchestration you need.

1
First gear
One agent is enough

Well-scoped module, clear requirements, no heavy UI. You write the spec, a coding agent builds it. One tool, one context loop, fast feedback. Ship it.

A coding agent, primary builder
A conversational LLM, spec drafting + PRD
Checkpoints in markdown files
Use when: greenfield features, scripts, APIs, small scoped modules
2
Second gear
Bring a builder framework

Larger scope, multiple modules, team involved, or UI-heavy work. You need structured agent roles, a proper spec pipeline, and builder tooling to keep things coherent across sessions.

A structured agent workflow (e.g. BMAD, GSD)
A UI scaffolding builder (e.g. v0, Lovable)
A coding agent for precision logic + review
A spec format your tools can consume (e.g. spec-kit, OpenSpec)
Use when: full features, multi-module projects, team handoffs
3
Maestro gear
Use whatever it takes to finish

Complex, ambiguous, or high-stakes. No single tool wins. You move fluidly between models, builders, and frameworks, whatever gets the job done. The spec and checkpoints are your only constant. The Maestro holds the thread.

Multi-model: rotate frontier LLMs per task
A structured framework when ceremony helps
UI builders + design tools for the visual layer
A coding agent for codebase-aware precision
A review framework (e.g. Superpowers) for augmented review
Use when: complex systems, unclear scope, greenfield products, high stakes
The gear doesn't change the pipeline; it changes the load

Every gear still runs the same 7 phases: Discovery → UX/UI → Spec → Builder → Build Loop → Review → Checkpoint. What changes is how many tools you're juggling and how formally you structure the orchestration. Start in first gear. Shift up only when you feel resistance.

03 Delivery Flow

The Delivery Pipeline

Two starting points. Same discipline. Whether you're starting from zero or inheriting 50,000 lines of code, the spec always precedes the build.

Phase 1 Discovery & PRD Human + AI

The human drives the vision; AI structures it into a PRD and surfaces what you haven't thought of yet.

Describe the problem to a conversational LLM in plain language, let it draft a PRD, then interrogate it: what's missing, what's ambiguous, what's out of scope
Human validates the PRD against real user pain before it's locked; AI can't do this part
Decompose into named modules, each independently shippable
Define success criteria per module before any code is written
Phase 2 UX / UI Design Human + AI

PRD → design brief → HTML scaffold. Each step feeds the next. Skip entirely for non-UI modules.

01 UX Document: User flows, IA, screen inventory, interaction patterns
Tools: UX Agent · BMAD · Claude.ai
Human approves
02 Wireframes: Skeleton layouts for every screen & state. No colour, no polish.
Tools: Whimsical AI · Pencil · Excalidraw · Balsamiq
Last cheap checkpoint
03 UI Design: Polished layouts, component library, design tokens
Tools: Figma Make · Galileo AI · Uizard
Human reviews
04 HTML Generation: Semantic HTML + CSS from designs (the scaffold artifact)
Tools: Locofy · v0 · Lovable · Claude Code
05 Scaffold Hand-off: HTML committed to repo. Referenced in every UI module spec. Design frozen; specs describe behaviour only.
Phase 3 Spec Writing Human + AI

The most important phase. Portable, tool-agnostic Module Specs that any AI builder can consume cold. For UI modules, the spec references the HTML scaffold from Phase 2, so the visual layer is already solved.

One spec per module: what it does, inputs/outputs, constraints, acceptance criteria, non-goals
Include a Context Load block: everything an AI needs about the broader project to work on this module
For UI modules: reference the Phase 2 HTML scaffold in the spec rather than describing the UI from scratch
Test it: paste cold into any builder. No working draft means the spec is incomplete
Phase 4 Choose the Framework Human Lead

Pick the delivery methodology that matches your gear. These are not execution tools; they are the orchestration layer that tells you how to build, not what to build with. Log the choice in the Decision Log.

Framework | What it gives you | Best gear match
BMAD Method | Structured agent roles (Analyst, Architect, Dev). Enforces upfront planning and role-based prompting across the full delivery cycle | gear 2, gear 3
spec-kit | GitHub's spec templates and conventions. Provides the standard format for writing module specs that AI tools can consume reliably | gear 1 and up
OpenSpec | Open specification format designed to be machine-readable. Makes your module specs consumable by any AI tool without reformatting | gear 1 and up
Superpowers | Engineer augmentation framework. Structures how the Maestro moves between models and tools without losing the context thread across the full lifecycle | gear 3 (Maestro)
Get Shit Done (GSD) | No-ceremony, pragmatic build methodology. Minimum structure, maximum speed. Pairs best with a pre-written spec; the framework steps aside and lets you ship | gear 1
The Core Rule
Your spec travels. Your checkpoints travel. Your process travels. The framework you used to orchestrate the build does not matter. Own the output, not the methodology.
Phase 5 Build Loop Human + AI

AI generates; the human maestro steers, reviews each loop, holds the context thread, and calls when to reset. Never set-and-forget.

Always start with a Context Load: paste the module spec + latest checkpoint
Work in short, verifiable bursts; checkpoint every logical unit
When context degrades (AI ignores constraints, re-introduces rejected patterns), stop and reset
Builder tools: paste spec → max 3 iterations → export → own the code. Don't stay in the builder
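The builder rule above (context load first, at most three iterations, then export) can be sketched as a capped loop. This is a minimal illustration, not a real builder integration: `builder_loop`, `generate`, and `review` are hypothetical names, where `generate` stands in for whatever tool you paste into and `review` stands in for the human-led check against the spec.

```python
from typing import Callable


def builder_loop(
    spec: str,
    checkpoint: str,
    generate: Callable[[str], str],
    review: Callable[[str], list[str]],
    max_iterations: int = 3,
) -> tuple[str, bool]:
    """Run a capped builder loop; returns (last draft, whether review passed)."""
    context_load = f"{spec}\n\n{checkpoint}"  # always start from spec + latest checkpoint
    prompt = context_load
    draft = ""
    for _ in range(max_iterations):
        draft = generate(prompt)
        issues = review(draft)  # list of problems found; empty means accepted
        if not issues:
            return draft, True
        # Feed the findings back, but keep the spec and checkpoint in every prompt.
        prompt = context_load + "\n\nFIX:\n" + "\n".join(f"- {issue}" for issue in issues)
    # Cap reached: export what exists and continue in your own codebase.
    return draft, False
```

Returning the last draft even when review fails reflects the "export, then own the code" step: the cap ends the builder session, not the work.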
Phase 6 Review & Integration Human + AI

Human reviews AI output like a senior reviewing a junior's PR, with a fresh AI session as a second pair of eyes. Never merge without the human review step.

Review against the module spec acceptance criteria, not just "does it run?"
Use a fresh AI session, ideally a different model, to review the original output; fresh context catches different issues
Update the module spec if reality diverged; it must always reflect what was built
Log all decisions made during build in the Decision Log with reasoning
Phase 7 Checkpoint Human + AI

Before any handoff (to another session, another engineer, or future-you), write the checkpoint. Non-negotiable.

What was built, decided, and deferred
State of each active module: spec version, completion %, known issues
Open questions and the exact next action
Token budget status: how much runway remains in the current cycle
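The checkpoint contents listed above map naturally onto a small record type. The sketch below is one possible shape, assuming plain-text output in the same labelled style as a context load; the class names, field names, and output layout are illustrative, not a fixed format.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleState:
    name: str
    spec_version: str
    completion_pct: int
    known_issues: list[str] = field(default_factory=list)


@dataclass
class Checkpoint:
    built: list[str]
    decided: list[str]
    deferred: list[str]
    modules: list[ModuleState]
    open_questions: list[str]
    next_action: str
    token_budget_status: str

    def render(self) -> str:
        """Render as plain text that can be pasted into the next session."""
        def bullets(items: list[str]) -> list[str]:
            return [f"- {item}" for item in items] or ["- (none)"]

        lines = ["# CHECKPOINT", ""]
        for label, items in (("BUILT", self.built), ("DECIDED", self.decided),
                             ("DEFERRED", self.deferred)):
            lines += [f"{label}:", *bullets(items), ""]
        lines.append("MODULES:")
        for m in self.modules:
            issues = "; ".join(m.known_issues) or "none"
            lines.append(f"- {m.name} (spec {m.spec_version}, {m.completion_pct}%): {issues}")
        lines += ["", "OPEN QUESTIONS:", *bullets(self.open_questions), ""]
        lines += [f"NEXT ACTION: {self.next_action}",
                  f"TOKEN BUDGET: {self.token_budget_status}"]
        return "\n".join(lines) + "\n"
```

Empty sections render as "(none)" so a reader can tell "nothing deferred" apart from "forgot to fill this in".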
04 Context Engine

Owning Context

Context is the most important engineering artifact you produce. Lose it and you lose velocity, quality, and continuity.

Context Load Template

Paste this at the start of every new AI session. Never assume the AI remembers anything.

context-load.md (paste at every session start)
# CONTEXT LOAD

PROJECT:         {{project_name}}
MODULE:          {{module_name}}
SPEC VERSION:    {{spec_version}}
LAST CHECKPOINT: {{date + summary}}

WHAT WAS BUILT:
{{brief checkpoint summary}}

CURRENT TASK:
{{what you're building this session}}

CONSTRAINTS:
{{non-negotiables, tech stack, conventions}}

DO NOT:
{{explicitly excluded patterns or approaches}}
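If you fill the template programmatically, it helps to fail loudly when a placeholder is left empty, since a half-filled context load is an assumed context. The helper below is a minimal sketch: `render_context_load` is a hypothetical name, and the example values are made up for illustration.

```python
import re

# Matches {{anything}} placeholders as used in the template above.
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def render_context_load(template: str, values: dict[str, str]) -> str:
    """Fill every {{placeholder}}; refuse to render if any is left unfilled."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    # A function replacement avoids backslash-escape surprises in the values.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "PROJECT:         {{project_name}}\nMODULE:          {{module_name}}\n"
print(render_context_load(template, {"project_name": "shop", "module_name": "auth-module"}))
```

Keys are the exact text between the braces, so free-text placeholders such as `date + summary` work without renaming.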

The Sharded Doc System

No single document owns all context. Each has one purpose and one consumer.

Module Spec → AI builders
Checkpoint Log → Next session
Decision Log → Future engineers
Onboarding Brief → New humans
API Contract → Module interfaces
Context Decay Signals: stop when you see these

AI ignores constraints you defined earlier · Re-introduces patterns you rejected · Responses become generic · You're correcting more than reviewing · You're past 60% of your observed good-performance zone. Protocol: stop, write a checkpoint, start a fresh session.
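The signals above can be tracked as simple counters during a session. The sketch below assumes that any single signal is enough to trigger the protocol; the names `SessionHealth` and `decay_signals` and the way each signal is measured are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SessionHealth:
    constraints_ignored: int = 0         # times the AI broke a stated constraint
    rejected_patterns_returned: int = 0  # times a rejected pattern came back
    generic_responses: int = 0           # answers that no longer fit the project
    corrections: int = 0                 # turns spent correcting output
    reviews: int = 0                     # turns spent reviewing output
    tokens_used: int = 0
    good_zone_tokens: int = 0            # your own observed good-performance zone


def decay_signals(h: SessionHealth) -> list[str]:
    """Return the decay signals currently present, in the order listed above."""
    signals = []
    if h.constraints_ignored > 0:
        signals.append("ignores constraints")
    if h.rejected_patterns_returned > 0:
        signals.append("re-introduces rejected patterns")
    if h.generic_responses > 0:
        signals.append("responses become generic")
    if h.corrections > h.reviews:
        signals.append("correcting more than reviewing")
    # Integer form of tokens_used > 60% of the good-performance zone.
    if h.good_zone_tokens > 0 and h.tokens_used * 10 > h.good_zone_tokens * 6:
        signals.append("past 60% of good-performance zone")
    return signals


def should_reset(h: SessionHealth) -> bool:
    """Protocol: stop, write a checkpoint, start a fresh session."""
    return bool(decay_signals(h))
```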

05 Token Economics

Tool Strategy & Token Economics

Most AI tools meter you on a shared per-vendor pool, with rate limits, refresh windows, and weekly quotas. These are real constraints. Engineer around them instead of pretending the budget is infinite.

Tier 1, Think & Spec
Conversational LLM

Architecture, spec drafting, complex reasoning, reviewing output from other tools. High-value, deliberate use. Examples: Claude.ai, ChatGPT, Gemini, the chat surface of any frontier model.

Use when: designing, deciding, reviewing
Tier 2, Build & Precision
Coding Agent

Codebase-aware. Reads the whole repo. Best for logic, refactoring, backend, debugging with real context. Examples: Claude Code, Cursor, Codex, Aider.

Use when: implementing, debugging with codebase context
Tier 3, UI Scaffold
UI Builder

Rapid UI generation from specs. Lives on its own token pool, so you can offload pixel-pushing without burning your reasoning budget. Examples: v0, Lovable, Bolt, Figma Make. Paste spec in, extract output, own the code.

Use when: generating UI from spec
Token Conservation Protocol

Most critical work first while quota is fresh. Boilerplate and UI scaffolding go to dedicated builder tools on separate pools. Treat each refresh window as a sprint boundary. Never burn premium reasoning tokens on what a UI builder can do in 30 seconds.
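As a rough illustration of the protocol, the sketch below routes each task to a tier using the "Use when" lines above and orders a refresh window so the most critical work comes first. The `Task` type, the `criticality` scale, and the activity labels are assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    THINK_AND_SPEC = "conversational LLM"
    BUILD_AND_PRECISION = "coding agent"
    UI_SCAFFOLD = "UI builder"


# Activity labels taken from the "Use when" lines of each tier.
USE_WHEN = {
    "designing": Tier.THINK_AND_SPEC,
    "deciding": Tier.THINK_AND_SPEC,
    "reviewing": Tier.THINK_AND_SPEC,
    "implementing": Tier.BUILD_AND_PRECISION,
    "debugging": Tier.BUILD_AND_PRECISION,
    "generating UI": Tier.UI_SCAFFOLD,
}


@dataclass
class Task:
    name: str
    activity: str
    criticality: int  # higher means more critical (hypothetical scale)


def plan_window(tasks: list[Task]) -> list[tuple[str, Tier]]:
    """Order one refresh window: most critical first, each task on its tier."""
    ordered = sorted(tasks, key=lambda t: t.criticality, reverse=True)
    return [(t.name, USE_WHEN[t.activity]) for t in ordered]
```

An unknown activity raises `KeyError`, which is deliberate here: an unrouted task is a tool choice nobody made.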

06 The Builder Landscape

The AI Builder Ecosystem

You don't need to pick one. The framework is tool-agnostic. These are the existing methodologies and builders worth knowing. Each solves a real problem. None of them should own your process.

How to think about this

These tools inspired parts of AI-Native Agile. BMAD and GSD influenced the spec-first approach. spec-kit and OpenSpec informed the module spec anatomy. Superpowers shaped the maestro's multi-model philosophy. Use whichever fits the job; just keep your process, specs, and checkpoints portable.

BMAD Method

Breakthrough Method for Agile AI-Driven Development. A structured methodology for AI-assisted projects using specialized agent roles (Analyst, Architect, Developer). Strong on upfront planning and role-based prompting.

Agent roles · Structured planning · Multi-agent
spec-kit

GitHub's toolkit for writing structured software specs. Provides templates and conventions for defining what to build before you build it. Directly influenced the Module Spec anatomy in this framework.

Spec templates · GitHub-native · Pre-build planning
OpenSpec

An open specification format for AI-driven development. Focuses on making specs machine-readable and AI-consumable, the same goal as portable Module Specs in this framework.

Machine-readable specs · AI-consumable · Open standard
Superpowers

A framework for augmenting engineers with AI capabilities across the full dev lifecycle. Strong on the "engineer + AI" collaboration model; the Maestro concept draws directly from this philosophy.

AI augmentation · Engineer-led · Full lifecycle
Get Shit Done (GSD)

A pragmatic, no-ceremony AI build methodology. Prioritizes speed and iteration over process overhead. Useful when you need to move fast on a well-understood module; pairs well with a pre-written spec.

Pragmatic · Fast iteration · Low ceremony
The Core Rule

Your spec travels. Your checkpoints travel. Your process travels. The tool you used to generate the output does not matter. Own the output, not the tool.

07 The Atomic Unit

The Module Spec

The Module Spec is the atomic unit of AI-Native Agile. Specs only: no code, no history. Copy-paste into any tool. Get consistent output.

01 Module Identity

Name, version, owner, last updated. One-line description. Example: auth-module v1.2 | User auth + session management

02 Context Load Block

Everything the AI needs to know about the broader project to work on this module. Tech stack, conventions, architectural decisions. Reusable across sessions.

03 Inputs & Outputs

What data enters this module and what it produces. Explicit API contract boundary. Examples of valid and invalid inputs. Other modules depend on this, so be precise.

04 Functional Requirements

Numbered list of what the module MUST do. Written as behaviors: MUST validate email before submit. Not implementation instructions: the AI chooses how.

05 Non-Goals & Constraints

Explicit list of what this module does NOT do. Prevents AI scope creep and over-engineering. Hard constraints: MUST NOT store passwords in plain text.

06 Acceptance Criteria

Specific, verifiable statements. If these pass, the module is done. No vague criteria: each must be independently verifiable by a human or AI.

07 Integration Points

What other modules this touches, reads from, writes to. Shared state or event contracts. How modules stay decoupled while cooperating.

08 Open Questions

Unresolved decisions needing human input. Signals to the AI where judgment calls are needed. Never let an AI silently resolve open questions.
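The eight sections above can be held in one small record so that an incomplete spec is caught before it is pasted into a builder. This is a sketch under assumptions: the field names are illustrative, and treating Integration Points and Open Questions as the only sections allowed to be empty is a choice made for the example, not a rule of the framework.

```python
from dataclasses import dataclass, field

# (field name, section title) in the order the spec lists them.
SECTIONS = (
    ("identity", "Module Identity"),
    ("context_load", "Context Load Block"),
    ("inputs_outputs", "Inputs & Outputs"),
    ("functional_requirements", "Functional Requirements"),
    ("non_goals_constraints", "Non-Goals & Constraints"),
    ("acceptance_criteria", "Acceptance Criteria"),
    ("integration_points", "Integration Points"),
    ("open_questions", "Open Questions"),
)
# Assumption: only these two may be empty in a finished spec.
OPTIONAL = ("integration_points", "open_questions")


@dataclass
class ModuleSpec:
    identity: str = ""
    context_load: str = ""
    inputs_outputs: str = ""
    functional_requirements: list[str] = field(default_factory=list)
    non_goals_constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    integration_points: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Required sections that are still empty."""
        return [name for name, _ in SECTIONS
                if name not in OPTIONAL and not getattr(self, name)]

    def render(self) -> str:
        """Plain-text spec with numbered section headings, ready to paste."""
        lines = []
        for number, (name, title) in enumerate(SECTIONS, start=1):
            lines.append(f"{number:02d} {title}")
            value = getattr(self, name)
            if isinstance(value, list):
                lines.extend(f"- {item}" for item in value)
            else:
                lines.append(value)
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"
```

Running `missing_sections()` before every paste is a cheap stand-in for the cold-paste test: if it returns anything, the spec is not ready.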

08 Continuity System

The Checkpoint Protocol

Checkpoints are the connective tissue between sessions. They load future AI and onboard future humans with zero loss.

Write when: Session Start
Intention Checkpoint

Before touching any code, write what you intend to do, which module, and what the success condition is. Primes your thinking and creates a reference if you drift.

Write when: 50% Token Budget Used
Mid-Session Checkpoint

Quick capture: what's done, what's left, decisions made. Your safety net if you hit quota unexpectedly. Moment to ask: is the current approach still the right one?

Write when: Logical Unit Complete
Module Checkpoint

After each module or significant feature: what was built, spec version used, known issues, exact codebase state. Future sessions start here, not from memory.

Write when: End of Day / Context Boundary
Daily / Handoff Checkpoint

State of all active modules, open decisions, blockers, and the exact next action. Must be complete enough that a stranger, human or AI, can pick up with zero clarification needed.

Write when: Before Any Handoff
Transition Checkpoint

Full decision log summary, architectural decisions made, what was tried and rejected, and the onboarding brief link. This is what makes you a great collaborator.
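The five "write when" triggers above amount to a lookup from event to checkpoint type. The sketch below encodes that lookup plus the 50% budget check; the enum and function names are hypothetical, and how you count tokens is left to whatever your tool reports.

```python
from enum import Enum


class Trigger(Enum):
    SESSION_START = "session start"
    HALF_TOKEN_BUDGET_USED = "50% token budget used"
    LOGICAL_UNIT_COMPLETE = "logical unit complete"
    END_OF_DAY_OR_CONTEXT_BOUNDARY = "end of day / context boundary"
    BEFORE_HANDOFF = "before any handoff"


CHECKPOINT_FOR = {
    Trigger.SESSION_START: "Intention Checkpoint",
    Trigger.HALF_TOKEN_BUDGET_USED: "Mid-Session Checkpoint",
    Trigger.LOGICAL_UNIT_COMPLETE: "Module Checkpoint",
    Trigger.END_OF_DAY_OR_CONTEXT_BOUNDARY: "Daily / Handoff Checkpoint",
    Trigger.BEFORE_HANDOFF: "Transition Checkpoint",
}


def due_checkpoints(triggers: set[Trigger]) -> list[str]:
    """Checkpoints to write now, in the order the protocol lists them."""
    return [CHECKPOINT_FOR[t] for t in Trigger if t in triggers]


def half_budget_used(tokens_used: int, token_budget: int) -> bool:
    """True once at least half of the session's token budget is spent."""
    return token_budget > 0 and tokens_used * 2 >= token_budget
```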

09 The New Engineer

The Maestro Engineer

The old engineer wrote code. The new engineer conducts AI. The maestro knows which model to pick, when to switch tools, and when to take the wheel themselves.

Model Selection Judgment

Knows which model to use for which task. Fast models for boilerplate. Frontier reasoning models for architecture. A competing model for second opinions. A long-context model when the surface is huge. Never defaults, always chooses.

Tool Switching Fluency

Moves from coding agent to UI builder to design tool to conversational LLM without losing the thread. The spec is the invariant, not the tool. Every switch is deliberate and logged.

Context Thread Keeper

Actively manages context across sessions. Writes checkpoints before they're needed. Spots context decay early. The most expensive mistake is letting AI work in degraded context.

Spec Architect First

Spends 40% of time on specs before touching any builder. A great spec produces great output from any tool. A bad spec produces consistent garbage. The spec is the engineering work.

Critical Reviewer

Reviews AI output like a senior reviews a junior's PR. Checks against the spec, not just "does it run?" Uses a separate session to review AI output; fresh context catches different issues.

Module Boundary Guardian

Keeps modules decoupled. Rejects AI output that violates module boundaries even if it works. The AI won't remember why a boundary existed; the maestro does.

10 Quick Reference

Run the Checklists

Use these before every project and every contribution.

New Project
Problem Brief written (1 page, scope + out-of-scope)
Modules identified and named
Module Spec v1 written for first module
Tool selected (and logged why)
Doc structure created: spec/, checkpoints/, decisions/
First session starts with Context Load
First checkpoint written after first logical unit
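The doc-structure step in this checklist is easy to script. A minimal sketch, assuming the three directory names from the checklist live at the project root; `scaffold_docs` is a hypothetical helper name.

```python
from pathlib import Path

# Directory names taken from the checklist item above.
DOC_DIRS = ("spec", "checkpoints", "decisions")


def scaffold_docs(project_root: Path) -> list[Path]:
    """Create the doc directories if they are missing; safe to run repeatedly."""
    created = []
    for name in DOC_DIRS:
        directory = project_root / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling `scaffold_docs(Path("."))` from the repository root creates any directory that does not exist yet and leaves existing ones untouched.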
Existing Project Contribution
Read Decision Log first
Read Module Spec for what you're touching
Read latest Checkpoint Log
Write your Contribution Spec (what you're adding / not touching)
Start session with Context Load
Review output against module spec acceptance criteria
Update module spec to reflect what was built
Write handoff checkpoint before merging
11 Team Infrastructure

Enablement Tools

The framework runs on discipline when it's just you. At team scale, you need infrastructure that keeps docs alive, context fresh, and everyone in sync, without burning human attention on maintenance.

Under Active Investigation, not yet adopted
MDR (Markdown Repo)
The living knowledge base
Under Investigation

A centralized markdown repository that serves as the project's single source of truth, shared across all roles. Engineers, PMs, and QC all read from and write into the same place. No more context locked in inboxes or heads.

The MDR hosts every artifact this framework produces: module specs, checkpoint logs, decision logs, onboarding briefs, API contracts, PRDs, and tech plans. It's what makes the sharded doc system operational at team scale.

Anyone dumps raw notes, decisions, or updates; structure is handled automatically
Replaces Confluence as the living knowledge base (not Jira; that stays for task tracking)
AI sessions query the MDR directly for context loads instead of assembling them manually
Notion and ClickUp are AI-first alternatives worth evaluating; they support overnight housekeeping agents natively
Open Questions
File structure, access controls, and search strategy
How to keep MDR in sync with Jira without double-entry
Notion vs ClickUp vs raw Git repo: which gives the best AI agent access?
Tesla
The always-on housekeeping agent
Under Investigation

Tesla is the agent that makes the MDR sustainable. Engineers shouldn't spend attention on maintenance; Tesla does it. It runs continuously in the background, keeping docs organized, Jira in sync, and the team unblocked.

The core philosophy: bureaucracy is necessary; spending human time on it is not. Tesla absorbs the maintenance overhead so contributors can stay in the codebase.

MDR housekeeping: sorts, formats, and files raw dumps from any contributor
Jira/Notion sync: ticket hygiene and status updates run automatically overnight
Slack integration: anyone queries Tesla for project context, spec status, or progress without interrupting engineers
Context load generation: on demand, Tesla produces a ready-to-paste context load for any module
Open Questions
Coding-agent foundation vs lightweight orchestrators (e.g. OpenClaw / ZeroClaw): feasibility and token budget implications. Tesla must run on a separate API key, not the team's shared reasoning pool
Slack integration and acceptable response latency
Security boundaries: what Tesla can read, write, and never touch
Proof-of-concept scope before full adoption
Recommended first step

Before building Tesla, validate the MDR manually. Get the team writing into a shared markdown repo for two weeks with zero automation. See what actually gets written, what gets skipped, and where the real friction lives. Build Tesla around the actual failure modes, not the imagined ones.

"The engineer who masters context masters AI.
The engineer who masters specs masters any tool.
The engineer who masters both is the maestro."

AI-Native Agile v2.0, A Living Framework