Tutorial

Everything you need to go from zero to running autonomous coding agents.

1. Install claw-forge

We recommend uv for fast, isolated installation: it puts the claw-forge command on your PATH without polluting your system Python. pipx and plain pip work too.

# Option A: uv (recommended)
pip install uv
uv tool install claw-forge
claw-forge --version

# Option B: pipx
pip install pipx
pipx install claw-forge
claw-forge --version

# Option C: plain pip
pip install claw-forge
claw-forge --version

claw-forge 0.2.0b1
💡 Node.js for the Kanban UI: the CLI and agents need Python only. The optional Kanban UI (claw-forge ui) requires Node.js 18+. Install from nodejs.org if you want the visual board.
2. Set up environment variables

claw-forge reads credentials from environment variables. Copy the example file and fill in your keys:

cp .env.example .env
# Edit .env with your editor

The minimum you need to get started is one of the following:

# Option A: OAuth. Run claude login once — claw-forge picks it up automatically.
# No env vars needed. Token is read from:
#   ~/.claude/.credentials.json
claude login

# Option B: API key (.env)
ANTHROPIC_API_KEY_1=sk-ant-api03-...

# Option C: Anthropic-compat proxy (.env)
PROXY_1_API_KEY=your-proxy-key
PROXY_1_BASE_URL=https://your-proxy.example.com/v1
PROXY_1_MODEL=claude-sonnet-4-6

claw-forge automatically loads .env from the same directory as claw-forge.yaml — no export or source needed.

✅ Full .env.example: the repo ships a complete .env.example covering every provider: Anthropic, proxies (with base_url + model), AWS Bedrock, Azure, Vertex AI, Groq, Cerebras, and Ollama. Copy it and fill in only what you use — unset vars produce a warning but won't crash the run.
3. Configure providers

Create claw-forge.yaml in your project root. Every credential is read from env vars — never hardcode keys.

# Variant A: OAuth only
pool:
  strategy: priority
  max_retries: 3

providers:
  claude-oauth:
    type: anthropic_oauth
    priority: 1
    # Token auto-read from ~/.claude/.credentials.json

# Variant B: OAuth + API key + Groq fallback
pool:
  strategy: priority
  max_retries: 3

providers:
  claude-oauth:
    type: anthropic_oauth
    priority: 1

  anthropic-primary:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY_1}
    priority: 2

  groq-backup:
    type: openai_compat
    api_key: ${GROQ_API_KEY}
    base_url: https://api.groq.com/openai/v1
    model: llama-3.3-70b-versatile
    priority: 3

# Variant C: Anthropic-compat proxies
pool:
  strategy: priority
  max_retries: 3

providers:
  # Anthropic-format proxy (x-api-key + /v1/messages)
  anthropic-proxy-1:
    type: anthropic_compat
    api_key: ${PROXY_1_API_KEY}
    base_url: ${PROXY_1_BASE_URL}
    model: ${PROXY_1_MODEL}
    priority: 1

  anthropic-proxy-2:
    type: anthropic_compat
    api_key: ${PROXY_2_API_KEY}
    base_url: ${PROXY_2_BASE_URL}
    model: ${PROXY_2_MODEL}
    priority: 2

# Variant D: local Ollama (zero cost)
pool:
  strategy: priority
  max_retries: 3

providers:
  # Ollama — local model, zero cost
  local-ollama:
    type: ollama
    base_url: ${OLLAMA_BASE_URL}
    model: ${OLLAMA_MODEL}
    priority: 1
    cost_per_mtok_input: 0.0
    cost_per_mtok_output: 0.0

  # .env:
  # OLLAMA_BASE_URL=http://localhost:11434
  # OLLAMA_MODEL=qwen2.5-coder

The pool manager routes each request through providers in priority order, skipping any that are rate-limited or have open circuit breakers.

Field               | Required | Description
--------------------|----------|------------
type                | required | anthropic · anthropic_compat · anthropic_oauth · openai_compat · bedrock · azure · vertex · ollama
priority            | required | Lower = tried first. Providers with the same priority compete via the routing strategy.
api_key             | optional | Use ${ENV_VAR} syntax. Omit for OAuth or no-auth proxies.
base_url            | optional | Required for proxy and Ollama types. Use ${ENV_VAR}.
model               | optional | Default model for this provider. Use ${ENV_VAR}. Falls back to the request model if unset.
model_map           | optional | Map model names for proxies that use different identifiers.
cost_per_mtok_input | optional | USD per million input tokens. Used for cost tracking in the Kanban UI.
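The ${ENV_VAR} references above are resolved from the environment when the config loads, and an unset variable warns rather than crashes. A rough, illustrative sketch of that behavior (not claw-forge's actual loader):

```python
import os
import re

# Matches ${SOME_ENV_VAR} placeholders as used in claw-forge.yaml.
_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace each ${VAR} with os.environ[VAR]; warn and keep it if unset."""
    def repl(match: re.Match) -> str:
        var = match.group(1)
        if var not in os.environ:
            print(f"warning: {var} is unset")  # warn, but don't crash the run
            return match.group(0)
        return os.environ[var]
    return _PLACEHOLDER.sub(repl, value)

os.environ["PROXY_1_BASE_URL"] = "https://your-proxy.example.com/v1"
print(expand_env("${PROXY_1_BASE_URL}"))  # → https://your-proxy.example.com/v1
```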
4. Bootstrap your project

Run claw-forge init first — before writing a spec. This scaffolds the .claude/commands/ folder (including the /create-spec slash command you'll need in the next step) and creates default config files if they don't exist yet.

cd my-project
claw-forge init
✓ Created claw-forge.yaml (edit providers as needed)
✓ Created .env.example (copy to .env and fill keys)
⚠ No .env found — copy .env.example → .env and add your API keys
✓ Stack detected: python / unknown
✓ Generated CLAUDE.md
✓ Created .claude/ with settings.json
✓ Scaffolded 8 slash commands → .claude/commands/
💡 Why init first? The /create-spec slash command lives in .claude/commands/ — it only exists after claw-forge init runs. Without bootstrapping first, you'd have no spec template to work from. Think of this step as installing the toolkit.
5. Write your project spec

Now that .claude/commands/ exists, open Claude Code in your project directory and use the /create-spec command — or write the spec manually. The spec describes what you want to build; the init agent reads it and breaks it into parallel tasks.

You have three ways to create a spec:

✍️ Write it yourself — Best control. Use the format below.

💬 /create-spec command — Interactive. Claude walks you through it conversationally.

📋 From existing docs — Paste a PRD, Notion doc, or README — Claude extracts the spec.

Option A — Write app_spec.txt yourself

The format is human-readable and intentionally flexible. The key thing is to be specific about acceptance criteria — vague features produce vague code.

# app_spec.txt — example: REST API with auth

Project: task-manager-api
Stack: Python 3.12, FastAPI, SQLAlchemy 2.0, PostgreSQL, pytest
Description:
  A REST API for managing personal tasks with JWT authentication,
  tag-based filtering, and due-date reminders.

────────────────────────────────────────────────
Features — write one per task you want implemented
────────────────────────────────────────────────

1. User authentication
   Description: JWT-based register/login/logout endpoints using bcrypt
   for password hashing. Access tokens expire in 1h, refresh tokens in 7d.
   Acceptance criteria:
   - POST /auth/register creates user, returns 201 with user_id
   - POST /auth/login returns {access_token, refresh_token, expires_in}
   - POST /auth/refresh exchanges refresh_token for new access_token
   - POST /auth/logout invalidates refresh token
   - Passwords hashed with bcrypt (cost factor 12)
   - 15 unit tests covering happy path and edge cases
   Tech notes: Use python-jose for JWT, passlib for bcrypt.

2. Task CRUD
   Description: Full create/read/update/delete for tasks. Tasks have:
   title, description, status (todo/in_progress/done), priority (1-5),
   due_date, tags (many-to-many).
   Acceptance criteria:
   - POST /tasks — create task, returns 201
   - GET /tasks — list with pagination (?page=1&per_page=20)
   - GET /tasks/{id} — get one
   - PATCH /tasks/{id} — partial update
   - DELETE /tasks/{id} — soft delete (sets deleted_at)
   - All endpoints require valid JWT (401 if missing)
   - Users can only see their own tasks (403 if cross-user access)
   Depends on: 1   # depends on feature 1 (auth)

3. Tag filtering
   Description: Filter tasks by tags, status, priority, and due date range.
   Acceptance criteria:
   - GET /tasks?tags=work,urgent filters by ALL given tags
   - GET /tasks?status=todo&priority_gte=3 combines filters
   - GET /tasks?due_before=2026-04-01 filters by due date
   - Filters can be combined freely
   Depends on: 2

4. Integration test suite
   Description: Full end-to-end API tests using pytest + httpx AsyncClient.
   Tests run against an in-memory SQLite DB — no external deps required.
   Acceptance criteria:
   - Coverage ≥ 90% across all modules
   - Tests cover auth flow, CRUD, filtering, and error cases
   - All tests pass with: pytest tests/ -v
   Depends on: 1, 2, 3
💡 Tips for a good spec
  • One feature = one atomic unit of work. If it takes more than ~2 hours to implement, split it.
  • Acceptance criteria are tests. Write them as if they're a checklist for the agent to verify before marking the feature done.
  • Use Depends on: for hard dependencies. Features with no dependencies run in parallel in Wave 1.
  • Tech notes help. If you have a preferred library or pattern, mention it. Agents follow instructions well.
  • Start with 5–10 features. You can always add more with /expand-project once the first wave is running.

Option B — Interactive spec with /create-spec

Open Claude Code in your project directory and type /create-spec. Claude will walk you through the project conversationally — asking about your stack, features, providers, and concurrency — then write both app_spec.txt and claw-forge.yaml for you.

# Open your project in Claude Code, then type:
/create-spec
claw-forge project setup assistant

What should we call this project? (used as directory name)
> task-manager-api

What's your tech stack?
> Python, FastAPI, PostgreSQL, pytest

List your top 5–7 features or user stories:
> 1. JWT auth (register/login/logout/refresh)
> 2. Task CRUD (title, status, priority, tags, due date)
> 3. Tag and status filtering on GET /tasks
> 4. Full integration test suite (≥90% coverage)
> done

Which AI providers do you have access to? (space-separated)
> claude-oauth anthropic groq

...

✅ Written: app_spec.txt
✅ Written: claw-forge.yaml
✅ Written: .env.example

Next: claw-forge plan app_spec.txt --project task-manager-api

Option C — Convert an existing PRD

If you already have a PRD, Notion export, or detailed README, claw-forge init copies app_spec.example.xml into your project so Claude knows the exact format. Paste your PRD into Claude and use this prompt:

Convert this PRD to claw-forge XML spec format. Use app_spec.example.xml in this directory as the schema reference. Write the result to app_spec.txt. Break each requirement into a testable bullet: "User can...", "System returns...", "API validates..." Aim for 100-300 bullets total.

Because app_spec.example.xml is already in the project directory, Claude Code reads it automatically and produces valid XML — no guessing the format.

6. Initialize with your spec

Now run claw-forge plan app_spec.txt. The initializer agent reads your spec, analyzes the project directory, and creates a dependency-ordered task graph in the state database.

claw-forge plan app_spec.txt --project task-manager-api
πŸ” Running initializer agent… Reading spec: app_spec.txt Analyzing project: ./ βœ… Project initialized project_name: task-manager-api tech_stack: python/fastapi features_parsed: 4 tasks_created: 4 dependency_graph: Wave 1 (parallel): [1] User authentication Wave 2 (parallel): [2] Task CRUD Wave 3 (parallel): [3] Tag filtering Wave 4 (parallel): [4] Integration test suite session_id: abc-123-def manifest: .claw-forge/session_manifest.json

The initializer also writes .claw-forge/session_manifest.json — a pre-computed context blob that every subsequent agent session loads at start. This eliminates cold start: agents don't re-analyse the project from scratch each time.

⚠️ Re-initializing: running init again on an existing project will add new tasks — it won't delete existing ones. Use /expand-project to add features to a running session instead.

What the manifest contains

You can inspect it any time:

cat .claw-forge/session_manifest.json
{
  "project_name": "task-manager-api",
  "language": "python",
  "framework": "fastapi",
  "description": "REST API with JWT auth, task CRUD, tag filtering",
  "key_files": [
    {"path": "src/auth.py", "role": "authentication module"},
    {"path": "src/models.py", "role": "SQLAlchemy models"},
    {"path": "tests/conftest.py", "role": "test fixtures"}
  ],
  "build_commands": ["uv sync", "alembic upgrade head"],
  "test_commands": ["pytest tests/ -v --cov"],
  "active_skills": ["pyright-lsp", "verification-gate"],
  "prior_decisions": []
}
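If you want to script against the manifest, it's plain JSON. A small illustrative helper that turns it into a context summary (the field access mirrors the example above; the function itself is not part of claw-forge):

```python
import json
from pathlib import Path

def load_context(root: str = ".") -> str:
    """Build a warm-start context string from .claw-forge/session_manifest.json."""
    manifest = json.loads(
        Path(root, ".claw-forge/session_manifest.json").read_text()
    )
    lines = [
        f"Project: {manifest['project_name']} "
        f"({manifest['language']}/{manifest['framework']})"
    ]
    # Key files and their roles, as recorded by the initializer.
    lines += [f"- {f['path']}: {f['role']}" for f in manifest["key_files"]]
    lines.append("Test with: " + "; ".join(manifest["test_commands"]))
    return "\n".join(lines)
```

Run it from the project root after claw-forge plan has written the manifest.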
7. Run agents

Start the harness. The dispatcher executes features in dependency-ordered waves — features with no unsatisfied dependencies run in parallel up to --concurrency.

claw-forge run task-manager-api --concurrency 3
🔥 claw-forge v0.2.0b1
Project: task-manager-api
Session: abc-123-def
Providers: claude-oauth 🟢  anthropic-primary 🟢  groq-backup 🟢
Strategy: priority

Wave 1/4 — 1 task
  ⚙ [claude-oauth] coding: User authentication …
  ✅ [claude-oauth] User authentication — 47s  $0.06

Wave 2/4 — 1 task
  ⚙ [claude-oauth] coding: Task CRUD …
  ✅ [claude-oauth] Task CRUD — 63s  $0.09

Wave 3/4 — 1 task
  ⚙ [claude-oauth] coding: Tag filtering …
  ✅ [claude-oauth] Tag filtering — 31s  $0.04

Wave 4/4 — 1 task
  ⚙ [claude-oauth] testing: Integration test suite …
  ✅ [claude-oauth] Integration test suite — 82s  $0.11

────────────────────────────────────────────────
✅ All 4 features passing • total: $0.30 • 4m 23s
────────────────────────────────────────────────
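The wave ordering comes straight from the spec's Depends on: lines; conceptually, each wave is one topological level of the dependency graph. An illustrative sketch (not claw-forge's actual scheduler):

```python
def compute_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group features into waves; each wave depends only on earlier waves."""
    remaining = dict(deps)
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A feature is ready when all of its dependencies are already done.
        wave = sorted(f for f, d in remaining.items() if set(d) <= done)
        if not wave:
            raise ValueError("dependency cycle detected")
        waves.append(wave)
        done.update(wave)
        for f in wave:
            del remaining[f]
    return waves

spec = {
    "auth": [],
    "task-crud": ["auth"],
    "tag-filtering": ["task-crud"],
    "integration-tests": ["auth", "task-crud", "tag-filtering"],
}
print(compute_waves(spec))
# → [['auth'], ['task-crud'], ['tag-filtering'], ['integration-tests']]
```

With this spec each wave has one feature, which matches the run output above; independent features would share a wave and run concurrently.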

The state service runs automatically on port 8888. Open http://localhost:8888/docs to explore the REST API, or use the Kanban UI (next step).

💡 Useful flags
  --concurrency N    Max parallel agents (default: 3)
  --model MODEL      Override the default model
  --config FILE      Use a different claw-forge.yaml
  --yolo             Max speed, no approval pauses (see YOLO mode)
8. Open the Kanban UI

claw-forge ships a React Kanban board that shows real-time agent progress, provider health, and cost. Launch it with one command:

claw-forge ui --session abc-123-def
🔥 Starting claw-forge Kanban UI

  UI:        http://localhost:5173/?session=abc-123-def
  State API: http://localhost:8888

Press Ctrl+C to stop

vite v5.4.0 dev server running at:
➜ Local: http://localhost:5173/

The board opens automatically in your browser:

[Screenshot: claw-forge Kanban board]

What you'll see:

  • 5 columns: Pending Β· In Progress Β· Passing Β· Failed Β· Blocked
  • Provider health dots in the header β€” click any dot for RPM, latency, circuit state, and cost
  • Progress bar β€” X/Y features passing, live
  • Cost tracker β€” total USD spent this session
  • Feature cards β€” show category badge, dep count, agent session ID when running, error message when failed
💡 UI options
claw-forge ui --port 3000        # custom port
claw-forge ui --no-open          # don't auto-open browser
claw-forge ui --state-port 9000  # different state service port

You can also open the board manually at any time at http://localhost:5173/?session=<uuid>.

9. Add more features

Once your first batch of features is running (or done), add more without restarting. There are two ways:

Option A — /expand-project slash command

Open Claude Code in your project directory and type /expand-project. Claude will list the current features, ask what you want to add, and POST them to the state service atomically.

/expand-project
Current features (4 passing):
  ✅ [1] User authentication
  ✅ [2] Task CRUD
  ✅ [3] Tag filtering
  ✅ [4] Integration test suite

What new features would you like to add?
> 5. Email reminders — send due-date reminders via SendGrid 24h before due date
> 6. Rate limiting — 100 req/min per user via slowapi
> done

Depends on anything?
> 5 depends on 2 (needs tasks with due dates)
> 6 depends on 1 (needs auth middleware)

✅ Added 2 features
  [abc456] Email reminders (depends on: 2)
  [def789] Rate limiting (depends on: 1)

Resuming dispatcher…

Option B — append to app_spec.txt and re-init

Add new entries to the bottom of your spec file and run init again. Only new features (those not already in the DB) will be created.

# Append to app_spec.txt, then:
claw-forge plan app_spec.txt --project task-manager-api
10. YOLO mode — maximum speed 🚀

YOLO mode enables three things at once:

  • Max concurrency β€” set to your CPU count automatically
  • Auto-approve human inputs β€” agents never pause waiting for you
  • Aggressive retry β€” 5 attempts per task instead of 3
claw-forge run task-manager-api --yolo
⚠️ YOLO MODE: Human approval skipped, max concurrency (12), aggressive retry

🔥 claw-forge v0.2.0b1
Wave 1/4 — all 4 tasks running in parallel…
⚠️ When to use YOLO mode
  • First-pass generation on a clean codebase
  • Rebuilding from a fresh spec after a big refactor
  • When you've already reviewed and trust the agent prompts
  • Not recommended for production systems or when the codebase has sensitive operations
11. Pause & resume

Pause a running session gracefully — in-flight agents finish their current task, then no new ones start. Resume picks up exactly where it stopped.

# Pause (drain mode — active agents complete, no new ones start)
claw-forge pause abc-123-def

# Resume
claw-forge resume abc-123-def
⏸ Project 'abc-123-def' paused.
  In-flight agents will complete. No new agents will start.
  Resume with: claw-forge resume abc-123-def
💡 Pause while you review: pause after Wave 1 completes, review the generated code, then resume. This is a great workflow when you want human checkpoints without using full YOLO mode.
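Drain-mode semantics can be pictured as a flag the scheduler checks only when starting new work, never when finishing it. An illustrative toy model (not claw-forge's dispatcher):

```python
class Dispatcher:
    """Toy drain-mode dispatcher: pausing blocks new starts, not completions."""

    def __init__(self) -> None:
        self.paused = False
        self.active: list[str] = []
        self.pending: list[str] = []

    def tick(self) -> None:
        """One scheduling step: active work completes, then maybe start more."""
        self.active.clear()  # in-flight agents always run to completion
        if not self.paused and self.pending:
            self.active.append(self.pending.pop(0))  # start the next task

d = Dispatcher()
d.pending = ["task-1", "task-2"]
d.tick()          # starts task-1
d.paused = True   # drain: task-1 completes, task-2 never starts
d.tick()
print(d.active, d.pending)  # → [] ['task-2']
```

Flipping paused back to False and ticking again would pick up task-2, which is exactly the resume behavior described above.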

Answer a stuck agent

If an agent has a question it can't answer on its own (a missing env var, an ambiguous requirement), it sets the task to needs_human and waits. Use claw-forge input to unblock it:

claw-forge input abc-123-def
🙋 1 pending question for 'abc-123-def':

Task: Email reminders
Q: What is the SendGrid API key env var name?

Your answer: SENDGRID_API_KEY
✅ Answer submitted — task moved to pending

Workflows

Choose the workflow that matches your situation. Each one chains commands in the right order.

Greenfield — Build a new app from scratch

claw-forge init → /create-spec → claw-forge plan app_spec.txt → claw-forge run → /check-code → /checkpoint → /review-pr

Example: Building "TaskFlow API" (FastAPI + SQLite)

# 1. Scaffold project
mkdir taskflow-api && cd taskflow-api && git init
claw-forge init

# 2. Create spec interactively (in Claude Code)
#    Type: /create-spec
#    Claude asks about features, tech stack, DB schema
#    Writes: app_spec.txt + claw-forge.yaml

# 3. Initialize with spec
claw-forge plan app_spec.txt --concurrency 5

# 4. Run agents (opens 5 parallel coding agents)
claw-forge state &
claw-forge run --concurrency 5

# 5. Verify (in Claude Code)
#    /check-code    → ruff + mypy + pytest
#    /checkpoint    → git commit + state snapshot
#    /review-pr     → structured code review
Result: 59 features, ~30 min, ~$3.12, ~3,400 lines of tested code.

Brownfield — Add features to existing code

claw-forge analyze → /create-spec → claw-forge add → claw-forge run

Example: Adding Stripe payments to an existing FastAPI app

# 1. Analyze existing codebase (creates brownfield_manifest.json)
claw-forge analyze

# 2. Create brownfield spec (in Claude Code)
#    Type: /create-spec
#    Claude auto-detects brownfield mode from manifest
#    Asks: what to add, constraints, integration points
#    Writes: additions_spec.xml

# 3. Add features
claw-forge add --spec additions_spec.xml

# 4. Run agents
claw-forge run --concurrency 3

# 5. Verify
#    /check-code (71 tests: 59 original + 12 new, all passing)
Key: The manifest teaches agents your naming conventions, async patterns, and test style — so new code matches existing code.

Bug Fix — TDD regression fix

/create-bug-report → claw-forge fix → /check-code → /review-pr

Example: Fixing "password reset fails for uppercase emails"

# 1. Create bug report (in Claude Code)
#    Type: /create-bug-report
#    Claude guides you through 6 phases:
#      symptoms → reproduction → expected vs actual → scope → write report → fix
#    Writes: bug_report.md

# 2. Run fix (or let /create-bug-report trigger it)
claw-forge fix --report bug_report.md

# Agent does:
#   Phase 1 (RED):      Write test_password_reset_uppercase_email → FAILS ✓
#   Phase 2 (GREEN):    Fix auth/service.py → .lower() on email lookup → PASSES ✓
#   Phase 3 (REFACTOR): Run full suite → 72 passed, 0 failed ✓

# 3. Verify and push
#    /check-code → /review-pr → git push
Result: Mandatory regression test means the bug can never silently re-appear.

Parallel Sprint — Multi-agent feature development

claw-forge run --concurrency 5 → claw-forge status → /pool-status → /checkpoint

Example: Building 50 features with 5 concurrent agents

# Three terminals:
claw-forge state &              # Terminal 1: state service
claw-forge run --concurrency 5  # Terminal 2: agents
claw-forge ui                   # Terminal 3: Kanban board

# Monitor mid-sprint:
claw-forge status               # Phase progress, blocked features, cost
#    /pool-status               # Provider health, RPM, circuit breakers

# Handle blocked features:
claw-forge input saas-platform  # Answer agent questions interactively

# Save progress at milestones:
#    /checkpoint                # Git commit + state snapshot
Tip: Start with --concurrency 3 to verify your spec, then scale up.

Recovery — Resuming after interruption

claw-forge status → claw-forge run → /expand-project

Example: Laptop shut down mid-sprint (28/50 features done)

# 1. Check what happened
claw-forge status
# Shows: 28 passing, 5 interrupted, 17 pending

# 2. Resume — interrupted features reset to pending automatically
claw-forge state &
claw-forge run --concurrency 5
# "Resuming session: 28/50 passing, 22 remaining"

# 3. Optionally add more features mid-run (in Claude Code)
#    Type: /expand-project
#    Claude lists current features, asks what to add, POSTs atomically
Key: State persists to disk. Power loss, Ctrl+C, network drops — just claw-forge run again.

Which command do I use?

Start here
  │
  ├── Building something new?
  │     └── claw-forge init → /create-spec → claw-forge plan → claw-forge run
  │
  ├── Adding features to existing code?
  │     └── claw-forge analyze → /create-spec → claw-forge add → claw-forge run
  │
  ├── Fixing a bug?
  │     └── /create-bug-report → claw-forge fix
  │
  ├── Checking project health?
  │     ├── Code quality     → /check-code
  │     ├── Feature progress → claw-forge status
  │     └── Provider health  → /pool-status
  │
  ├── Saving progress?
  │     └── /checkpoint → /review-pr → git push
  │
  ├── Resuming after a break?
  │     └── claw-forge status → claw-forge run
  │
  └── Adding features mid-run?
        └── /expand-project
