Tutorial

Everything you need to go from zero to running autonomous coding agents.

1. Install claw-forge

We recommend uv for fast, isolated installation: it puts the claw-forge command on your PATH without polluting your system Python. pipx and plain pip work too.

# Option A: uv (recommended)
pip install uv
uv tool install claw-forge
claw-forge --version

# Option B: pipx
pip install pipx
pipx install claw-forge
claw-forge --version

# Option C: plain pip
pip install claw-forge
claw-forge --version

claw-forge 0.2.0b1
💡 Node.js for the Kanban UI: the CLI and agents need Python only. The optional Kanban UI (claw-forge ui) requires Node.js 18+. Install from nodejs.org if you want the visual board.
2. Set up environment variables

claw-forge reads credentials from environment variables. Copy the example file and fill in your keys:

cp .env.example .env
# Edit .env with your editor

The minimum you need to get started is one of the following:

# Option A: OAuth. Run claude login once — claw-forge picks it up automatically.
# No env vars needed. Token is read from:
#   ~/.claude/.credentials.json
claude login

# Option B: API key (.env)
ANTHROPIC_API_KEY_1=sk-ant-api03-...

# Option C: Anthropic-compat proxy (.env)
PROXY_1_API_KEY=your-proxy-key
PROXY_1_BASE_URL=https://your-proxy.example.com/v1
PROXY_1_MODEL=claude-sonnet-4-6

claw-forge automatically loads .env from the same directory as claw-forge.yaml — no export or source needed.

✅ Full .env.example: the repo ships a complete .env.example covering every provider: Anthropic, proxies (with base_url + model), AWS Bedrock, Azure, Vertex AI, Groq, Cerebras, and Ollama. Copy it and fill in only what you use — unset vars produce a warning but won't crash the run.
3. Configure providers

Create claw-forge.yaml in your project root. Every credential is read from env vars — never hardcode keys.

# Variant A: OAuth only
pool:
  strategy: priority
  max_retries: 3

providers:
  claude-oauth:
    type: anthropic_oauth
    priority: 1
    # Token auto-read from ~/.claude/.credentials.json

# Variant B: OAuth + API key + Groq fallback
pool:
  strategy: priority
  max_retries: 3

providers:
  claude-oauth:
    type: anthropic_oauth
    priority: 1

  anthropic-primary:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY_1}
    priority: 2

  groq-backup:
    type: openai_compat
    api_key: ${GROQ_API_KEY}
    base_url: https://api.groq.com/openai/v1
    model: llama-3.3-70b-versatile
    priority: 3

# Variant C: Anthropic-compat proxies
pool:
  strategy: priority
  max_retries: 3

providers:
  # Anthropic-format proxy (x-api-key + /v1/messages)
  anthropic-proxy-1:
    type: anthropic_compat
    api_key: ${PROXY_1_API_KEY}
    base_url: ${PROXY_1_BASE_URL}
    model: ${PROXY_1_MODEL}
    priority: 1

  anthropic-proxy-2:
    type: anthropic_compat
    api_key: ${PROXY_2_API_KEY}
    base_url: ${PROXY_2_BASE_URL}
    model: ${PROXY_2_MODEL}
    priority: 2

# Variant D: local Ollama (zero cost)
pool:
  strategy: priority
  max_retries: 3

providers:
  # Ollama — local model, zero cost
  local-ollama:
    type: ollama
    base_url: ${OLLAMA_BASE_URL}
    model: ${OLLAMA_MODEL}
    priority: 1
    cost_per_mtok_input: 0.0
    cost_per_mtok_output: 0.0

  # .env:
  # OLLAMA_BASE_URL=http://localhost:11434
  # OLLAMA_MODEL=qwen2.5-coder

The pool manager routes each request through providers in priority order, skipping any that are rate-limited or have open circuit breakers.

Field               | Required | Description
--------------------|----------|------------
type                | required | anthropic · anthropic_compat · anthropic_oauth · openai_compat · bedrock · azure · vertex · ollama
priority            | required | Lower = tried first. Providers with the same priority compete via the routing strategy.
api_key             | optional | Use ${ENV_VAR} syntax. Omit for OAuth or no-auth proxies.
base_url            | optional | Required for proxy and Ollama types. Use ${ENV_VAR}.
model               | optional | Default model for this provider. Use ${ENV_VAR}. Falls back to the request model if unset.
model_map           | optional | Map model names for proxies that use different identifiers.
cost_per_mtok_input | optional | USD per million input tokens. Used for cost tracking in the Kanban UI.
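The ${ENV_VAR} references above are resolved from the environment when the config loads, and an unset variable warns rather than crashes. A rough, illustrative sketch of that behavior (not claw-forge's actual loader):

```python
import os
import re

# Matches ${SOME_ENV_VAR} placeholders as used in claw-forge.yaml.
_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace each ${VAR} with os.environ[VAR]; warn and keep it if unset."""
    def repl(match: re.Match) -> str:
        var = match.group(1)
        if var not in os.environ:
            print(f"warning: {var} is unset")  # warn, but don't crash the run
            return match.group(0)
        return os.environ[var]
    return _PLACEHOLDER.sub(repl, value)

os.environ["PROXY_1_BASE_URL"] = "https://your-proxy.example.com/v1"
print(expand_env("${PROXY_1_BASE_URL}"))  # → https://your-proxy.example.com/v1
```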
4. Bootstrap your project

Run claw-forge init first — before writing a spec. This scaffolds the .claude/commands/ folder (including the /create-spec slash command you'll need in the next step) and creates default config files if they don't exist yet.

cd my-project
claw-forge init
✓ Created claw-forge.yaml (edit providers as needed)
✓ Created .env.example (copy to .env and fill keys)
⚠ No .env found — copy .env.example → .env and add your API keys
✓ Stack detected: python / unknown
✓ Generated CLAUDE.md
✓ Created .claude/ with settings.json
✓ Scaffolded 8 slash commands → .claude/commands/
💡 Why init first? The /create-spec slash command lives in .claude/commands/ — it only exists after claw-forge init runs. Without bootstrapping first, you'd have no spec template to work from. Think of this step as installing the toolkit.
5. Write your project spec

Now that .claude/commands/ exists, open Claude Code in your project directory and use the /create-spec command — or write the spec manually. The spec describes what you want to build; the init agent reads it and breaks it into parallel tasks.

You have three ways to create a spec:

✍️ Write it yourself — Best control. Use the format below.

💬 /create-spec command — Interactive. Claude walks you through it conversationally.

📋 From existing docs — Paste a PRD, Notion doc, or README — Claude extracts the spec.

Option A — Write app_spec.txt yourself

The format is human-readable and intentionally flexible. The key thing is to be specific about acceptance criteria — vague features produce vague code.

# app_spec.txt — example: REST API with auth

Project: task-manager-api
Stack: Python 3.12, FastAPI, SQLAlchemy 2.0, PostgreSQL, pytest
Description:
  A REST API for managing personal tasks with JWT authentication,
  tag-based filtering, and due-date reminders.

────────────────────────────────────────────────
Features — write one per task you want implemented
────────────────────────────────────────────────

1. User authentication
   Description: JWT-based register/login/logout endpoints using bcrypt
   for password hashing. Access tokens expire in 1h, refresh tokens in 7d.
   Acceptance criteria:
   - POST /auth/register creates user, returns 201 with user_id
   - POST /auth/login returns {access_token, refresh_token, expires_in}
   - POST /auth/refresh exchanges refresh_token for new access_token
   - POST /auth/logout invalidates refresh token
   - Passwords hashed with bcrypt (cost factor 12)
   - 15 unit tests covering happy path and edge cases
   Tech notes: Use python-jose for JWT, passlib for bcrypt.

2. Task CRUD
   Description: Full create/read/update/delete for tasks. Tasks have:
   title, description, status (todo/in_progress/done), priority (1-5),
   due_date, tags (many-to-many).
   Acceptance criteria:
   - POST /tasks — create task, returns 201
   - GET /tasks — list with pagination (?page=1&per_page=20)
   - GET /tasks/{id} — get one
   - PATCH /tasks/{id} — partial update
   - DELETE /tasks/{id} — soft delete (sets deleted_at)
   - All endpoints require valid JWT (401 if missing)
   - Users can only see their own tasks (403 if cross-user access)
   Depends on: 1   # depends on feature 1 (auth)

3. Tag filtering
   Description: Filter tasks by tags, status, priority, and due date range.
   Acceptance criteria:
   - GET /tasks?tags=work,urgent filters by ALL given tags
   - GET /tasks?status=todo&priority_gte=3 combines filters
   - GET /tasks?due_before=2026-04-01 filters by due date
   - Filters can be combined freely
   Depends on: 2

4. Integration test suite
   Description: Full end-to-end API tests using pytest + httpx AsyncClient.
   Tests run against an in-memory SQLite DB — no external deps required.
   Acceptance criteria:
   - Coverage ≥ 90% across all modules
   - Tests cover auth flow, CRUD, filtering, and error cases
   - All tests pass with: pytest tests/ -v
   Depends on: 1, 2, 3
💡 Tips for a good spec
  • One feature = one atomic unit of work. If it takes more than ~2 hours to implement, split it.
  • Acceptance criteria are tests. Write them as if they're a checklist for the agent to verify before marking the feature done.
  • Use Depends on: for hard dependencies. Features with no dependencies run in parallel in Wave 1.
  • Tech notes help. If you have a preferred library or pattern, mention it. Agents follow instructions well.
  • Start with 5–10 features. You can always add more with /expand-project once the first wave is running.

Option B — Interactive spec with /create-spec

Open Claude Code in your project directory and type /create-spec. Claude will walk you through the project conversationally — asking about your stack, features, providers, and concurrency — then write both app_spec.txt and claw-forge.yaml for you.

# Open your project in Claude Code, then type:
/create-spec
claw-forge project setup assistant

What should we call this project? (used as directory name)
> task-manager-api

What's your tech stack?
> Python, FastAPI, PostgreSQL, pytest

List your top 5–7 features or user stories:
> 1. JWT auth (register/login/logout/refresh)
> 2. Task CRUD (title, status, priority, tags, due date)
> 3. Tag and status filtering on GET /tasks
> 4. Full integration test suite (≥90% coverage)
> done

Which AI providers do you have access to? (space-separated)
> claude-oauth anthropic groq

...

✅ Written: app_spec.txt
✅ Written: claw-forge.yaml
✅ Written: .env.example

Next: claw-forge plan app_spec.txt --project task-manager-api

Option C — Convert an existing PRD

If you already have a PRD, Notion export, or detailed README, claw-forge init copies app_spec.example.xml into your project so Claude knows the exact format. Paste your PRD into Claude and use this prompt:

Convert this PRD to claw-forge XML spec format. Use app_spec.example.xml in this directory as the schema reference. Write the result to app_spec.txt. Break each requirement into a testable bullet: "User can...", "System returns...", "API validates..." Aim for 100-300 bullets total.

Because app_spec.example.xml is already in the project directory, Claude Code reads it automatically and produces valid XML — no guessing the format.

6. Initialize with your spec

Now run claw-forge plan app_spec.txt. The initializer agent reads your spec, analyzes the project directory, and creates a dependency-ordered task graph in the state database.

claw-forge plan app_spec.txt --project task-manager-api
πŸ” Running initializer agent… Reading spec: app_spec.txt Analyzing project: ./ βœ… Project initialized project_name: task-manager-api tech_stack: python/fastapi features_parsed: 4 tasks_created: 4 dependency_graph: Wave 1 (parallel): [1] User authentication Wave 2 (parallel): [2] Task CRUD Wave 3 (parallel): [3] Tag filtering Wave 4 (parallel): [4] Integration test suite session_id: abc-123-def manifest: .claw-forge/session_manifest.json

The initializer also writes .claw-forge/session_manifest.json — a pre-computed context blob that every subsequent agent session loads at start. This eliminates cold start: agents don't re-analyse the project from scratch each time.

⚠️ Re-initializing: running init again on an existing project will add new tasks — it won't delete existing ones. Use /expand-project to add features to a running session instead.

What the manifest contains

You can inspect it any time:

cat .claw-forge/session_manifest.json
{
  "project_name": "task-manager-api",
  "language": "python",
  "framework": "fastapi",
  "description": "REST API with JWT auth, task CRUD, tag filtering",
  "key_files": [
    {"path": "src/auth.py", "role": "authentication module"},
    {"path": "src/models.py", "role": "SQLAlchemy models"},
    {"path": "tests/conftest.py", "role": "test fixtures"}
  ],
  "build_commands": ["uv sync", "alembic upgrade head"],
  "test_commands": ["pytest tests/ -v --cov"],
  "active_skills": ["pyright-lsp", "verification-gate"],
  "prior_decisions": []
}
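If you want to script against the manifest, it's plain JSON. A small illustrative helper that turns it into a context summary (the field access mirrors the example above; the function itself is not part of claw-forge):

```python
import json
from pathlib import Path

def load_context(root: str = ".") -> str:
    """Build a warm-start context string from .claw-forge/session_manifest.json."""
    manifest = json.loads(
        Path(root, ".claw-forge/session_manifest.json").read_text()
    )
    lines = [
        f"Project: {manifest['project_name']} "
        f"({manifest['language']}/{manifest['framework']})"
    ]
    # Key files and their roles, as recorded by the initializer.
    lines += [f"- {f['path']}: {f['role']}" for f in manifest["key_files"]]
    lines.append("Test with: " + "; ".join(manifest["test_commands"]))
    return "\n".join(lines)
```

Run it from the project root after claw-forge plan has written the manifest.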
7. Run agents

Start the harness. The dispatcher executes features in dependency-ordered waves — features with no unsatisfied dependencies run in parallel up to --concurrency.

claw-forge run task-manager-api --concurrency 3
🔥 claw-forge v0.2.0b1
Project: task-manager-api
Session: abc-123-def
Providers: claude-oauth 🟢  anthropic-primary 🟢  groq-backup 🟢
Strategy: priority

Wave 1/4 — 1 task
  ⚙ [claude-oauth] coding: User authentication …
  ✅ [claude-oauth] User authentication — 47s  $0.06

Wave 2/4 — 1 task
  ⚙ [claude-oauth] coding: Task CRUD …
  ✅ [claude-oauth] Task CRUD — 63s  $0.09

Wave 3/4 — 1 task
  ⚙ [claude-oauth] coding: Tag filtering …
  ✅ [claude-oauth] Tag filtering — 31s  $0.04

Wave 4/4 — 1 task
  ⚙ [claude-oauth] testing: Integration test suite …
  ✅ [claude-oauth] Integration test suite — 82s  $0.11

────────────────────────────────────────────────
✅ All 4 features passing • total: $0.30 • 4m 23s
────────────────────────────────────────────────
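The wave ordering comes straight from the spec's Depends on: lines; conceptually, each wave is one topological level of the dependency graph. An illustrative sketch (not claw-forge's actual scheduler):

```python
def compute_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group features into waves; each wave depends only on earlier waves."""
    remaining = dict(deps)
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A feature is ready when all of its dependencies are already done.
        wave = sorted(f for f, d in remaining.items() if set(d) <= done)
        if not wave:
            raise ValueError("dependency cycle detected")
        waves.append(wave)
        done.update(wave)
        for f in wave:
            del remaining[f]
    return waves

spec = {
    "auth": [],
    "task-crud": ["auth"],
    "tag-filtering": ["task-crud"],
    "integration-tests": ["auth", "task-crud", "tag-filtering"],
}
print(compute_waves(spec))
# → [['auth'], ['task-crud'], ['tag-filtering'], ['integration-tests']]
```

With this spec each wave has one feature, which matches the run output above; independent features would share a wave and run concurrently.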

The state service runs automatically on port 8888. Open http://localhost:8888/docs to explore the REST API, or use the Kanban UI (next step).

💡 Useful flags
  --concurrency N    Max parallel agents (default: 3)
  --model MODEL      Override the default model
  --config FILE      Use a different claw-forge.yaml
  --yolo             Max speed, no approval pauses (see YOLO mode)
8. Open the Kanban UI

claw-forge ships a React Kanban board that shows real-time agent progress, provider health, and cost. Launch it with one command:

claw-forge ui --session abc-123-def
🔥 Starting claw-forge Kanban UI

  UI:        http://localhost:5173/?session=abc-123-def
  State API: http://localhost:8888

Press Ctrl+C to stop

vite v5.4.0 dev server running at:
➜ Local: http://localhost:5173/

The board opens automatically in your browser:

[Screenshot: claw-forge Kanban board]

What you'll see:

  • 5 columns: Pending Β· In Progress Β· Passing Β· Failed Β· Blocked
  • Provider health dots in the header β€” click any dot for RPM, latency, circuit state, and cost
  • Progress bar β€” X/Y features passing, live
  • Cost tracker β€” total USD spent this session
  • Feature cards β€” show category badge, dep count, agent session ID when running, error message when failed
💡 UI options
claw-forge ui --port 3000        # custom port
claw-forge ui --no-open          # don't auto-open browser
claw-forge ui --state-port 9000  # different state service port

You can also open the board manually at any time at http://localhost:5173/?session=<uuid>.

9. Add more features

Once your first batch of features is running (or done), add more without restarting. There are two ways:

Option A — /expand-project slash command

Open Claude Code in your project directory and type /expand-project. Claude will list the current features, ask what you want to add, and POST them to the state service atomically.

/expand-project
Current features (4 passing):
  ✅ [1] User authentication
  ✅ [2] Task CRUD
  ✅ [3] Tag filtering
  ✅ [4] Integration test suite

What new features would you like to add?
> 5. Email reminders — send due-date reminders via SendGrid 24h before due date
> 6. Rate limiting — 100 req/min per user via slowapi
> done

Depends on anything?
> 5 depends on 2 (needs tasks with due dates)
> 6 depends on 1 (needs auth middleware)

✅ Added 2 features
  [abc456] Email reminders (depends on: 2)
  [def789] Rate limiting (depends on: 1)

Resuming dispatcher…

Option B — append to app_spec.txt and re-init

Add new entries to the bottom of your spec file and run init again. Only new features (those not already in the DB) will be created.

# Append to app_spec.txt, then:
claw-forge plan app_spec.txt --project task-manager-api
10. YOLO mode — maximum speed 🚀

YOLO mode enables three things at once:

  • Max concurrency β€” set to your CPU count automatically
  • Auto-approve human inputs β€” agents never pause waiting for you
  • Aggressive retry β€” 5 attempts per task instead of 3
claw-forge run task-manager-api --yolo
⚠️ YOLO MODE: Human approval skipped, max concurrency (12), aggressive retry

🔥 claw-forge v0.2.0b1
Wave 1/4 — all 4 tasks running in parallel…
⚠️ When to use YOLO mode
  • First-pass generation on a clean codebase
  • Rebuilding from a fresh spec after a big refactor
  • When you've already reviewed and trust the agent prompts
  • Not recommended for production systems or when the codebase has sensitive operations
11. Pause & resume

Pause a running session gracefully — in-flight agents finish their current task, then no new ones start. Resume picks up exactly where it stopped.

# Pause (drain mode — active agents complete, no new ones start)
claw-forge pause abc-123-def

# Resume
claw-forge resume abc-123-def
⏸ Project 'abc-123-def' paused.
  In-flight agents will complete. No new agents will start.
  Resume with: claw-forge resume abc-123-def
💡 Pause while you review: pause after Wave 1 completes, review the generated code, then resume. This is a great workflow when you want human checkpoints without using full YOLO mode.
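Drain-mode semantics can be pictured as a flag the scheduler checks only when starting new work, never when finishing it. An illustrative toy model (not claw-forge's dispatcher):

```python
class Dispatcher:
    """Toy drain-mode dispatcher: pausing blocks new starts, not completions."""

    def __init__(self) -> None:
        self.paused = False
        self.active: list[str] = []
        self.pending: list[str] = []

    def tick(self) -> None:
        """One scheduling step: active work completes, then maybe start more."""
        self.active.clear()  # in-flight agents always run to completion
        if not self.paused and self.pending:
            self.active.append(self.pending.pop(0))  # start the next task

d = Dispatcher()
d.pending = ["task-1", "task-2"]
d.tick()          # starts task-1
d.paused = True   # drain: task-1 completes, task-2 never starts
d.tick()
print(d.active, d.pending)  # → [] ['task-2']
```

Flipping paused back to False and ticking again would pick up task-2, which is exactly the resume behavior described above.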

Answer a stuck agent

If an agent has a question it can't answer on its own (a missing env var, an ambiguous requirement), it sets the task to needs_human and waits. Use claw-forge input to unblock it:

claw-forge input abc-123-def
🙋 1 pending question for 'abc-123-def':

Task: Email reminders
Q: What is the SendGrid API key env var name?

Your answer: SENDGRID_API_KEY
✅ Answer submitted — task moved to pending

Workflows

Choose the workflow that matches your situation. Each one chains commands in the right order.

Greenfield — Build a new app from scratch

claw-forge init → /create-spec → claw-forge plan app_spec.txt → claw-forge run → /check-code → /checkpoint → /review-pr

Example: Building "TaskFlow API" (FastAPI + SQLite)

# 1. Scaffold project
mkdir taskflow-api && cd taskflow-api && git init
claw-forge init

# 2. Create spec interactively (in Claude Code)
#    Type: /create-spec
#    Claude asks about features, tech stack, DB schema
#    Writes: app_spec.txt + claw-forge.yaml

# 3. Initialize with spec
claw-forge plan app_spec.txt --concurrency 5

# 4. Run agents (opens 5 parallel coding agents)
claw-forge state &
claw-forge run --concurrency 5

# 5. Verify (in Claude Code)
#    /check-code    → ruff + mypy + pytest
#    /checkpoint    → git commit + state snapshot
#    /review-pr     → structured code review
Result: 59 features, ~30 min, ~$3.12, ~3,400 lines of tested code.

Brownfield — Add features to existing code

claw-forge analyze → /create-spec → claw-forge add → claw-forge run

Example: Adding Stripe payments to an existing FastAPI app

# 1. Analyze existing codebase (creates brownfield_manifest.json)
claw-forge analyze

# 2. Create brownfield spec (in Claude Code)
#    Type: /create-spec
#    Claude auto-detects brownfield mode from manifest
#    Asks: what to add, constraints, integration points
#    Writes: additions_spec.xml

# 3. Add features
claw-forge add --spec additions_spec.xml

# 4. Run agents
claw-forge run --concurrency 3

# 5. Verify
#    /check-code (71 tests: 59 original + 12 new, all passing)
Key: The manifest teaches agents your naming conventions, async patterns, and test style — so new code matches existing code.

Bug Fix — TDD regression fix

/create-bug-report → claw-forge fix → /check-code → /review-pr

Example: Fixing "password reset fails for uppercase emails"

# 1. Create bug report (in Claude Code)
#    Type: /create-bug-report
#    Claude guides you through 6 phases:
#      symptoms → reproduction → expected vs actual → scope → write report → fix
#    Writes: bug_report.md

# 2. Run fix (or let /create-bug-report trigger it)
claw-forge fix --report bug_report.md

# Agent does:
#   Phase 1 (RED):      Write test_password_reset_uppercase_email → FAILS ✓
#   Phase 2 (GREEN):    Fix auth/service.py → .lower() on email lookup → PASSES ✓
#   Phase 3 (REFACTOR): Run full suite → 72 passed, 0 failed ✓

# 3. Verify and push
#    /check-code → /review-pr → git push
Result: Mandatory regression test means the bug can never silently re-appear.

Parallel Sprint — Multi-agent feature development

claw-forge run --concurrency 5 → claw-forge status → /pool-status → /checkpoint

Example: Building 50 features with 5 concurrent agents

# Three terminals:
claw-forge state &              # Terminal 1: state service
claw-forge run --concurrency 5  # Terminal 2: agents
claw-forge ui                   # Terminal 3: Kanban board

# Monitor mid-sprint:
claw-forge status               # Phase progress, blocked features, cost
#    /pool-status               # Provider health, RPM, circuit breakers

# Handle blocked features:
claw-forge input saas-platform  # Answer agent questions interactively

# Save progress at milestones:
#    /checkpoint                # Git commit + state snapshot
Tip: Start with --concurrency 3 to verify your spec, then scale up.

Recovery — Resuming after interruption

claw-forge status → claw-forge run → /expand-project

Example: Laptop shut down mid-sprint (28/50 features done)

# 1. Check what happened
claw-forge status
# Shows: 28 passing, 5 interrupted, 17 pending

# 2. Resume — interrupted features reset to pending automatically
claw-forge state &
claw-forge run --concurrency 5
# "Resuming session: 28/50 passing, 22 remaining"

# 3. Optionally add more features mid-run (in Claude Code)
#    Type: /expand-project
#    Claude lists current features, asks what to add, POSTs atomically
Key: State persists to disk. Power loss, Ctrl+C, network drops — just claw-forge run again.

Which command do I use?

Start here
  │
  ├── Building something new?
  │     └── claw-forge init → /create-spec → claw-forge plan → claw-forge run
  │
  ├── Adding features to existing code?
  │     └── claw-forge analyze → /create-spec → claw-forge add → claw-forge run
  │
  ├── Fixing a bug?
  │     └── /create-bug-report → claw-forge fix
  │
  ├── Checking project health?
  │     ├── Code quality     → /check-code
  │     ├── Feature progress → claw-forge status
  │     └── Provider health  → /pool-status
  │
  ├── Saving progress?
  │     └── /checkpoint → /review-pr → git push
  │
  ├── Resuming after a break?
  │     └── claw-forge status → claw-forge run
  │
  └── Adding features mid-run?
        └── /expand-project
