Level 7: Fully Autonomous Pipelines
The capstone level. Set it up, walk away, come back to finished work. This guide covers headless mode, batch processing, autonomous execution loops, CI/CD integration, safety sandboxing, and the Agent SDK for building production-grade pipelines with Claude Code.
Table of Contents
- Overview and Goal
- Headless Mode -- Claude Without a Terminal
- Batch Processing and Fan-Out
- The RALF Loop (Autonomous Execution)
- Stop Hooks for Verification Gates
- CI/CD Integration
- Log Analysis and Monitoring
- Safety and Sandboxing
- RALF vs GSD -- When to Use Which
- Advanced Patterns
- The Claude Agent SDK (Programmatic Access)
- Exercises
- Pro Tips from Boris Cherny
- Anti-Patterns
- Official Documentation Links
1. Overview and Goal
Everything up to this point has been interactive. You type, Claude responds. You approve, Claude acts. Level 7 removes you from the loop.
Autonomous pipelines are workflows where Claude Code executes without human intervention from start to finish. You define the task, the boundaries, the verification criteria, and the safety constraints. Then you walk away. When you come back, the work is done -- or it stopped safely at a verification gate that needs your attention.
What "Autonomous" Means in Practice
| Level | Description | Human Involvement |
|---|---|---|
| Interactive | You type every prompt | Every turn |
| Semi-autonomous | You approve each tool use | Frequent |
| Permission-scoped | You pre-approve certain tools | Occasional |
| Fully autonomous | Claude runs start to finish | None until completion |
The Building Blocks
Autonomous pipelines are built from these primitives:
- Headless mode (-p flag) -- Run Claude without a terminal UI
- Permission scoping (--allowedTools, --permission-mode) -- Define what Claude can do
- Output formatting (--output-format json) -- Parse results programmatically
- Iteration control (--max-turns) -- Prevent runaway execution
- Verification hooks (Stop hooks) -- Gate completion on quality checks
- Sandboxing (/sandbox, containers) -- Enforce filesystem and network boundaries
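Put together, a single autonomous invocation combines most of these primitives. The sketch below is illustrative: the prompt, tool list, and turn limit are placeholders, and the command is built as a bash array and printed rather than executed, since actually running it requires the claude CLI and an API key.

```shell
# Dry-run sketch: compose the primitives above into one headless command.
# Prompt and limits are placeholders -- adjust for your own pipeline.
CMD=(claude -p "Fix all ESLint errors in src/"
     --allowedTools "Read,Edit,Bash(npm run lint *)"
     --output-format json
     --max-turns 10)

# Print the command instead of running it (remove this to execute for real).
printf '%q ' "${CMD[@]}"
echo
```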
When to Go Autonomous
- Batch processing hundreds of files with the same transformation
- CI/CD pipelines that run on every PR
- Overnight migrations or refactors
- Scheduled code quality sweeps
- Log monitoring and alerting
- Multi-stage build-test-deploy pipelines
When NOT to Go Autonomous
- Exploratory or ambiguous tasks
- Tasks requiring creative judgment calls
- Anything involving production databases without a rollback plan
- First-time runs of a new pipeline (always test interactively first)
2. Headless Mode -- Claude Without a Terminal
Headless mode is the foundation of every autonomous pipeline. The -p (or --print) flag tells Claude Code to accept a prompt, execute it, print the result, and exit -- no interactive UI, no permission dialogs blocking execution.
The -p Flag
# Basic headless execution
claude -p "What files are in this project?"
# With tool permissions pre-approved
claude -p "Run the test suite and report failures" \
--allowedTools "Bash(npm test *),Read"
# With structured output
claude -p "Summarize this project" --output-format json
The -p flag changes Claude Code from an interactive REPL to a one-shot command-line tool. It processes the prompt, uses whatever tools are needed (subject to permissions), and writes the result to stdout.
Input Methods
There are three ways to feed input to headless Claude:
Direct Prompt
claude -p "Explain the authentication flow in this codebase"
Piped stdin
# Pipe file contents
cat src/auth.py | claude -p "Review this code for security issues"
# Pipe command output
git diff HEAD~5 | claude -p "Summarize these changes"
# Pipe log output
tail -100 /var/log/app.log | claude -p "Identify any errors or anomalies"
File-Based Prompts
# Using system prompt from a file
claude -p "Review the code" --system-prompt-file ./prompts/security-review.txt
# Appending instructions from a file while keeping defaults
claude -p "Review this PR" --append-system-prompt-file ./prompts/style-rules.txt
Output Formats
Claude Code supports three output formats in headless mode. Each serves a different use case.
Text Output (Default)
Plain text, suitable for human reading or simple piping.
claude -p "What does the main function do?"
# Output: The main function initializes the application...
JSON Output
Structured JSON with metadata. The result text is in the result field.
claude -p "Summarize this project" --output-format json
Output structure:
{
"type": "result",
"subtype": "success",
"cost_usd": 0.003,
"is_error": false,
"duration_ms": 4521,
"duration_api_ms": 3200,
"num_turns": 2,
"result": "This project is a REST API built with Express.js...",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"total_cost_usd": 0.003
}
Extract the result with jq:
claude -p "Summarize this project" --output-format json | jq -r '.result'
JSON Schema Output
Get validated structured output conforming to a specific schema:
claude -p "Extract the main function names from auth.py" \
--output-format json \
--json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}},"required":["functions"]}'
The structured data appears in the structured_output field:
claude -p "Extract function names" \
--output-format json \
--json-schema '...' \
| jq '.structured_output'
Stream JSON Output
Newline-delimited JSON for real-time streaming. Each line is a separate event.
claude -p "Explain recursion" \
--output-format stream-json \
--verbose \
--include-partial-messages
Filter for just the streaming text:
claude -p "Write a poem" \
--output-format stream-json \
--verbose \
--include-partial-messages | \
jq -rj 'select(.type == "stream_event" and .event.delta.type? == "text_delta") | .event.delta.text'
Exit Codes
Claude Code returns meaningful exit codes in headless mode:
| Exit Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (general failure) |
| 2 | Max turns reached (when using --max-turns) |
Use exit codes in scripts for conditional logic:
claude -p "Run tests and fix failures" \
--allowedTools "Bash,Read,Edit" \
--max-turns 10
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "All tasks completed successfully"
elif [ $EXIT_CODE -eq 2 ]; then
echo "Hit max turns limit -- may need more iterations"
else
echo "Error occurred"
fi
Environment Variables for Headless Operation
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | API key for authentication |
| CLAUDE_CODE_USE_BEDROCK | Set to 1 to use AWS Bedrock |
| CLAUDE_CODE_USE_VERTEX | Set to 1 to use Google Vertex AI |
Complete CLI Reference for Headless Flags
These flags are the most relevant for headless/autonomous operation:
| Flag | Description |
|---|---|
| -p, --print | Run in headless mode (required) |
| --output-format | text, json, or stream-json |
| --json-schema | JSON Schema for structured output |
| --input-format | Input format: text or stream-json |
| --include-partial-messages | Include streaming events (requires stream-json) |
| --allowedTools | Tools that execute without permission prompts |
| --disallowedTools | Tools removed from the model entirely |
| --tools | Restrict which tools are available ("Bash,Edit,Read") |
| --permission-mode | default, acceptEdits, plan, dontAsk, bypassPermissions |
| --dangerously-skip-permissions | Skip ALL permission prompts |
| --max-turns | Limit agentic turns (exits with code 2 when reached) |
| --max-budget-usd | Maximum dollar spend before stopping |
| --model | Model selection: sonnet, opus, or full model ID |
| --fallback-model | Fallback when primary model is overloaded |
| --system-prompt | Replace entire system prompt |
| --system-prompt-file | Replace system prompt from file |
| --append-system-prompt | Append to default system prompt |
| --append-system-prompt-file | Append from file to system prompt |
| --continue, -c | Continue most recent conversation |
| --resume, -r | Resume specific session by ID |
| --no-session-persistence | Do not save session to disk |
| --verbose | Show full turn-by-turn output |
| --debug | Enable debug logging |
| --mcp-config | Load MCP servers from JSON config |
Continuing Conversations Programmatically
# First run
claude -p "Review this codebase for performance issues" --output-format json > first_pass.json
# Extract session ID
SESSION_ID=$(jq -r '.session_id' first_pass.json)
# Continue the same conversation
claude -p "Now focus on the database queries" --resume "$SESSION_ID"
# Or just continue the most recent conversation
claude -p "Generate a summary of all issues found" --continue
3. Batch Processing and Fan-Out
Batch processing is where headless mode pays off. Instead of running Claude once, you run it across hundreds of files, each invocation independent and parallelizable.
The Basic Pattern
# Process each file in a loop (src/**/* requires bash globstar)
shopt -s globstar
for file in src/**/*.ts; do
claude -p "Add JSDoc comments to all exported functions in this file" \
--allowedTools "Read,Edit" \
< "$file"
done
Parallel Execution with xargs
# Process 4 files at a time in parallel
find src -name "*.ts" | xargs -P 4 -I {} \
claude -p "Add JSDoc comments to all exported functions in {}" \
--allowedTools "Read,Edit"
Parallel Execution with GNU parallel
# Process with GNU parallel (better job control)
find src -name "*.ts" | parallel -j 4 \
claude -p "Add JSDoc comments to all exported functions in {}" \
--allowedTools "Read,Edit"
Scoped Permissions with --allowedTools
The --allowedTools flag is critical for batch scripts. It defines exactly which tools Claude can use without prompting. This follows the permission rule syntax:
# Read-only analysis (safest)
claude -p "Review this code" --allowedTools "Read,Grep,Glob"
# Read and edit (for transformations)
claude -p "Add types" --allowedTools "Read,Edit"
# With specific bash commands
claude -p "Run tests" --allowedTools "Bash(npm test *),Read"
# Wildcard bash with prefix matching
claude -p "Git operations" --allowedTools "Bash(git diff *),Bash(git log *),Bash(git status *)"
The space before * matters. Bash(git diff *) matches git diff HEAD but not git diff-index. Without the space, Bash(git diff*) would match both.
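The space-sensitive prefix rule is easy to sanity-check locally with bash's own glob matching. The function below is an illustration of the rule only, not Claude Code's actual matcher:

```shell
# Illustrative only: mimic the space-sensitive prefix rule with a bash glob.
# Claude Code performs the real permission check internally.
matches_rule() {
  local rule="$1" cmd="$2"
  case "$cmd" in
    $rule) return 0 ;;   # unquoted: the rule is treated as a glob pattern
    *)     return 1 ;;
  esac
}

matches_rule "git diff *" "git diff HEAD"       && echo "git diff HEAD: match"
matches_rule "git diff *" "git diff-index HEAD" || echo "git diff-index HEAD: no match"
matches_rule "git diff*"  "git diff-index HEAD" && echo "without the space: match"
```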
Error Handling in Batch Scripts
#!/bin/bash
set -euo pipefail
RESULTS_DIR="./batch-results"
mkdir -p "$RESULTS_DIR"
FAILED=0
SUCCEEDED=0
TOTAL=0
for file in src/components/*.tsx; do
TOTAL=$((TOTAL + 1))
BASENAME=$(basename "$file" .tsx)
echo "Processing: $file"
if claude -p "Add comprehensive prop type definitions to this React component. \
Read the file at $file, add TypeScript interfaces for all props, \
and ensure all props are properly typed." \
--allowedTools "Read,Edit" \
--max-turns 5 \
--output-format json \
> "$RESULTS_DIR/${BASENAME}.json" 2>&1; then
SUCCEEDED=$((SUCCEEDED + 1))
echo " OK: $file"
else
EXIT_CODE=$?
FAILED=$((FAILED + 1))
echo "  FAIL: $file (exit code: $EXIT_CODE)"
fi
done
echo ""
echo "=== Batch Results ==="
echo "Total: $TOTAL"
echo "Succeeded: $SUCCEEDED"
echo "Failed: $FAILED"
Collecting and Aggregating Results
#!/bin/bash
# Collect results from multiple Claude runs into a single report
REPORT_FILE="./batch-report.md"
echo "# Batch Processing Report" > "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "Generated: $(date)" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
for result_file in ./batch-results/*.json; do
FILENAME=$(basename "$result_file" .json)
RESULT=$(jq -r '.result // "No result"' "$result_file")
COST=$(jq -r '.cost_usd // "unknown"' "$result_file")
IS_ERROR=$(jq -r '.is_error // false' "$result_file")
echo "## $FILENAME" >> "$REPORT_FILE"
echo "- Cost: \$${COST}" >> "$REPORT_FILE"
echo "- Error: ${IS_ERROR}" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "$RESULT" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "---" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
done
echo "Report written to $REPORT_FILE"
Example 1: File Migration (React Class Components to Hooks)
#!/bin/bash
# migrate-to-hooks.sh
# Migrate React class components to functional components with hooks
set -euo pipefail
MIGRATION_PROMPT="Read this file. If it contains a React class component, \
convert it to a functional component using hooks. Preserve all functionality: \
- Convert state to useState \
- Convert lifecycle methods to useEffect \
- Convert class methods to regular functions or useCallback \
- Preserve all props and their types \
- Keep all existing tests passing \
If the file is already a functional component, make no changes."
LOG_FILE="./migration-log.txt"
echo "Migration started: $(date)" > "$LOG_FILE"
find src/components -name "*.tsx" -o -name "*.jsx" | while read -r file; do
echo "Migrating: $file"
claude -p "$MIGRATION_PROMPT" \
--allowedTools "Read,Edit" \
--append-system-prompt "You are migrating the file at: $file" \
--max-turns 8 \
--output-format json \
2>>"$LOG_FILE" | tee -a "$LOG_FILE" | jq -r '.result // "ERROR"' || true
echo "---" >> "$LOG_FILE"
done
echo "Migration completed: $(date)" >> "$LOG_FILE"
Example 2: Adding License Headers to Files
#!/bin/bash
# add-license-headers.sh
# Add license headers to all source files missing them
set -euo pipefail
LICENSE_HEADER="Copyright (c) 2026 Acme Corp. All rights reserved.
Licensed under the MIT License."
find src -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" | while read -r file; do
# Check if file already has a license header
if head -5 "$file" | grep -q "Copyright"; then
echo "SKIP (already has header): $file"
continue
fi
echo "Adding header to: $file"
claude -p "Add this license header as a comment block at the very top of \
the file at $file (before any imports). Use the appropriate comment \
syntax for the file type. The license text is: ${LICENSE_HEADER}" \
--allowedTools "Read,Edit" \
--max-turns 3 \
--output-format json | jq -r '.is_error' || true
done
Example 3: Generating Tests for Each Module
#!/bin/bash
# generate-tests.sh
# Generate unit tests for every module that lacks them
set -euo pipefail
RESULTS_DIR="./test-generation-results"
mkdir -p "$RESULTS_DIR"
TEST_PROMPT="Read the source file and generate comprehensive unit tests for it. \
Requirements: \
- Use the existing test framework (Jest/Vitest) found in the project \
- Cover all exported functions and classes \
- Include edge cases and error conditions \
- Follow existing test patterns in the project \
- Place the test file next to the source file as __tests__/<filename>.test.ts \
- Run the tests to verify they pass"
find src -name "*.ts" ! -name "*.test.ts" ! -name "*.spec.ts" ! -path "*__tests__*" | while read -r file; do
BASENAME=$(basename "$file" .ts)
TEST_FILE="$(dirname "$file")/__tests__/${BASENAME}.test.ts"
# Skip if test already exists
if [ -f "$TEST_FILE" ]; then
echo "SKIP (test exists): $file"
continue
fi
echo "Generating tests for: $file"
claude -p "$TEST_PROMPT for the file at $file" \
--allowedTools "Read,Write,Edit,Bash(npx jest *),Bash(npm test *),Glob,Grep" \
--max-turns 15 \
--output-format json \
> "$RESULTS_DIR/${BASENAME}.json" 2>&1
IS_ERROR=$(jq -r '.is_error' "$RESULTS_DIR/${BASENAME}.json")
if [ "$IS_ERROR" = "true" ]; then
echo " FAIL: $file"
else
echo " OK: $file"
fi
done
Example 4: Code Review of Multiple PRs
#!/bin/bash
# review-prs.sh
# Review all open PRs in a repository
set -euo pipefail
REVIEW_PROMPT="You are a senior code reviewer. Review the PR diff provided via stdin. \
Focus on: \
1. Security vulnerabilities (SQL injection, XSS, auth bypass) \
2. Performance issues (N+1 queries, memory leaks, unnecessary re-renders) \
3. Code quality (naming, complexity, duplication) \
4. Test coverage (are new features tested?) \
5. Breaking changes (API contract, database schema) \
Output a structured review with severity levels: CRITICAL, WARNING, INFO."
mkdir -p ./reviews
# Get list of open PRs
PR_NUMBERS=$(gh pr list --state open --json number --jq '.[].number')
for pr in $PR_NUMBERS; do
echo "Reviewing PR #${pr}..."
gh pr diff "$pr" | claude -p "$REVIEW_PROMPT" \
--append-system-prompt "You are reviewing PR #${pr}." \
--output-format json \
--max-turns 3 \
> "./reviews/pr-${pr}-review.json" 2>&1
RESULT=$(jq -r '.result' "./reviews/pr-${pr}-review.json")
echo "PR #${pr} Review:"
echo "$RESULT"
echo "---"
done
4. The RALF Loop (Autonomous Execution)
RALF stands for Read-Act-Loop-Finish. It is a pattern for autonomous, multi-iteration execution where Claude reads a specification, works through it story by story, verifies after each step, and loops until all work is done or a safety limit is hit.
What RALF Is
RALF is not a built-in Claude Code feature. It is a scripting pattern that wraps Claude Code's headless mode in a loop with:
- Read: Claude reads a structured specification (the prd.json)
- Act: Claude implements one user story or task
- Loop: The script checks progress and sends Claude back for the next task
- Finish: All acceptance criteria are met, or max_iterations is reached
The key insight is that each iteration starts a fresh Claude context (or continues a session), preventing the context window from degrading over long-running tasks.
The prd.json Format
The PRD (Product Requirements Document) file defines what Claude should build. It uses a structured format with user stories and acceptance criteria so progress can be verified programmatically.
{
"project": "User Authentication System",
"description": "Implement a complete authentication system with login, signup, and session management",
"tech_stack": {
"language": "TypeScript",
"framework": "Express.js",
"database": "PostgreSQL with Prisma ORM",
"testing": "Jest"
},
"stories": [
{
"id": "AUTH-001",
"title": "User Registration",
"description": "As a new user, I want to create an account so I can access the application",
"acceptance_criteria": [
"POST /api/auth/register endpoint accepts email and password",
"Password is hashed with bcrypt before storage",
"Email uniqueness is enforced at the database level",
"Returns 201 with user object (without password) on success",
"Returns 409 if email already exists",
"Input validation rejects invalid email formats",
"Unit tests pass for all success and error cases"
],
"files": ["src/routes/auth.ts", "src/models/user.ts", "src/middleware/validation.ts"],
"status": "pending"
},
{
"id": "AUTH-002",
"title": "User Login",
"description": "As a registered user, I want to log in to receive a session token",
"acceptance_criteria": [
"POST /api/auth/login endpoint accepts email and password",
"Returns JWT token on successful authentication",
"Returns 401 on invalid credentials",
"Token contains user ID and expiration time",
"Unit tests pass for all cases"
],
"files": ["src/routes/auth.ts", "src/utils/jwt.ts"],
"status": "pending",
"depends_on": ["AUTH-001"]
},
{
"id": "AUTH-003",
"title": "Protected Routes Middleware",
"description": "As a developer, I want middleware to protect routes that require authentication",
"acceptance_criteria": [
"Middleware extracts JWT from Authorization header",
"Middleware verifies token validity and expiration",
"Middleware attaches user object to request",
"Returns 401 for missing or invalid tokens",
"Returns 403 for expired tokens",
"Unit tests pass for all cases"
],
"files": ["src/middleware/auth.ts"],
"status": "pending",
"depends_on": ["AUTH-002"]
},
{
"id": "AUTH-004",
"title": "Integration Tests",
"description": "Complete integration test suite for the auth system",
"acceptance_criteria": [
"Full registration-login-access flow works end-to-end",
"All edge cases are covered",
"Tests use a test database, not production",
"All tests pass"
],
"files": ["tests/integration/auth.test.ts"],
"status": "pending",
"depends_on": ["AUTH-003"]
}
],
"constraints": [
"Do not modify files outside of src/ and tests/",
"Follow existing code style and patterns",
"All new code must have TypeScript strict mode enabled",
"No console.log statements in production code"
],
"verification_command": "npm test"
}
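Because story status lives in plain JSON, progress queries are jq one-liners. The sketch below runs against a minimal inline PRD rather than a real project file (jq assumed available, as throughout this guide):

```shell
# Query progress from a PRD file with jq (tiny inline sample for illustration).
PRD=$(mktemp)
cat > "$PRD" <<'EOF'
{"stories":[
  {"id":"AUTH-001","status":"completed"},
  {"id":"AUTH-002","status":"pending"},
  {"id":"AUTH-003","status":"pending"}
]}
EOF

# Next pending story
jq -r '[.stories[] | select(.status == "pending")][0].id' "$PRD"

# Completion count, e.g. "1/3 completed"
jq -r '"\([.stories[] | select(.status == "completed")] | length)/\(.stories | length) completed"' "$PRD"
```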
The Execution Loop Script
This is the core RALF implementation. It iterates through the PRD stories, sends each to Claude, verifies the result, and updates the status.
#!/bin/bash
# ralf.sh -- Read-Act-Loop-Finish autonomous execution
set -euo pipefail
PRD_FILE="${1:-prd.json}"
MAX_ITERATIONS="${2:-20}"
LOG_DIR="./ralf-logs"
mkdir -p "$LOG_DIR"
ITERATION=0
echo "=== RALF Loop Starting ==="
echo "PRD: $PRD_FILE"
echo "Max iterations: $MAX_ITERATIONS"
echo ""
while [ $ITERATION -lt $MAX_ITERATIONS ]; do
ITERATION=$((ITERATION + 1))
echo "--- Iteration $ITERATION / $MAX_ITERATIONS ---"
# Find the next pending story
NEXT_STORY=$(jq -r '
. as $root
| .stories[]
| select(.status == "pending")
| select(
((.depends_on // []) - [$root.stories[] | select(.status == "completed") | .id])
| length == 0
)
| .id
' "$PRD_FILE" 2>/dev/null | head -1)
# Simpler fallback: just get the first pending story
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
NEXT_STORY=$(jq -r '.stories[] | select(.status == "pending") | .id' "$PRD_FILE" | head -1)
fi
# Check if all stories are done
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
echo ""
echo "=== All stories completed! ==="
break
fi
# Extract story details
STORY_TITLE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .title" "$PRD_FILE")
STORY_DESC=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .description" "$PRD_FILE")
ACCEPTANCE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .acceptance_criteria | join(\"\n- \")" "$PRD_FILE")
CONSTRAINTS=$(jq -r '.constraints | join("\n- ")' "$PRD_FILE")
VERIFY_CMD=$(jq -r '.verification_command // "echo No verification command"' "$PRD_FILE")
echo "Working on: [$NEXT_STORY] $STORY_TITLE"
# Build the prompt for this iteration
PROMPT="You are implementing a software project defined in $PRD_FILE.
Current task: [$NEXT_STORY] $STORY_TITLE
Description: $STORY_DESC
Acceptance Criteria:
- $ACCEPTANCE
Project Constraints:
- $CONSTRAINTS
Instructions:
1. Read the PRD file and any existing code to understand the full context
2. Implement this specific story ($NEXT_STORY)
3. Write the code that satisfies ALL acceptance criteria
4. Run the verification command: $VERIFY_CMD
5. Fix any test failures
6. When all acceptance criteria are met, report SUCCESS
Do NOT implement other stories. Focus only on $NEXT_STORY."
# Execute Claude (capture the exit code without tripping set -e)
EXIT_CODE=0
claude -p "$PROMPT" \
--allowedTools "Read,Write,Edit,Bash(npm *),Bash(npx *),Bash(git diff *),Bash(git status *),Glob,Grep" \
--max-turns 25 \
--output-format json \
> "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json" 2>&1 || EXIT_CODE=$?
# Check result
IS_ERROR=$(jq -r '.is_error // false' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
RESULT=$(jq -r '.result // "No result"' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
# Run verification
echo " Running verification: $VERIFY_CMD"
if eval "$VERIFY_CMD" > "$LOG_DIR/verify-${ITERATION}.log" 2>&1; then
echo " Verification PASSED"
# Update the story status in the PRD
jq "(.stories[] | select(.id == \"$NEXT_STORY\") | .status) = \"completed\"" \
"$PRD_FILE" > "${PRD_FILE}.tmp" && mv "${PRD_FILE}.tmp" "$PRD_FILE"
echo " Marked $NEXT_STORY as completed"
else
echo " Verification FAILED -- will retry next iteration"
fi
else
echo " Claude reported an error, will retry"
fi
echo ""
done
# Final status report
COMPLETED=$(jq '[.stories[] | select(.status == "completed")] | length' "$PRD_FILE")
TOTAL=$(jq '.stories | length' "$PRD_FILE")
echo "=== RALF Loop Complete ==="
echo "Completed: $COMPLETED / $TOTAL stories"
echo "Iterations used: $ITERATION / $MAX_ITERATIONS"
echo "Logs: $LOG_DIR"
max_iterations as a Safety Guard
The MAX_ITERATIONS variable prevents runaway execution. Without it, a failing story could cause the loop to retry indefinitely. Rules of thumb:
| Task Complexity | Suggested max_iterations |
|---|---|
| Simple transformations | 5-10 |
| Feature implementation | 15-25 |
| Full project build | 30-50 |
| Complex refactors | 20-40 |
Always set this value. An infinite loop with API calls will drain your budget.
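The guard itself is nothing more than a bounded while loop. Stripped of the Claude invocation (stubbed here with a story that never passes verification), the shape is:

```shell
# Minimal sketch of the safety guard: the loop exits after MAX_ITERATIONS
# even when the task (stubbed to always fail) never completes.
MAX_ITERATIONS=5
ITERATION=0
run_story() { return 1; }   # stand-in for a claude -p call plus verification

while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
  ITERATION=$((ITERATION + 1))
  if run_story; then
    break
  fi
done
echo "stopped after $ITERATION iteration(s)"
```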
How Each Iteration Gets Fresh Context
Each call to claude -p starts a fresh conversation. This is intentional -- it prevents context window degradation. Long conversations cause Claude to lose track of earlier details, producing lower-quality output. By restarting each iteration:
- Claude re-reads the PRD and sees updated statuses
- Claude examines the actual codebase (not a stale memory of it)
- Each iteration gets the full context window for its specific task
If you need continuity between iterations (rare), use --resume with the session ID from the previous run.
Verification Between Iterations
The verification step between iterations is what makes RALF reliable. After each implementation:
- Run the project's test suite
- Check that the specific acceptance criteria are met
- Only mark the story as completed if verification passes
- If verification fails, the next iteration will retry the same story
This is the difference between RALF and a naive loop that just sends prompts. RALF verifies actual outcomes, not just Claude's claim of completion.
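The verify-then-mark step reduces to a few lines. In this sketch the verification command is stubbed with true and the PRD is a minimal inline sample (jq assumed available):

```shell
# Sketch: mark a story completed only after verification passes.
PRD=$(mktemp)
echo '{"stories":[{"id":"S1","status":"pending"},{"id":"S2","status":"pending"}]}' > "$PRD"

VERIFY_CMD="true"   # stand-in for the project's real command, e.g. "npm test"

if eval "$VERIFY_CMD"; then
  # Flip S1 to completed, writing via a temp file so the PRD is never truncated
  jq '(.stories[] | select(.id == "S1") | .status) = "completed"' "$PRD" \
    > "$PRD.tmp" && mv "$PRD.tmp" "$PRD"
fi

jq -r '.stories[] | "\(.id): \(.status)"' "$PRD"
```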
5. Stop Hooks for Verification Gates
Stop hooks are the mechanism for preventing Claude from claiming it is done before the work actually passes quality checks. When Claude finishes responding, the Stop hook fires. If the hook returns a blocking decision, Claude continues working instead of stopping.
How Stop Hooks Work
The Stop hook fires when Claude finishes responding (but not on user interrupts). The hook receives context about the session including the last assistant message, and can:
- Exit 0: Allow Claude to stop normally
- Exit 2: Block the stop, send stderr feedback to Claude to continue working
- Return JSON with decision: "block": Block with a reason sent to Claude
The stop_hook_active Guard
The hook input includes a stop_hook_active field that is true when Claude is already continuing because of a previous Stop hook. Always check this to prevent infinite loops:
#!/bin/bash
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop: only run the check once
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Your verification logic here
Hook Type: Command (Shell Script)
The simplest verification gate -- a shell script that runs tests:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/verify-before-stop.sh",
"timeout": 120,
"statusMessage": "Running verification checks..."
}
]
}
]
}
}
The verification script:
#!/bin/bash
# .claude/hooks/verify-before-stop.sh
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Run the test suite
echo "Running tests..." >&2
if ! npm test 2>&1; then
echo "Tests are failing. Fix the failing tests before stopping." >&2
exit 2 # Exit code 2 = block Claude from stopping
fi
# Run the linter
echo "Running linter..." >&2
if ! npm run lint 2>&1; then
echo "Linting errors found. Fix them before stopping." >&2
exit 2
fi
# Run type checking
echo "Running type check..." >&2
if ! npx tsc --noEmit 2>&1; then
echo "Type errors found. Fix them before stopping." >&2
exit 2
fi
# All checks passed
exit 0
Hook Type: Prompt (LLM Evaluation)
Use a prompt hook to have a fast model (Haiku by default) evaluate whether Claude should stop. This is useful for subjective quality checks:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "prompt",
"prompt": "You are evaluating whether Claude should stop working. Context: $ARGUMENTS\n\nAnalyze the conversation and determine if:\n1. All user-requested tasks are complete\n2. Any errors remain unaddressed\n3. Tests have been run and pass\n4. Code follows the project conventions described in CLAUDE.md\n\nRespond with JSON: {\"ok\": true} to allow stopping, or {\"ok\": false, \"reason\": \"your explanation\"} to continue working.",
"timeout": 30
}
]
}
]
}
}
The LLM returns {"ok": true} or {"ok": false, "reason": "..."}. If ok is false, Claude receives the reason as feedback and continues working.
Hook Type: Agent (Multi-Turn Verification)
Agent hooks are the most powerful option. They spawn a subagent that can read files, search code, and run commands to verify completion:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "agent",
"prompt": "Verify that the implementation is complete. Check:\n1. All files mentioned in the task exist\n2. Unit tests exist and pass (run: npm test)\n3. No TODO comments remain in modified files\n4. No console.log statements in production code\n\nContext: $ARGUMENTS",
"timeout": 120
}
]
}
]
}
}
The agent can use Read, Grep, Glob, and other tools to investigate the codebase. It returns the same {"ok": true/false} decision format.
Combining Multiple Verification Gates
You can chain multiple hooks. All matching hooks run in parallel:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/run-tests.sh",
"timeout": 120,
"statusMessage": "Running test suite..."
},
{
"type": "command",
"command": ".claude/hooks/check-lint.sh",
"timeout": 60,
"statusMessage": "Checking code style..."
},
{
"type": "prompt",
"prompt": "Review whether all acceptance criteria from the original task are satisfied. Context: $ARGUMENTS",
"timeout": 30
}
]
}
]
}
}
If any hook blocks, Claude continues working. The stderr or reason from the blocking hook tells Claude what to fix.
6. CI/CD Integration
Claude Code integrates directly into GitHub Actions and GitLab CI/CD pipelines. This is the most common production use case for autonomous execution.
GitHub Actions Setup
Quick Setup
The fastest way to set up GitHub Actions integration:
# Inside Claude Code interactive mode
/install-github-app
This guides you through installing the Claude GitHub App and configuring secrets.
Manual Setup
- Install the Claude GitHub App: https://github.com/apps/claude
- Add ANTHROPIC_API_KEY to your repository secrets
- Create the workflow file
GitHub Actions Workflow: Respond to @claude Mentions
This is the core workflow. It triggers when someone mentions @claude in a PR or issue comment:
# .github/workflows/claude.yml
name: Claude Code
on:
issue_comment:
types: [created]
pull_request_review_comment:
types: [created]
jobs:
claude:
if: contains(github.event.comment.body, '@claude')
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
Usage in PR comments:
@claude implement this feature based on the issue description
@claude fix the TypeError in the user dashboard component
@claude review this PR for security issues
GitHub Actions Workflow: Automated PR Review
Automatically review every PR when it is opened or updated:
# .github/workflows/claude-review.yml
name: Claude PR Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: "/review"
claude_args: "--max-turns 5"
GitHub Actions Workflow: Automated Issue Implementation
When an issue is labeled with claude-implement, Claude creates a PR with the implementation:
# .github/workflows/claude-implement.yml
name: Claude Auto-Implement
on:
issues:
types: [labeled]
jobs:
implement:
if: github.event.label.name == 'claude-implement'
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: |
Read the issue description and implement the requested feature.
Create a new branch, implement the changes, and open a PR.
Follow the project's CLAUDE.md guidelines.
claude_args: |
--max-turns 25
--model claude-sonnet-4-6
--allowedTools "Read,Write,Edit,Bash,Glob,Grep"
GitHub Actions Workflow: Daily Code Quality Report
Run a scheduled code quality analysis:
# .github/workflows/claude-quality.yml
name: Daily Code Quality
on:
schedule:
- cron: "0 9 * * 1-5" # 9 AM weekdays
jobs:
quality-report:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: |
Analyze the codebase for quality issues:
1. Find dead code and unused exports
2. Identify overly complex functions (cyclomatic complexity)
3. Check for missing error handling
4. Look for potential performance issues
5. Summarize findings as a GitHub issue
claude_args: "--max-turns 10 --model sonnet"
GitHub Actions: Configuration Reference
The claude-code-action@v1 accepts these parameters:
| Parameter | Description | Required |
|---|---|---|
| `anthropic_api_key` | Claude API key | Yes (unless Bedrock/Vertex) |
| `prompt` | Instructions for Claude | No |
| `claude_args` | CLI arguments passed to Claude | No |
| `github_token` | GitHub token for API access | No |
| `trigger_phrase` | Custom trigger phrase (default: `@claude`) | No |
| `use_bedrock` | Use AWS Bedrock instead of Claude API | No |
| `use_vertex` | Use Google Vertex AI instead of Claude API | No |
Pass CLI arguments via claude_args:
claude_args: "--max-turns 5 --model claude-sonnet-4-6 --allowedTools 'Read,Edit,Bash'"
GitLab CI/CD Setup
GitLab CI/CD integration works similarly but uses .gitlab-ci.yml instead:
# .gitlab-ci.yml
stages:
- ai
claude:
stage: ai
image: node:24-alpine3.21
rules:
- if: '$CI_PIPELINE_SOURCE == "web"'
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
variables:
GIT_STRATEGY: fetch
before_script:
- apk update
- apk add --no-cache git curl bash
- curl -fsSL https://claude.ai/install.sh | bash
script:
- >
claude
-p "${AI_FLOW_INPUT:-Review this MR and suggest improvements}"
--permission-mode acceptEdits
--allowedTools "Bash,Read,Edit,Write"
--debug
GitLab CI/CD: AWS Bedrock Integration
claude-bedrock:
stage: ai
image: node:24-alpine3.21
rules:
- if: '$CI_PIPELINE_SOURCE == "web"'
before_script:
- apk add --no-cache bash curl jq git python3 py3-pip
- pip install --no-cache-dir awscli
- curl -fsSL https://claude.ai/install.sh | bash
- export AWS_WEB_IDENTITY_TOKEN_FILE="${CI_JOB_JWT_FILE:-/tmp/oidc_token}"
- if [ -n "${CI_JOB_JWT_V2}" ]; then printf "%s" "$CI_JOB_JWT_V2" > "$AWS_WEB_IDENTITY_TOKEN_FILE"; fi
- >
aws sts assume-role-with-web-identity
--role-arn "$AWS_ROLE_TO_ASSUME"
--role-session-name "gitlab-claude-$(date +%s)"
--web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE"
--duration-seconds 3600 > /tmp/aws_creds.json
- export AWS_ACCESS_KEY_ID="$(jq -r .Credentials.AccessKeyId /tmp/aws_creds.json)"
- export AWS_SECRET_ACCESS_KEY="$(jq -r .Credentials.SecretAccessKey /tmp/aws_creds.json)"
- export AWS_SESSION_TOKEN="$(jq -r .Credentials.SessionToken /tmp/aws_creds.json)"
script:
- >
claude
-p "${AI_FLOW_INPUT:-Implement the requested changes and open an MR}"
--permission-mode acceptEdits
--allowedTools "Bash,Read,Edit,Write"
--debug
variables:
AWS_REGION: "us-west-2"
7. Log Analysis and Monitoring
One of the most practical autonomous uses of Claude is piping live logs through it for analysis.
Basic Log Piping
# Pipe last 100 lines for analysis
tail -100 /var/log/app.log | claude -p "Identify errors and anomalies in these logs"
# Live log monitoring
tail -f /var/log/app.log | claude -p "Watch for errors and report them as they appear"
Anomaly Detection Script
#!/bin/bash
# log-monitor.sh -- Monitor logs and alert on anomalies
set -euo pipefail
LOG_FILE="${1:-/var/log/app.log}"
CHECK_INTERVAL="${2:-300}" # seconds between checks
ALERT_FILE="./alerts.log"
echo "Monitoring: $LOG_FILE (checking every ${CHECK_INTERVAL}s)"
while true; do
# Grab the most recent entries (simple approach; overlapping lines may be re-analyzed)
NEW_LINES=$(tail -200 "$LOG_FILE")
if [ -n "$NEW_LINES" ]; then
ANALYSIS=$(echo "$NEW_LINES" | claude -p \
"Analyze these log entries for anomalies. Look for: \
1. Error patterns (stack traces, HTTP 5xx, timeout errors) \
2. Performance degradation (slow queries, high latency) \
3. Security concerns (auth failures, unusual access patterns) \
4. Resource issues (memory warnings, disk space, connection pool) \
\
Output format: \
- SEVERITY: CRITICAL/WARNING/INFO \
- CATEGORY: error/performance/security/resource \
- SUMMARY: one-line description \
- DETAILS: relevant log entries \
\
If no anomalies found, output: NO_ANOMALIES" \
--output-format json \
--max-turns 2 2>/dev/null | jq -r '.result // "ERROR"' || echo "ERROR")
if [ "$ANALYSIS" != "NO_ANOMALIES" ] && [ "$ANALYSIS" != "ERROR" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$TIMESTAMP] ALERT:" >> "$ALERT_FILE"
echo "$ANALYSIS" >> "$ALERT_FILE"
echo "---" >> "$ALERT_FILE"
# Optionally send notification
# curl -X POST "$SLACK_WEBHOOK" -d "{\"text\": \"Log Alert: $ANALYSIS\"}"
fi
fi
sleep "$CHECK_INTERVAL"
done
Error Classification Pipeline
#!/bin/bash
# classify-errors.sh -- Classify errors from the last 24 hours
set -euo pipefail
# Extract errors from the last 24 hours
ERRORS=$(journalctl --since "24 hours ago" --priority=err --no-pager 2>/dev/null || \
grep -i "error\|exception\|fatal" /var/log/app.log | tail -500 || true)
if [ -z "$ERRORS" ]; then
echo "No errors found in the last 24 hours."
exit 0
fi
echo "$ERRORS" | claude -p \
"Classify these errors into categories and provide a summary report. \
For each category: \
1. Name the category \
2. Count occurrences \
3. Identify the root cause if possible \
4. Suggest a fix \
5. Rate severity (critical/high/medium/low) \
\
Sort by severity (critical first)." \
--output-format json \
--max-turns 2 | jq -r '.result'
Deployment Log Watcher
#!/bin/bash
# watch-deploy.sh -- Monitor deployment and alert on issues
set -euo pipefail
DEPLOY_LOG="${1:-/var/log/deploy.log}"
echo "Watching deployment log: $DEPLOY_LOG"
tail -f "$DEPLOY_LOG" | while IFS= read -r line; do
# Check for error patterns
if echo "$line" | grep -qi "error\|fail\|crash\|fatal\|panic"; then
# Send the error context to Claude for analysis
CONTEXT=$(tail -20 "$DEPLOY_LOG")
ANALYSIS=$(echo "$CONTEXT" | claude -p \
"A deployment error occurred. Analyze the context and provide: \
1. What went wrong \
2. Is this a blocking error or recoverable? \
3. Suggested immediate action" \
--max-turns 2 \
--output-format json 2>/dev/null | jq -r '.result // "Analysis failed"' || echo "Analysis failed")
echo ""
echo "=== DEPLOYMENT ALERT ==="
echo "Trigger: $line"
echo "Analysis: $ANALYSIS"
echo "========================"
fi
done
8. Safety and Sandboxing
Autonomous execution requires strong safety boundaries. Claude Code provides multiple layers of protection.
Permission Modes
Claude Code has five permission modes, each offering a different level of autonomy:
| Mode | Description | Use Case |
|---|---|---|
| `default` | Prompts for each tool use | Interactive development |
| `acceptEdits` | Auto-accepts file edits, prompts for bash | Semi-autonomous |
| `plan` | Read-only, no modifications | Analysis and planning |
| `dontAsk` | Auto-denies unless pre-approved | Constrained automation |
| `bypassPermissions` | Skips all prompts | Fully autonomous (containers only) |
Set the mode via CLI:
# Accept edits automatically
claude -p "Refactor this module" --permission-mode acceptEdits
# Only allow pre-approved tools
claude -p "Run analysis" --permission-mode dontAsk --allowedTools "Read,Grep,Glob"
# Bypass all permissions (containers only!)
claude -p "Build the project" --permission-mode bypassPermissions
--dangerously-skip-permissions
This flag bypasses ALL permission checks. It is the same as --permission-mode bypassPermissions:
claude --dangerously-skip-permissions -p "Implement the feature"
When to use it:
- Inside Docker containers with no network access
- Inside VMs that will be destroyed after use
- In CI/CD runners that are ephemeral
- Never on your local machine with access to your files and network
When NOT to use it:
- On your development machine
- In any environment with internet access
- In any environment with access to sensitive files
- In any environment that persists after the run
--allowedTools for Precise Scoping
Instead of bypassing all permissions, scope exactly which tools Claude can use:
# Read-only analysis
claude -p "Analyze this codebase" \
--allowedTools "Read,Grep,Glob"
# Edit with specific bash commands only
claude -p "Fix the tests" \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *)"
# Full git workflow but no arbitrary bash
claude -p "Create a commit" \
--allowedTools "Read,Edit,Write,Bash(git *)"
Sandboxing with /sandbox
Claude Code's native sandboxing provides OS-level filesystem and network isolation:
# Inside Claude Code interactive mode
/sandbox
This opens a menu where you choose:
- Auto-allow mode: Sandboxed commands run automatically; non-sandboxable commands use the normal permission flow
- Regular permissions mode: All commands go through permission flow but are sandboxed
How Sandboxing Works
Filesystem isolation:
- Write access restricted to the current working directory and subdirectories
- Read access to the broader filesystem (with deny rules respected)
- Cannot modify files outside the working directory
Network isolation:
- Only approved domains can be accessed
- New domain requests trigger permission prompts
- All child processes inherit the same restrictions
OS-level enforcement:
- macOS: Uses Seatbelt framework
- Linux: Uses bubblewrap
- WSL2: Uses bubblewrap
Docker Container Isolation
The safest pattern for fully autonomous execution is running Claude inside a Docker container:
# Dockerfile.claude-worker
FROM node:24-slim
# Install Claude Code
RUN npm install -g @anthropic-ai/claude-code
# Create workspace
WORKDIR /workspace
# Copy project files
COPY . .
# Install project dependencies
RUN npm install
# Run Claude with full permissions (safe inside container)
CMD ["claude", "-p", "--permission-mode", "bypassPermissions", \
"--max-turns", "30", \
"Implement the features defined in prd.json"]
The "Container Without Internet" Pattern
This is the gold standard for safe autonomous execution:
#!/bin/bash
# safe-autonomous.sh -- Run Claude in a network-isolated container
set -euo pipefail
PROJECT_DIR="$(pwd)"
TASK="${1:-Implement the features in prd.json}"
docker run --rm \
--network none \
-v "${PROJECT_DIR}:/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p '${TASK}' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
Wait -- --network none blocks API calls too. You need a more nuanced approach:
#!/bin/bash
# safe-autonomous-v2.sh -- Container with API-only network access
set -euo pipefail
# Create a dedicated bridge network (note: this alone does not restrict
# egress -- combine it with iptables rules or a filtering proxy)
docker network create --driver bridge claude-restricted 2>/dev/null || true
docker run --rm \
--network claude-restricted \
-v "$(pwd):/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p 'Implement the features defined in prd.json' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
For true network restriction with API access, use iptables rules or a proxy that only allows traffic to api.anthropic.com.
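One possible sketch of the iptables approach is below. It assumes you can run as root inside the container; the script name, the `APPLY` switch, and the rule set are illustrative, not a vetted firewall policy. The script builds an explicit allow-list (loopback, DNS, established connections, HTTPS to the IPs currently behind `api.anthropic.com`), prints it for review, and only installs it on request.

```shell
#!/bin/bash
# restrict-egress.sh -- hypothetical sketch: build iptables rules that allow
# outbound HTTPS only to api.anthropic.com (plus DNS and loopback) and drop
# everything else. Dry-run by default; set APPLY=1 to install the rules
# (requires root inside the container).
set -euo pipefail

RULES=()
rule() { RULES+=("iptables $*"); }

rule -A OUTPUT -o lo -j ACCEPT                # keep loopback working
rule -A OUTPUT -p udp --dport 53 -j ACCEPT    # allow DNS lookups
rule -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow HTTPS only to the IPs currently behind api.anthropic.com
for ip in $(getent ahostsv4 api.anthropic.com 2>/dev/null | awk '{print $1}' | sort -u || true); do
  rule -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
done
rule -P OUTPUT DROP                           # default-deny everything else

printf '%s\n' "${RULES[@]}"                   # dry-run: show the rules
if [ "${APPLY:-0}" = "1" ]; then
  for r in "${RULES[@]}"; do eval "$r"; done
fi
```

Because the resolved IPs can rotate behind a CDN, a forward proxy that filters by hostname is more robust for long-running jobs than a one-time IP snapshot.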
Network Restrictions via Sandbox Settings
Configure allowed domains in your settings file:
{
"sandbox": {
"network": {
"allowedDomains": [
"api.anthropic.com",
"registry.npmjs.org",
"github.com"
]
}
}
}
File System Restrictions
Use permission deny rules to protect sensitive areas:
{
"permissions": {
"deny": [
"Read(~/.ssh/**)",
"Read(~/.aws/**)",
"Read(//.env)",
"Edit(~/.bashrc)",
"Edit(~/.zshrc)",
"Bash(rm -rf *)",
"Bash(curl *)",
"Bash(wget *)"
]
}
}
Sandbox Configuration Reference
{
"sandbox": {
"mode": "auto-allow",
"network": {
"httpProxyPort": 8080,
"socksProxyPort": 8081,
"allowedDomains": ["api.anthropic.com"]
},
"excludedCommands": ["docker", "watchman"],
"allowUnsandboxedCommands": false,
"allowUnixSockets": false
}
}
Setting allowUnsandboxedCommands to false disables the escape hatch entirely -- all commands must run sandboxed or be in excludedCommands.
9. RALF vs GSD -- When to Use Which
RALF and GSD are two different patterns for autonomous Claude execution. Understanding when to use each is critical.
RALF: The Pure Executor
RALF works best when you have already defined exactly what needs to be done.
Characteristics:
- Input is a structured PRD with user stories and acceptance criteria
- Claude implements one story per iteration
- Each iteration is independently verifiable
- No planning phase -- Claude executes the plan you already wrote
- Works with `--max-turns` per iteration for tight control
Best for:
- Well-defined features with clear acceptance criteria
- Batch implementations following a known pattern
- Tasks where you have already made all the design decisions
- Repeatable automation (run the same PRD on different projects)
GSD: The Planner-Executor
GSD (Get Stuff Done) is a pattern where Claude first plans the work, then executes it. It handles ambiguity better than RALF because it includes a scoping phase.
Characteristics:
- Input is a high-level goal or problem description
- Claude first creates a plan (in Plan Mode or a separate planning phase)
- Then Claude executes the plan step by step
- Includes self-correction and plan adjustment
- Better for tasks where the path to completion is unclear
Best for:
- Larger projects where you have not defined all the stories
- Tasks requiring research before implementation
- Ambiguous requirements that need scoping
- One-off projects where writing a full PRD is overkill
Decision Matrix
| Factor | Use RALF | Use GSD |
|---|---|---|
| Requirements clarity | Well-defined stories | Vague or high-level |
| Task scope | Small to medium | Medium to large |
| Repeatability | High (same PRD, different projects) | Low (one-off) |
| Design decisions | Already made | Need Claude to make them |
| Verification | Clear acceptance criteria | Subjective quality |
| Control | Maximum (per-iteration limits) | Moderate (plan-level) |
| Cost predictability | High (bounded iterations) | Lower (planning adds cost) |
Can You Combine Them?
Yes. A common pattern is GSD for planning, RALF for execution:
#!/bin/bash
# Phase 1: GSD -- Claude creates the PRD
claude -p "Analyze this codebase and create a prd.json file for adding \
user authentication. Include user stories with acceptance criteria. \
Follow the format in prd-template.json." \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 15
# Phase 2: RALF -- Execute the PRD
./ralf.sh prd.json 20
This gives you the best of both worlds: Claude's planning ability for scoping, and RALF's structured execution for implementation.
10. Advanced Patterns
Multi-Stage Pipelines
Chain multiple Claude invocations in a sequence where each stage feeds the next:
#!/bin/bash
# multi-stage-pipeline.sh
set -euo pipefail
echo "=== Stage 1: Research ==="
claude -p "Analyze the codebase and identify all API endpoints. \
Write a report to ./pipeline/api-inventory.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 2: Plan ==="
claude -p "Read ./pipeline/api-inventory.md. Design a comprehensive \
test plan for all endpoints. Write the plan to ./pipeline/test-plan.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 3: Implement ==="
claude -p "Read ./pipeline/test-plan.md. Implement all the tests \
described in the plan. Place tests in src/__tests__/api/" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Glob,Grep" \
--max-turns 25
echo "=== Stage 4: Verify ==="
claude -p "Run the full test suite (npm test). If any tests fail, \
fix them. Report final results." \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *),Glob,Grep" \
--max-turns 15
echo "=== Stage 5: Report ==="
REPORT=$(claude -p "Read the test results and the code changes made. \
Generate a summary report of what was tested and the results." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result')
echo "$REPORT" > ./pipeline/final-report.md
echo "Pipeline complete. Report: ./pipeline/final-report.md"
Watchdog Scripts
A watchdog monitors Claude's execution and restarts on failure:
#!/bin/bash
# watchdog.sh -- Restart Claude on failure
set -euo pipefail
TASK="${1:-Implement the features in prd.json}"
MAX_RETRIES=3
RETRY_DELAY=30
for attempt in $(seq 1 $MAX_RETRIES); do
echo "Attempt $attempt / $MAX_RETRIES"
# Capture the exit code without tripping `set -e` on failure
EXIT_CODE=0
claude -p "$TASK" \
--allowedTools "Read,Write,Edit,Bash(npm *),Glob,Grep" \
--max-turns 20 \
--output-format json \
> "./watchdog-attempt-${attempt}.json" 2>&1 || EXIT_CODE=$?
IS_ERROR=$(jq -r '.is_error // false' "./watchdog-attempt-${attempt}.json" 2>/dev/null || echo "true")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
echo "Success on attempt $attempt"
exit 0
fi
echo "Attempt $attempt failed (exit: $EXIT_CODE, error: $IS_ERROR)"
if [ $attempt -lt $MAX_RETRIES ]; then
echo "Retrying in ${RETRY_DELAY}s..."
sleep $RETRY_DELAY
fi
done
echo "All $MAX_RETRIES attempts failed"
exit 1
Result Aggregation from Parallel Agents
#!/bin/bash
# parallel-review.sh -- Run multiple reviews in parallel, aggregate results
set -euo pipefail
RESULTS_DIR="./review-results"
mkdir -p "$RESULTS_DIR"
# Launch parallel reviews
claude -p "Review this codebase for security vulnerabilities. \
Focus on auth, input validation, and data exposure." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/security.json" &
PID_SECURITY=$!
claude -p "Review this codebase for performance issues. \
Focus on N+1 queries, memory leaks, and bundle size." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/performance.json" &
PID_PERF=$!
claude -p "Review this codebase for code quality issues. \
Focus on complexity, duplication, and naming." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/quality.json" &
PID_QUALITY=$!
# Wait for all to complete
wait $PID_SECURITY $PID_PERF $PID_QUALITY
# Aggregate results
SECURITY=$(jq -r '.result' "$RESULTS_DIR/security.json")
PERFORMANCE=$(jq -r '.result' "$RESULTS_DIR/performance.json")
QUALITY=$(jq -r '.result' "$RESULTS_DIR/quality.json")
# Feed aggregated results to a synthesizer
echo "Security Review:
$SECURITY
Performance Review:
$PERFORMANCE
Code Quality Review:
$QUALITY" | claude -p "Synthesize these three code reviews into a single \
prioritized report. Group findings by severity (Critical, High, Medium, Low). \
Deduplicate any overlapping findings." \
--max-turns 3 \
--output-format json | jq -r '.result' > "./review-results/final-report.md"
echo "Combined report: ./review-results/final-report.md"
Scheduled Claude Runs with Cron
# Add to crontab with: crontab -e
# Daily code quality check at 6 AM
0 6 * * * cd /path/to/project && /usr/local/bin/claude -p "Run a code quality analysis and write results to ./reports/quality-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
# Weekly dependency audit on Mondays at 8 AM
0 8 * * 1 cd /path/to/project && /usr/local/bin/claude -p "Audit dependencies for security vulnerabilities and outdated packages. Write report to ./reports/deps-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Bash(npm audit *),Bash(npx *),Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
Using Agent Teams in CI/CD
For complex CI/CD tasks, you can use agent teams where multiple Claude instances collaborate:
#!/bin/bash
# ci-agent-team.sh -- Multiple Claude instances working on a PR
set -euo pipefail
PR_DIFF=$(gh pr diff "$1")
# Security review agent
echo "$PR_DIFF" | claude -p "You are a security reviewer. Analyze this PR diff for vulnerabilities." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/security-review.json &
# Performance review agent
echo "$PR_DIFF" | claude -p "You are a performance reviewer. Analyze this PR diff for performance issues." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/perf-review.json &
# Test coverage agent
echo "$PR_DIFF" | claude -p "You are a test coverage analyst. Check if new code has adequate tests." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/test-review.json &
wait
# Aggregate and post comment
COMBINED=$(cat /tmp/security-review.json /tmp/perf-review.json /tmp/test-review.json | \
jq -sr '[.[].result] | join("\n---\n")')
gh pr comment "$1" --body "$(printf '## Automated Review\n\n%s' "$COMBINED")"
11. The Claude Agent SDK (Programmatic Access)
The Agent SDK lets you use Claude Code as a library in your TypeScript or Python applications. It provides the same tools, agent loop, and context management as the CLI, but with programmatic control.
Installation
# TypeScript
npm install @anthropic-ai/claude-agent-sdk
# Python
pip install claude-agent-sdk
TypeScript: Basic Usage
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.py",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
if ("result" in message) {
console.log(message.result);
}
}
Python: Basic Usage
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
Spawning Sessions and Resuming
import { query } from "@anthropic-ai/claude-agent-sdk";
let sessionId: string | undefined;
// First query: capture the session ID
for await (const message of query({
prompt: "Read the authentication module",
options: { allowedTools: ["Read", "Glob"] }
})) {
if (message.type === "system" && message.subtype === "init") {
sessionId = message.session_id;
}
}
// Resume with full context from the first query
for await (const message of query({
prompt: "Now find all places that call it",
options: { resume: sessionId }
})) {
if ("result" in message) console.log(message.result);
}
Custom Subagents via SDK
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Use the code-reviewer agent to review this codebase",
options: {
allowedTools: ["Read", "Glob", "Grep", "Task"],
agents: {
"code-reviewer": {
description: "Expert code reviewer for quality and security reviews.",
prompt: "Analyze code quality and suggest improvements.",
tools: ["Read", "Glob", "Grep"]
}
}
}
})) {
if ("result" in message) console.log(message.result);
}
SDK with Hooks
import asyncio
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher
async def log_file_change(input_data, tool_use_id, context):
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}

async def main():
    async for message in query(
        prompt="Refactor utils.py to improve readability",
        options=ClaudeAgentOptions(
            permission_mode="acceptEdits",
            hooks={
                "PostToolUse": [
                    HookMatcher(matcher="Edit|Write", hooks=[log_file_change])
                ]
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
SDK with MCP Servers
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Open example.com and describe what you see",
options: {
mcpServers: {
playwright: { command: "npx", args: ["@playwright/mcp@latest"] }
}
}
})) {
if ("result" in message) console.log(message.result);
}
When to Use the SDK vs CLI
| Use Case | Best Choice |
|---|---|
| Shell scripts and CI/CD | CLI (claude -p) |
| Custom applications | SDK |
| One-off automation | CLI |
| Production services | SDK |
| Prototyping pipelines | CLI |
| Building agents that spawn agents | SDK |
12. Exercises
Exercise 1: Basic Headless Pipeline
Write a bash script that:
- Takes a directory path as an argument
- Uses `claude -p` with `--output-format json` to analyze all `.ts` files for unused imports
- Collects results into a single JSON report file
- Prints a summary of total files scanned and issues found
Stretch goal: Use --json-schema to enforce a structured output format.
Exercise 2: Batch File Transformer
Write a batch processing script that:
- Finds all JavaScript files in a project
- Processes each file with Claude to convert `require()` statements to `import` syntax
- Runs in parallel (4 at a time) using `xargs -P`
- Generates a summary report
Stretch goal: Add a --dry-run flag that uses --permission-mode plan to preview changes without modifying files.
Exercise 3: Stop Hook Verification Gate
Create a Stop hook configuration that:
- Runs the project test suite
- Checks that no `TODO` comments remain in modified files
- Verifies that all new functions have JSDoc comments
- Uses the `stop_hook_active` guard to prevent infinite loops
Test it by starting Claude with a task that intentionally leaves TODOs, and verify that the hook catches them.
Exercise 4: RALF Loop Implementation
- Create a `prd.json` with 3 user stories for a simple feature (e.g., a REST API for a todo list)
- Implement the RALF loop script from Section 4
- Run it and observe how it progresses through stories
- Add a verification step that checks test results between iterations
- Observe what happens when a story fails verification
Exercise 5: GitHub Actions PR Review
Create a GitHub Actions workflow that:
- Triggers on PR open and synchronize events
- Uses `claude-code-action@v1` to review the PR
- Posts a review comment with security, performance, and quality findings
- Limits to 5 max turns to control cost
- Uses a custom system prompt for your project's specific review criteria
Exercise 6: Multi-Stage Pipeline
Build a 4-stage pipeline:
- Audit: Scan for security vulnerabilities in dependencies
- Analyze: Identify code quality issues
- Fix: Automatically fix the safe-to-fix issues
- Report: Generate a comprehensive report of changes made
Each stage should pass context to the next via files in a ./pipeline/ directory. The final report should include cost information from the JSON output of each stage.
Exercise 7: Log Monitor
Create a log monitoring script that:
- Tails a log file (create a fake one for testing)
- Every 60 seconds, sends new entries to Claude for analysis
- Classifies entries as normal, warning, or critical
- Writes alerts to a separate file
- Includes a mechanism to prevent duplicate alerts for the same issue
13. Pro Tips from Boris Cherny
Boris Cherny, an engineer at Anthropic who works on Claude Code, has shared several practices for autonomous execution:
Use --permission-mode dontAsk in Sandboxes
The dontAsk mode auto-denies any tool that is not explicitly pre-approved. This is safer than bypassPermissions because you define exactly what is allowed:
claude -p "Implement the feature" \
--permission-mode dontAsk \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx jest *),Grep,Glob"
If Claude tries to use a tool not in --allowedTools, it is automatically denied without prompting. This prevents unexpected behavior while still allowing Claude to work autonomously with the tools it needs.
Background Verification Agents
Run a separate Claude instance that periodically checks the work of the primary instance:
# Main worker
claude -p "Implement the auth system" \
--allowedTools "Read,Write,Edit,Bash(npm *),Grep,Glob" \
--max-turns 30 &
WORKER_PID=$!
# Background verifier (runs every 2 minutes)
while kill -0 $WORKER_PID 2>/dev/null; do
sleep 120
claude -p "Check the current state of the codebase. \
Run npm test and npm run lint. \
Report any issues found." \
--allowedTools "Read,Bash(npm test *),Bash(npm run lint *),Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result' >> ./verification-log.txt
done
wait $WORKER_PID
echo "Worker finished. Verification log: ./verification-log.txt"
"Give Claude a Way to Verify Its Work"
The single most important practice for autonomous execution: always give Claude the tools and commands to verify what it has done. If Claude can run tests, it will run tests. If it cannot, it will guess whether the code works.
# Bad: Claude cannot verify its work
claude -p "Add authentication" --allowedTools "Read,Write,Edit"
# Good: Claude can run tests to verify
claude -p "Add authentication" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx tsc --noEmit)"
Chrome Extension for Browser Testing
For projects with browser-based UIs, give Claude access to a browser for visual verification:
claude --chrome -p "Implement the login page and verify it renders correctly"
The --chrome flag enables browser automation, allowing Claude to visually verify UI changes.
14. Anti-Patterns
Anti-Pattern 1: No max_iterations / No --max-turns
Problem: Running a RALF loop or headless command without any iteration limit.
Consequence: If Claude gets stuck on a failing test or an impossible task, it will loop indefinitely, burning through your API budget.
Fix: Always set --max-turns in headless mode and MAX_ITERATIONS in RALF loops.
# Bad
claude -p "Fix all the bugs"
# Good
claude -p "Fix all the bugs" --max-turns 15
Anti-Pattern 2: --dangerously-skip-permissions Outside Containers
Problem: Using --dangerously-skip-permissions on your local machine.
Consequence: Claude has unrestricted access to your entire filesystem and network. A prompt injection attack or a simple misunderstanding could delete files, exfiltrate data, or modify system configuration.
Fix: Only use --dangerously-skip-permissions inside ephemeral containers or VMs. On your local machine, use --allowedTools to scope permissions precisely.
Anti-Pattern 3: Not Verifying Between Iterations
Problem: Running a RALF loop that marks stories as complete based on Claude's claim, without running actual tests.
Consequence: Broken code accumulates. By the time you discover the issues, 10 stories have been "completed" with cascading failures.
Fix: Always run the verification command between iterations and only mark stories complete when verification passes.
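A minimal sketch of such a gate, assuming a `prd.json` whose stories carry `id` and `status` fields (the function name and field names are illustrative): the story's status changes only when a real command succeeds, never on Claude's own claim.

```shell
#!/bin/bash
# verify-gate.sh -- hypothetical sketch: mark a story in prd.json as done only
# when an actual verification command passes, regardless of what Claude said.
# Assumes stories look like {"id": "S1", "status": "todo", ...}.
set -euo pipefail

# mark_done_if_verified <prd.json> <story-id> <verify-cmd>
mark_done_if_verified() {
  local prd="$1" story_id="$2" verify_cmd="$3"
  if bash -c "$verify_cmd"; then
    # Verification passed: flip this story's status to done
    local tmp
    tmp=$(mktemp)
    jq --arg id "$story_id" \
      '(.stories[] | select(.id == $id) | .status) = "done"' "$prd" > "$tmp"
    mv "$tmp" "$prd"
    echo "story $story_id verified: marked done"
  else
    echo "story $story_id failed verification: status unchanged" >&2
    return 1
  fi
}
```

In a RALF loop this would run after each iteration, e.g. `mark_done_if_verified prd.json S1 "npm test"`, so a failing suite leaves the story open for the next attempt.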
Anti-Pattern 4: Overly Broad --allowedTools
Problem: Using --allowedTools "Bash" which allows any bash command.
Consequence: Claude can run rm -rf, curl to external servers, or modify system files. No guardrails.
Fix: Scope bash permissions with prefix matching:
# Bad
--allowedTools "Bash"
# Good
--allowedTools "Bash(npm test *),Bash(npm run *),Bash(git diff *),Read,Edit"
Anti-Pattern 5: No Budget Limit
Problem: Running autonomous pipelines without cost controls.
Consequence: An overnight pipeline could consume hundreds of dollars if it enters a retry loop.
Fix: Use --max-budget-usd to cap spending:
claude -p "Run the analysis" --max-budget-usd 5.00 --max-turns 20
Anti-Pattern 6: Ignoring Exit Codes
Problem: Not checking the exit code from claude -p in scripts.
Consequence: A failed Claude run is treated as a success. Downstream steps execute on broken state.
Fix: Always check exit codes:
if ! claude -p "Run tests" --max-turns 10; then
echo "Claude execution failed"
exit 1
fi
Anti-Pattern 7: No Logging
Problem: Running autonomous pipelines without saving Claude's output.
Consequence: When something goes wrong, you have no way to diagnose what happened.
Fix: Always redirect output to log files:
claude -p "Implement feature" \
--output-format json \
--verbose \
> "./logs/run-$(date +%Y%m%d-%H%M%S).json" 2>&1
Anti-Pattern 8: Single Massive Prompt
Problem: Putting an entire project specification into a single prompt and hoping Claude executes it all in one go.
Consequence: Claude loses track of requirements as the context window fills with code it has written. Quality degrades sharply.
Fix: Break the work into stages (multi-stage pipeline) or stories (RALF loop). Each invocation gets a focused task with fresh context.
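The splitting step itself can be sketched with jq, assuming a `prd.json` whose `stories` array carries `title` and `acceptance_criteria` fields (all names here are illustrative): each story becomes one focused prompt, intended for its own fresh invocation.

```shell
#!/bin/bash
# split-stories.sh -- hypothetical sketch: turn one big spec (prd.json) into
# focused per-story prompts, each meant for its own fresh `claude -p` run.
set -euo pipefail

# story_prompts <prd.json> -- print one prompt line per story
story_prompts() {
  jq -r '.stories[] |
    "Implement exactly one story: \(.title). " +
    "Acceptance criteria: \(.acceptance_criteria | join("; "))."' "$1"
}

# Usage in a loop (the claude call is illustrative and commented out):
# story_prompts prd.json | while IFS= read -r prompt; do
#   claude -p "$prompt" --max-turns 15 \
#     --allowedTools "Read,Write,Edit,Bash(npm test *)"
# done
```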
15. Official Documentation Links
Core References
- CLI Reference -- All CLI flags, options, and usage
- Headless Mode / Agent SDK CLI -- Programmatic usage via `-p`
- Permissions -- Permission modes and rule syntax
- Hooks Reference -- Hook events, matchers, exit codes
- Hooks Guide -- Practical hook examples
CI/CD Integration
- GitHub Actions -- GitHub Actions setup and workflows
- GitLab CI/CD -- GitLab pipeline integration
- Claude Code Action Repository -- The official GitHub Action
Agent SDK
- Agent SDK Overview -- SDK capabilities and examples
- TypeScript SDK -- TypeScript API reference
- Python SDK -- Python API reference
- Streaming Output -- Real-time streaming
- Structured Outputs -- JSON Schema validation
Related Guides
- Settings Reference -- All configuration options
- Sub-Agents -- Creating specialized agents
- Common Workflows -- Workflow patterns
- Best Practices -- Planning, prompting, verification
- Memory (CLAUDE.md) -- Project context configuration
Sandbox Runtime (Open Source)
The sandbox runtime is available as an open source npm package:
npx @anthropic-ai/sandbox-runtime <command-to-sandbox>
Source: github.com/anthropic-experimental/sandbox-runtime
Community Resources
- How Boris Uses Claude Code -- Boris Cherny's workflow
- Agent SDK Demo Repository -- Example agents
Last Updated: 2026-02-20
Compiled from official Anthropic documentation, Claude Agent SDK docs, and community best practices