Level 7: Fully Autonomous Pipelines
The capstone level. Set it up, walk away, come back to finished work. This guide covers headless mode, batch processing, autonomous execution loops, CI/CD integration, safety sandboxing, and the Agent SDK for building production-grade pipelines with Claude Code.
Table of Contents
- Overview and Goal
- Headless Mode -- Claude Without a Terminal
- Batch Processing and Fan-Out
- The RALF Loop (Autonomous Execution)
- Stop Hooks for Verification Gates
- CI/CD Integration
- Log Analysis and Monitoring
- Safety and Sandboxing
- RALF vs GSD -- When to Use Which
- Advanced Patterns
- The Claude Agent SDK (Programmatic Access)
- Exercises
- Pro Tips from Boris Cherny
- Anti-Patterns
- Official Documentation Links
1. Overview and Goal
Everything up to this point has been interactive. You type, Claude responds. You approve, Claude acts. Level 7 removes you from the loop.
Autonomous pipelines are workflows where Claude Code executes without human intervention from start to finish. You define the task, the boundaries, the verification criteria, and the safety constraints. Then you walk away. When you come back, the work is done -- or it stopped safely at a verification gate that needs your attention.
What "Autonomous" Means in Practice
| Level | Description | Human Involvement |
|---|---|---|
| Interactive | You type every prompt | Every turn |
| Semi-autonomous | You approve each tool use | Frequent |
| Permission-scoped | You pre-approve certain tools | Occasional |
| Fully autonomous | Claude runs start to finish | None until completion |
The Building Blocks
Autonomous pipelines are built from these primitives:
- Headless mode (-p flag) -- Run Claude without a terminal UI
- Permission scoping (--allowedTools, --permission-mode) -- Define what Claude can do
- Output formatting (--output-format json) -- Parse results programmatically
- Iteration control (--max-turns) -- Prevent runaway execution
- Verification hooks (Stop hooks) -- Gate completion on quality checks
- Sandboxing (/sandbox, containers) -- Enforce filesystem and network boundaries
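Put together, a single autonomous invocation combines most of these primitives. The sketch below is illustrative: the prompt, tool list, and turn limit are placeholders, and the command is built as a bash array and printed rather than executed, since actually running it requires the claude CLI and an API key.

```shell
# Dry-run sketch: compose the primitives above into one headless command.
# Prompt and limits are placeholders -- adjust for your own pipeline.
CMD=(claude -p "Fix all ESLint errors in src/"
     --allowedTools "Read,Edit,Bash(npm run lint *)"
     --output-format json
     --max-turns 10)

# Print the command instead of running it (remove this to execute for real).
printf '%q ' "${CMD[@]}"
echo
```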
When to Go Autonomous
- Batch processing hundreds of files with the same transformation
- CI/CD pipelines that run on every PR
- Overnight migrations or refactors
- Scheduled code quality sweeps
- Log monitoring and alerting
- Multi-stage build-test-deploy pipelines
When NOT to Go Autonomous
- Exploratory or ambiguous tasks
- Tasks requiring creative judgment calls
- Anything involving production databases without a rollback plan
- First-time runs of a new pipeline (always test interactively first)
2. Headless Mode -- Claude Without a Terminal
Headless mode is the foundation of every autonomous pipeline. The -p (or --print) flag tells Claude Code to accept a prompt, execute it, print the result, and exit -- no interactive UI, no permission dialogs blocking execution.
The -p Flag
# Basic headless execution
claude -p "What files are in this project?"
# With tool permissions pre-approved
claude -p "Run the test suite and report failures" \
--allowedTools "Bash(npm test *),Read"
# With structured output
claude -p "Summarize this project" --output-format json
The -p flag changes Claude Code from an interactive REPL to a one-shot command-line tool. It processes the prompt, uses whatever tools are needed (subject to permissions), and writes the result to stdout.
Input Methods
There are three ways to feed input to headless Claude:
Direct Prompt
claude -p "Explain the authentication flow in this codebase"
Piped stdin
# Pipe file contents
cat src/auth.py | claude -p "Review this code for security issues"
# Pipe command output
git diff HEAD~5 | claude -p "Summarize these changes"
# Pipe log output
tail -100 /var/log/app.log | claude -p "Identify any errors or anomalies"
File-Based Prompts
# Using system prompt from a file
claude -p "Review the code" --system-prompt-file ./prompts/security-review.txt
# Appending instructions from a file while keeping defaults
claude -p "Review this PR" --append-system-prompt-file ./prompts/style-rules.txt
Output Formats
Claude Code supports three output formats in headless mode. Each serves a different use case.
Text Output (Default)
Plain text, suitable for human reading or simple piping.
claude -p "What does the main function do?"
# Output: The main function initializes the application...
JSON Output
Structured JSON with metadata. The result text is in the result field.
claude -p "Summarize this project" --output-format json
Output structure:
{
"type": "result",
"subtype": "success",
"cost_usd": 0.003,
"is_error": false,
"duration_ms": 4521,
"duration_api_ms": 3200,
"num_turns": 2,
"result": "This project is a REST API built with Express.js...",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"total_cost_usd": 0.003
}
Extract the result with jq:
claude -p "Summarize this project" --output-format json | jq -r '.result'
JSON Schema Output
Get validated structured output conforming to a specific schema:
claude -p "Extract the main function names from auth.py" \
--output-format json \
--json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}},"required":["functions"]}'
The structured data appears in the structured_output field:
claude -p "Extract function names" \
--output-format json \
--json-schema '...' \
| jq '.structured_output'
Stream JSON Output
Newline-delimited JSON for real-time streaming. Each line is a separate event.
claude -p "Explain recursion" \
--output-format stream-json \
--verbose \
--include-partial-messages
Filter for just the streaming text:
claude -p "Write a poem" \
--output-format stream-json \
--verbose \
--include-partial-messages | \
jq -rj 'select(.type == "stream_event" and .event.delta.type? == "text_delta") | .event.delta.text'
Exit Codes
Claude Code returns meaningful exit codes in headless mode:
| Exit Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (general failure) |
| 2 | Max turns reached (when using --max-turns) |
Use exit codes in scripts for conditional logic:
claude -p "Run tests and fix failures" \
--allowedTools "Bash,Read,Edit" \
--max-turns 10
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "All tasks completed successfully"
elif [ $EXIT_CODE -eq 2 ]; then
echo "Hit max turns limit -- may need more iterations"
else
echo "Error occurred"
fi
Environment Variables for Headless Operation
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | API key for authentication |
| CLAUDE_CODE_USE_BEDROCK | Set to 1 to use AWS Bedrock |
| CLAUDE_CODE_USE_VERTEX | Set to 1 to use Google Vertex AI |
Complete CLI Reference for Headless Flags
These flags are the most relevant for headless/autonomous operation:
| Flag | Description |
|---|---|
| -p, --print | Run in headless mode (required) |
| --output-format | text, json, or stream-json |
| --json-schema | JSON Schema for structured output |
| --input-format | Input format: text or stream-json |
| --include-partial-messages | Include streaming events (requires stream-json) |
| --allowedTools | Tools that execute without permission prompts |
| --disallowedTools | Tools removed from the model entirely |
| --tools | Restrict which tools are available ("Bash,Edit,Read") |
| --permission-mode | default, acceptEdits, plan, dontAsk, bypassPermissions |
| --dangerously-skip-permissions | Skip ALL permission prompts |
| --max-turns | Limit agentic turns (exits with code 2 when reached) |
| --max-budget-usd | Maximum dollar spend before stopping |
| --model | Model selection: sonnet, opus, or full model ID |
| --fallback-model | Fallback when primary model is overloaded |
| --system-prompt | Replace entire system prompt |
| --system-prompt-file | Replace system prompt from file |
| --append-system-prompt | Append to default system prompt |
| --append-system-prompt-file | Append from file to system prompt |
| --continue, -c | Continue most recent conversation |
| --resume, -r | Resume specific session by ID |
| --no-session-persistence | Do not save session to disk |
| --verbose | Show full turn-by-turn output |
| --debug | Enable debug logging |
| --mcp-config | Load MCP servers from JSON config |
Continuing Conversations Programmatically
# First run
claude -p "Review this codebase for performance issues" --output-format json > first_pass.json
# Extract session ID
SESSION_ID=$(jq -r '.session_id' first_pass.json)
# Continue the same conversation
claude -p "Now focus on the database queries" --resume "$SESSION_ID"
# Or just continue the most recent conversation
claude -p "Generate a summary of all issues found" --continue
3. Batch Processing and Fan-Out
Batch processing is where headless mode pays off. Instead of running Claude once, you run it across hundreds of files, each invocation independent and parallelizable.
The Basic Pattern
# Process each file in a loop (src/**/* requires bash globstar)
shopt -s globstar
for file in src/**/*.ts; do
claude -p "Add JSDoc comments to all exported functions in this file" \
--allowedTools "Read,Edit" \
< "$file"
done
Parallel Execution with xargs
# Process 4 files at a time in parallel
find src -name "*.ts" | xargs -P 4 -I {} \
claude -p "Add JSDoc comments to all exported functions in {}" \
--allowedTools "Read,Edit"
Parallel Execution with GNU parallel
# Process with GNU parallel (better job control)
find src -name "*.ts" | parallel -j 4 \
claude -p "Add JSDoc comments to all exported functions in {}" \
--allowedTools "Read,Edit"
Scoped Permissions with --allowedTools
The --allowedTools flag is critical for batch scripts. It defines exactly which tools Claude can use without prompting. This follows the permission rule syntax:
# Read-only analysis (safest)
claude -p "Review this code" --allowedTools "Read,Grep,Glob"
# Read and edit (for transformations)
claude -p "Add types" --allowedTools "Read,Edit"
# With specific bash commands
claude -p "Run tests" --allowedTools "Bash(npm test *),Read"
# Wildcard bash with prefix matching
claude -p "Git operations" --allowedTools "Bash(git diff *),Bash(git log *),Bash(git status *)"
The space before * matters. Bash(git diff *) matches git diff HEAD but not git diff-index. Without the space, Bash(git diff*) would match both.
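The space-sensitive prefix rule is easy to sanity-check locally with bash's own glob matching. The function below is an illustration of the rule only, not Claude Code's actual matcher:

```shell
# Illustrative only: mimic the space-sensitive prefix rule with a bash glob.
# Claude Code performs the real permission check internally.
matches_rule() {
  local rule="$1" cmd="$2"
  case "$cmd" in
    $rule) return 0 ;;   # unquoted: the rule is treated as a glob pattern
    *)     return 1 ;;
  esac
}

matches_rule "git diff *" "git diff HEAD"       && echo "git diff HEAD: match"
matches_rule "git diff *" "git diff-index HEAD" || echo "git diff-index HEAD: no match"
matches_rule "git diff*"  "git diff-index HEAD" && echo "without the space: match"
```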
Error Handling in Batch Scripts
#!/bin/bash
set -euo pipefail
RESULTS_DIR="./batch-results"
mkdir -p "$RESULTS_DIR"
FAILED=0
SUCCEEDED=0
TOTAL=0
for file in src/components/*.tsx; do
TOTAL=$((TOTAL + 1))
BASENAME=$(basename "$file" .tsx)
echo "Processing: $file"
if claude -p "Add comprehensive prop type definitions to this React component. \
Read the file at $file, add TypeScript interfaces for all props, \
and ensure all props are properly typed." \
--allowedTools "Read,Edit" \
--max-turns 5 \
--output-format json \
> "$RESULTS_DIR/${BASENAME}.json" 2>&1; then
SUCCEEDED=$((SUCCEEDED + 1))
echo " OK: $file"
else
EXIT_CODE=$?
FAILED=$((FAILED + 1))
echo "  FAIL: $file (exit code: $EXIT_CODE)"
fi
done
echo ""
echo "=== Batch Results ==="
echo "Total: $TOTAL"
echo "Succeeded: $SUCCEEDED"
echo "Failed: $FAILED"
Collecting and Aggregating Results
#!/bin/bash
# Collect results from multiple Claude runs into a single report
REPORT_FILE="./batch-report.md"
echo "# Batch Processing Report" > "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "Generated: $(date)" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
for result_file in ./batch-results/*.json; do
FILENAME=$(basename "$result_file" .json)
RESULT=$(jq -r '.result // "No result"' "$result_file")
COST=$(jq -r '.cost_usd // "unknown"' "$result_file")
IS_ERROR=$(jq -r '.is_error // false' "$result_file")
echo "## $FILENAME" >> "$REPORT_FILE"
echo "- Cost: \$${COST}" >> "$REPORT_FILE"
echo "- Error: ${IS_ERROR}" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "$RESULT" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "---" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
done
echo "Report written to $REPORT_FILE"
Example 1: File Migration (React Class Components to Hooks)
#!/bin/bash
# migrate-to-hooks.sh
# Migrate React class components to functional components with hooks
set -euo pipefail
MIGRATION_PROMPT="Read this file. If it contains a React class component, \
convert it to a functional component using hooks. Preserve all functionality: \
- Convert state to useState \
- Convert lifecycle methods to useEffect \
- Convert class methods to regular functions or useCallback \
- Preserve all props and their types \
- Keep all existing tests passing \
If the file is already a functional component, make no changes."
LOG_FILE="./migration-log.txt"
echo "Migration started: $(date)" > "$LOG_FILE"
find src/components -name "*.tsx" -o -name "*.jsx" | while read -r file; do
echo "Migrating: $file"
claude -p "$MIGRATION_PROMPT" \
--allowedTools "Read,Edit" \
--append-system-prompt "You are migrating the file at: $file" \
--max-turns 8 \
--output-format json \
2>>"$LOG_FILE" | tee -a "$LOG_FILE" | jq -r '.result // "ERROR"' || true
echo "---" >> "$LOG_FILE"
done
echo "Migration completed: $(date)" >> "$LOG_FILE"
Example 2: Adding License Headers to Files
#!/bin/bash
# add-license-headers.sh
# Add license headers to all source files missing them
set -euo pipefail
LICENSE_HEADER="Copyright (c) 2026 Acme Corp. All rights reserved.
Licensed under the MIT License."
find src -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" | while read -r file; do
# Check if file already has a license header
if head -5 "$file" | grep -q "Copyright"; then
echo "SKIP (already has header): $file"
continue
fi
echo "Adding header to: $file"
claude -p "Add this license header as a comment block at the very top of \
the file at $file (before any imports). Use the appropriate comment \
syntax for the file type. The license text is: ${LICENSE_HEADER}" \
--allowedTools "Read,Edit" \
--max-turns 3 \
--output-format json | jq -r '.is_error' || true
done
Example 3: Generating Tests for Each Module
#!/bin/bash
# generate-tests.sh
# Generate unit tests for every module that lacks them
set -euo pipefail
RESULTS_DIR="./test-generation-results"
mkdir -p "$RESULTS_DIR"
TEST_PROMPT="Read the source file and generate comprehensive unit tests for it. \
Requirements: \
- Use the existing test framework (Jest/Vitest) found in the project \
- Cover all exported functions and classes \
- Include edge cases and error conditions \
- Follow existing test patterns in the project \
- Place the test file next to the source file as __tests__/<filename>.test.ts \
- Run the tests to verify they pass"
find src -name "*.ts" ! -name "*.test.ts" ! -name "*.spec.ts" ! -path "*__tests__*" | while read -r file; do
BASENAME=$(basename "$file" .ts)
TEST_FILE="$(dirname "$file")/__tests__/${BASENAME}.test.ts"
# Skip if test already exists
if [ -f "$TEST_FILE" ]; then
echo "SKIP (test exists): $file"
continue
fi
echo "Generating tests for: $file"
claude -p "$TEST_PROMPT for the file at $file" \
--allowedTools "Read,Write,Edit,Bash(npx jest *),Bash(npm test *),Glob,Grep" \
--max-turns 15 \
--output-format json \
> "$RESULTS_DIR/${BASENAME}.json" 2>&1
IS_ERROR=$(jq -r '.is_error' "$RESULTS_DIR/${BASENAME}.json")
if [ "$IS_ERROR" = "true" ]; then
echo " FAIL: $file"
else
echo " OK: $file"
fi
done
Example 4: Code Review of Multiple PRs
#!/bin/bash
# review-prs.sh
# Review all open PRs in a repository
set -euo pipefail
REVIEW_PROMPT="You are a senior code reviewer. Review the PR diff provided via stdin. \
Focus on: \
1. Security vulnerabilities (SQL injection, XSS, auth bypass) \
2. Performance issues (N+1 queries, memory leaks, unnecessary re-renders) \
3. Code quality (naming, complexity, duplication) \
4. Test coverage (are new features tested?) \
5. Breaking changes (API contract, database schema) \
Output a structured review with severity levels: CRITICAL, WARNING, INFO."
mkdir -p ./reviews
# Get list of open PRs
PR_NUMBERS=$(gh pr list --state open --json number --jq '.[].number')
for pr in $PR_NUMBERS; do
echo "Reviewing PR #${pr}..."
gh pr diff "$pr" | claude -p "$REVIEW_PROMPT" \
--append-system-prompt "You are reviewing PR #${pr}." \
--output-format json \
--max-turns 3 \
> "./reviews/pr-${pr}-review.json" 2>&1
RESULT=$(jq -r '.result' "./reviews/pr-${pr}-review.json")
echo "PR #${pr} Review:"
echo "$RESULT"
echo "---"
done
4. The RALF Loop (Autonomous Execution)
RALF stands for Read-Act-Loop-Finish. It is a pattern for autonomous, multi-iteration execution where Claude reads a specification, works through it story by story, verifies after each step, and loops until all work is done or a safety limit is hit.
What RALF Is
RALF is not a built-in Claude Code feature. It is a scripting pattern that wraps Claude Code's headless mode in a loop with:
- Read: Claude reads a structured specification (the prd.json)
- Act: Claude implements one user story or task
- Loop: The script checks progress and sends Claude back for the next task
- Finish: All acceptance criteria are met, or max_iterations is reached
The key insight is that each iteration starts a fresh Claude context (or continues a session), preventing the context window from degrading over long-running tasks.
The prd.json Format
The PRD (Product Requirements Document) file defines what Claude should build. It uses a structured format with user stories and acceptance criteria so progress can be verified programmatically.
{
"project": "User Authentication System",
"description": "Implement a complete authentication system with login, signup, and session management",
"tech_stack": {
"language": "TypeScript",
"framework": "Express.js",
"database": "PostgreSQL with Prisma ORM",
"testing": "Jest"
},
"stories": [
{
"id": "AUTH-001",
"title": "User Registration",
"description": "As a new user, I want to create an account so I can access the application",
"acceptance_criteria": [
"POST /api/auth/register endpoint accepts email and password",
"Password is hashed with bcrypt before storage",
"Email uniqueness is enforced at the database level",
"Returns 201 with user object (without password) on success",
"Returns 409 if email already exists",
"Input validation rejects invalid email formats",
"Unit tests pass for all success and error cases"
],
"files": ["src/routes/auth.ts", "src/models/user.ts", "src/middleware/validation.ts"],
"status": "pending"
},
{
"id": "AUTH-002",
"title": "User Login",
"description": "As a registered user, I want to log in to receive a session token",
"acceptance_criteria": [
"POST /api/auth/login endpoint accepts email and password",
"Returns JWT token on successful authentication",
"Returns 401 on invalid credentials",
"Token contains user ID and expiration time",
"Unit tests pass for all cases"
],
"files": ["src/routes/auth.ts", "src/utils/jwt.ts"],
"status": "pending",
"depends_on": ["AUTH-001"]
},
{
"id": "AUTH-003",
"title": "Protected Routes Middleware",
"description": "As a developer, I want middleware to protect routes that require authentication",
"acceptance_criteria": [
"Middleware extracts JWT from Authorization header",
"Middleware verifies token validity and expiration",
"Middleware attaches user object to request",
"Returns 401 for missing or invalid tokens",
"Returns 403 for expired tokens",
"Unit tests pass for all cases"
],
"files": ["src/middleware/auth.ts"],
"status": "pending",
"depends_on": ["AUTH-002"]
},
{
"id": "AUTH-004",
"title": "Integration Tests",
"description": "Complete integration test suite for the auth system",
"acceptance_criteria": [
"Full registration-login-access flow works end-to-end",
"All edge cases are covered",
"Tests use a test database, not production",
"All tests pass"
],
"files": ["tests/integration/auth.test.ts"],
"status": "pending",
"depends_on": ["AUTH-003"]
}
],
"constraints": [
"Do not modify files outside of src/ and tests/",
"Follow existing code style and patterns",
"All new code must have TypeScript strict mode enabled",
"No console.log statements in production code"
],
"verification_command": "npm test"
}
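Because story status lives in plain JSON, progress queries are jq one-liners. The sketch below runs against a minimal inline PRD rather than a real project file (jq assumed available, as throughout this guide):

```shell
# Query progress from a PRD file with jq (tiny inline sample for illustration).
PRD=$(mktemp)
cat > "$PRD" <<'EOF'
{"stories":[
  {"id":"AUTH-001","status":"completed"},
  {"id":"AUTH-002","status":"pending"},
  {"id":"AUTH-003","status":"pending"}
]}
EOF

# Next pending story
jq -r '[.stories[] | select(.status == "pending")][0].id' "$PRD"

# Completion count, e.g. "1/3 completed"
jq -r '"\([.stories[] | select(.status == "completed")] | length)/\(.stories | length) completed"' "$PRD"
```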
The Execution Loop Script
This is the core RALF implementation. It iterates through the PRD stories, sends each to Claude, verifies the result, and updates the status.
#!/bin/bash
# ralf.sh -- Read-Act-Loop-Finish autonomous execution
set -euo pipefail
PRD_FILE="${1:-prd.json}"
MAX_ITERATIONS="${2:-20}"
LOG_DIR="./ralf-logs"
mkdir -p "$LOG_DIR"
ITERATION=0
echo "=== RALF Loop Starting ==="
echo "PRD: $PRD_FILE"
echo "Max iterations: $MAX_ITERATIONS"
echo ""
while [ $ITERATION -lt $MAX_ITERATIONS ]; do
ITERATION=$((ITERATION + 1))
echo "--- Iteration $ITERATION / $MAX_ITERATIONS ---"
# Find the next pending story
NEXT_STORY=$(jq -r '
. as $root
| .stories[]
| select(.status == "pending")
| select(
((.depends_on // []) - [$root.stories[] | select(.status == "completed") | .id])
| length == 0
)
| .id
' "$PRD_FILE" 2>/dev/null | head -1)
# Simpler fallback: just get the first pending story
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
NEXT_STORY=$(jq -r '.stories[] | select(.status == "pending") | .id' "$PRD_FILE" | head -1)
fi
# Check if all stories are done
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
echo ""
echo "=== All stories completed! ==="
break
fi
# Extract story details
STORY_TITLE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .title" "$PRD_FILE")
STORY_DESC=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .description" "$PRD_FILE")
ACCEPTANCE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .acceptance_criteria | join(\"\n- \")" "$PRD_FILE")
CONSTRAINTS=$(jq -r '.constraints | join("\n- ")' "$PRD_FILE")
VERIFY_CMD=$(jq -r '.verification_command // "echo No verification command"' "$PRD_FILE")
echo "Working on: [$NEXT_STORY] $STORY_TITLE"
# Build the prompt for this iteration
PROMPT="You are implementing a software project defined in $PRD_FILE.
Current task: [$NEXT_STORY] $STORY_TITLE
Description: $STORY_DESC
Acceptance Criteria:
- $ACCEPTANCE
Project Constraints:
- $CONSTRAINTS
Instructions:
1. Read the PRD file and any existing code to understand the full context
2. Implement this specific story ($NEXT_STORY)
3. Write the code that satisfies ALL acceptance criteria
4. Run the verification command: $VERIFY_CMD
5. Fix any test failures
6. When all acceptance criteria are met, report SUCCESS
Do NOT implement other stories. Focus only on $NEXT_STORY."
# Execute Claude (capture the exit code without tripping set -e)
EXIT_CODE=0
claude -p "$PROMPT" \
--allowedTools "Read,Write,Edit,Bash(npm *),Bash(npx *),Bash(git diff *),Bash(git status *),Glob,Grep" \
--max-turns 25 \
--output-format json \
> "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json" 2>&1 || EXIT_CODE=$?
# Check result
IS_ERROR=$(jq -r '.is_error // false' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
RESULT=$(jq -r '.result // "No result"' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
# Run verification
echo " Running verification: $VERIFY_CMD"
if eval "$VERIFY_CMD" > "$LOG_DIR/verify-${ITERATION}.log" 2>&1; then
echo " Verification PASSED"
# Update the story status in the PRD
jq "(.stories[] | select(.id == \"$NEXT_STORY\") | .status) = \"completed\"" \
"$PRD_FILE" > "${PRD_FILE}.tmp" && mv "${PRD_FILE}.tmp" "$PRD_FILE"
echo " Marked $NEXT_STORY as completed"
else
echo " Verification FAILED -- will retry next iteration"
fi
else
echo " Claude reported an error, will retry"
fi
echo ""
done
# Final status report
COMPLETED=$(jq '[.stories[] | select(.status == "completed")] | length' "$PRD_FILE")
TOTAL=$(jq '.stories | length' "$PRD_FILE")
echo "=== RALF Loop Complete ==="
echo "Completed: $COMPLETED / $TOTAL stories"
echo "Iterations used: $ITERATION / $MAX_ITERATIONS"
echo "Logs: $LOG_DIR"
max_iterations as a Safety Guard
The MAX_ITERATIONS variable prevents runaway execution. Without it, a failing story could cause the loop to retry indefinitely. Rules of thumb:
| Task Complexity | Suggested max_iterations |
|---|---|
| Simple transformations | 5-10 |
| Feature implementation | 15-25 |
| Full project build | 30-50 |
| Complex refactors | 20-40 |
Always set this value. An infinite loop with API calls will drain your budget.
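The guard itself is nothing more than a bounded while loop. Stripped of the Claude invocation (stubbed here with a story that never passes verification), the shape is:

```shell
# Minimal sketch of the safety guard: the loop exits after MAX_ITERATIONS
# even when the task (stubbed to always fail) never completes.
MAX_ITERATIONS=5
ITERATION=0
run_story() { return 1; }   # stand-in for a claude -p call plus verification

while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
  ITERATION=$((ITERATION + 1))
  if run_story; then
    break
  fi
done
echo "stopped after $ITERATION iteration(s)"
```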
How Each Iteration Gets Fresh Context
Each call to claude -p starts a fresh conversation. This is intentional -- it prevents context window degradation. Long conversations cause Claude to lose track of earlier details, producing lower-quality output. By restarting each iteration:
- Claude re-reads the PRD and sees updated statuses
- Claude examines the actual codebase (not a stale memory of it)
- Each iteration gets the full context window for its specific task
If you need continuity between iterations (rare), use --resume with the session ID from the previous run.
Verification Between Iterations
The verification step between iterations is what makes RALF reliable. After each implementation:
- Run the project's test suite
- Check that the specific acceptance criteria are met
- Only mark the story as completed if verification passes
- If verification fails, the next iteration will retry the same story
This is the difference between RALF and a naive loop that just sends prompts. RALF verifies actual outcomes, not just Claude's claim of completion.
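The verify-then-mark step reduces to a few lines. In this sketch the verification command is stubbed with true and the PRD is a minimal inline sample (jq assumed available):

```shell
# Sketch: mark a story completed only after verification passes.
PRD=$(mktemp)
echo '{"stories":[{"id":"S1","status":"pending"},{"id":"S2","status":"pending"}]}' > "$PRD"

VERIFY_CMD="true"   # stand-in for the project's real command, e.g. "npm test"

if eval "$VERIFY_CMD"; then
  # Flip S1 to completed, writing via a temp file so the PRD is never truncated
  jq '(.stories[] | select(.id == "S1") | .status) = "completed"' "$PRD" \
    > "$PRD.tmp" && mv "$PRD.tmp" "$PRD"
fi

jq -r '.stories[] | "\(.id): \(.status)"' "$PRD"
```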
5. Stop Hooks for Verification Gates
Stop hooks are the mechanism for preventing Claude from claiming it is done before the work actually passes quality checks. When Claude finishes responding, the Stop hook fires. If the hook returns a blocking decision, Claude continues working instead of stopping.
How Stop Hooks Work
The Stop hook fires when Claude finishes responding (but not on user interrupts). The hook receives context about the session including the last assistant message, and can:
- Exit 0: Allow Claude to stop normally
- Exit 2: Block the stop, send stderr feedback to Claude to continue working
- Return JSON with decision: "block": Block with a reason sent to Claude
The stop_hook_active Guard
The hook input includes a stop_hook_active field that is true when Claude is already continuing because of a previous Stop hook. Always check this to prevent infinite loops:
#!/bin/bash
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop: only run the check once
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Your verification logic here
Hook Type: Command (Shell Script)
The simplest verification gate -- a shell script that runs tests:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/verify-before-stop.sh",
"timeout": 120,
"statusMessage": "Running verification checks..."
}
]
}
]
}
}
The verification script:
#!/bin/bash
# .claude/hooks/verify-before-stop.sh
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Run the test suite
echo "Running tests..." >&2
if ! npm test 2>&1; then
echo "Tests are failing. Fix the failing tests before stopping." >&2
exit 2 # Exit code 2 = block Claude from stopping
fi
# Run the linter
echo "Running linter..." >&2
if ! npm run lint 2>&1; then
echo "Linting errors found. Fix them before stopping." >&2
exit 2
fi
# Run type checking
echo "Running type check..." >&2
if ! npx tsc --noEmit 2>&1; then
echo "Type errors found. Fix them before stopping." >&2
exit 2
fi
# All checks passed
exit 0
Hook Type: Prompt (LLM Evaluation)
Use a prompt hook to have a fast model (Haiku by default) evaluate whether Claude should stop. This is useful for subjective quality checks:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "prompt",
"prompt": "You are evaluating whether Claude should stop working. Context: $ARGUMENTS\n\nAnalyze the conversation and determine if:\n1. All user-requested tasks are complete\n2. Any errors remain unaddressed\n3. Tests have been run and pass\n4. Code follows the project conventions described in CLAUDE.md\n\nRespond with JSON: {\"ok\": true} to allow stopping, or {\"ok\": false, \"reason\": \"your explanation\"} to continue working.",
"timeout": 30
}
]
}
]
}
}
The LLM returns {"ok": true} or {"ok": false, "reason": "..."}. If ok is false, Claude receives the reason as feedback and continues working.
Hook Type: Agent (Multi-Turn Verification)
Agent hooks are the most powerful option. They spawn a subagent that can read files, search code, and run commands to verify completion:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "agent",
"prompt": "Verify that the implementation is complete. Check:\n1. All files mentioned in the task exist\n2. Unit tests exist and pass (run: npm test)\n3. No TODO comments remain in modified files\n4. No console.log statements in production code\n\nContext: $ARGUMENTS",
"timeout": 120
}
]
}
]
}
}
The agent can use Read, Grep, Glob, and other tools to investigate the codebase. It returns the same {"ok": true/false} decision format.
Combining Multiple Verification Gates
You can chain multiple hooks. All matching hooks run in parallel:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/run-tests.sh",
"timeout": 120,
"statusMessage": "Running test suite..."
},
{
"type": "command",
"command": ".claude/hooks/check-lint.sh",
"timeout": 60,
"statusMessage": "Checking code style..."
},
{
"type": "prompt",
"prompt": "Review whether all acceptance criteria from the original task are satisfied. Context: $ARGUMENTS",
"timeout": 30
}
]
}
]
}
}
If any hook blocks, Claude continues working. The stderr or reason from the blocking hook tells Claude what to fix.
6. CI/CD Integration
Claude Code integrates directly into GitHub Actions and GitLab CI/CD pipelines. This is the most common production use case for autonomous execution.
GitHub Actions Setup
Quick Setup
The fastest way to set up GitHub Actions integration:
# Inside Claude Code interactive mode
/install-github-app
This guides you through installing the Claude GitHub App and configuring secrets.
Manual Setup
- Install the Claude GitHub App: https://github.com/apps/claude
- Add ANTHROPIC_API_KEY to your repository secrets
- Create the workflow file
GitHub Actions Workflow: Respond to @claude Mentions
This is the core workflow. It triggers when someone mentions @claude in a PR or issue comment:
# .github/workflows/claude.yml
name: Claude Code
on:
issue_comment:
types: [created]
pull_request_review_comment:
types: [created]
jobs:
claude:
if: contains(github.event.comment.body, '@claude')
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
Usage in PR comments:
@claude implement this feature based on the issue description
@claude fix the TypeError in the user dashboard component
@claude review this PR for security issues
GitHub Actions Workflow: Automated PR Review
Automatically review every PR when it is opened or updated:
# .github/workflows/claude-review.yml
name: Claude PR Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: "/review"
claude_args: "--max-turns 5"
GitHub Actions Workflow: Automated Issue Implementation
When an issue is labeled with claude-implement, Claude creates a PR with the implementation:
# .github/workflows/claude-implement.yml
name: Claude Auto-Implement
on:
issues:
types: [labeled]
jobs:
implement:
if: github.event.label.name == 'claude-implement'
runs-on: ubuntu-latest
steps:
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: |
Read the issue description and implement the requested feature.
Create a new branch, implement the changes, and open a PR.
Follow the project's CLAUDE.md guidelines.
claude_args: |
--max-turns 25
--model claude-sonnet-4-6
--allowedTools "Read,Write,Edit,Bash,Glob,Grep"
GitHub Actions Workflow: Daily Code Quality Report
Run a scheduled code quality analysis:
# .github/workflows/claude-quality.yml
name: Daily Code Quality
on:
schedule:
- cron: "0 9 * * 1-5" # 9 AM weekdays
jobs:
quality-report:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: |
Analyze the codebase for quality issues:
1. Find dead code and unused exports
2. Identify overly complex functions (cyclomatic complexity)
3. Check for missing error handling
4. Look for potential performance issues
5. Summarize findings as a GitHub issue
claude_args: "--max-turns 10 --model sonnet"
GitHub Actions: Configuration Reference
The claude-code-action@v1 accepts these parameters:
| Parameter | Description | Required |
|---|---|---|
| `anthropic_api_key` | Claude API key | Yes (unless Bedrock/Vertex) |
| `prompt` | Instructions for Claude | No |
| `claude_args` | CLI arguments passed to Claude | No |
| `github_token` | GitHub token for API access | No |
| `trigger_phrase` | Custom trigger phrase (default: `@claude`) | No |
| `use_bedrock` | Use AWS Bedrock instead of Claude API | No |
| `use_vertex` | Use Google Vertex AI instead of Claude API | No |
Pass CLI arguments via claude_args:
claude_args: "--max-turns 5 --model claude-sonnet-4-6 --allowedTools 'Read,Edit,Bash'"
GitLab CI/CD Setup
GitLab CI/CD integration works similarly but uses .gitlab-ci.yml instead:
# .gitlab-ci.yml
stages:
- ai
claude:
stage: ai
image: node:24-alpine3.21
rules:
- if: '$CI_PIPELINE_SOURCE == "web"'
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
variables:
GIT_STRATEGY: fetch
before_script:
- apk update
- apk add --no-cache git curl bash
- curl -fsSL https://claude.ai/install.sh | bash
script:
- >
claude
-p "${AI_FLOW_INPUT:-Review this MR and suggest improvements}"
--permission-mode acceptEdits
--allowedTools "Bash,Read,Edit,Write"
--debug
GitLab CI/CD: AWS Bedrock Integration
claude-bedrock:
stage: ai
image: node:24-alpine3.21
rules:
- if: '$CI_PIPELINE_SOURCE == "web"'
before_script:
- apk add --no-cache bash curl jq git python3 py3-pip
- pip install --no-cache-dir awscli
- curl -fsSL https://claude.ai/install.sh | bash
- export AWS_WEB_IDENTITY_TOKEN_FILE="${CI_JOB_JWT_FILE:-/tmp/oidc_token}"
- if [ -n "${CI_JOB_JWT_V2}" ]; then printf "%s" "$CI_JOB_JWT_V2" > "$AWS_WEB_IDENTITY_TOKEN_FILE"; fi
- >
aws sts assume-role-with-web-identity
--role-arn "$AWS_ROLE_TO_ASSUME"
--role-session-name "gitlab-claude-$(date +%s)"
--web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE"
--duration-seconds 3600 > /tmp/aws_creds.json
- export AWS_ACCESS_KEY_ID="$(jq -r .Credentials.AccessKeyId /tmp/aws_creds.json)"
- export AWS_SECRET_ACCESS_KEY="$(jq -r .Credentials.SecretAccessKey /tmp/aws_creds.json)"
- export AWS_SESSION_TOKEN="$(jq -r .Credentials.SessionToken /tmp/aws_creds.json)"
script:
- >
claude
-p "${AI_FLOW_INPUT:-Implement the requested changes and open an MR}"
--permission-mode acceptEdits
--allowedTools "Bash,Read,Edit,Write"
--debug
variables:
AWS_REGION: "us-west-2"
7. Log Analysis and Monitoring
One of the most practical autonomous uses of Claude is piping live logs through it for analysis.
Basic Log Piping
# Pipe last 100 lines for analysis
tail -100 /var/log/app.log | claude -p "Identify errors and anomalies in these logs"
# Live log monitoring
tail -f /var/log/app.log | claude -p "Watch for errors and report them as they appear"
Anomaly Detection Script
#!/bin/bash
# log-monitor.sh -- Monitor logs and alert on anomalies
set -euo pipefail
LOG_FILE="${1:-/var/log/app.log}"
CHECK_INTERVAL="${2:-300}" # seconds between checks
ALERT_FILE="./alerts.log"
echo "Monitoring: $LOG_FILE (checking every ${CHECK_INTERVAL}s)"
while true; do
# Grab the most recent entries (simple approach; overlapping lines may be re-analyzed)
NEW_LINES=$(tail -200 "$LOG_FILE")
if [ -n "$NEW_LINES" ]; then
ANALYSIS=$(echo "$NEW_LINES" | claude -p \
"Analyze these log entries for anomalies. Look for: \
1. Error patterns (stack traces, HTTP 5xx, timeout errors) \
2. Performance degradation (slow queries, high latency) \
3. Security concerns (auth failures, unusual access patterns) \
4. Resource issues (memory warnings, disk space, connection pool) \
\
Output format: \
- SEVERITY: CRITICAL/WARNING/INFO \
- CATEGORY: error/performance/security/resource \
- SUMMARY: one-line description \
- DETAILS: relevant log entries \
\
If no anomalies found, output: NO_ANOMALIES" \
--output-format json \
--max-turns 2 2>/dev/null | jq -r '.result // "ERROR"' || echo "ERROR")
if [ "$ANALYSIS" != "NO_ANOMALIES" ] && [ "$ANALYSIS" != "ERROR" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$TIMESTAMP] ALERT:" >> "$ALERT_FILE"
echo "$ANALYSIS" >> "$ALERT_FILE"
echo "---" >> "$ALERT_FILE"
# Optionally send notification
# curl -X POST "$SLACK_WEBHOOK" -d "{\"text\": \"Log Alert: $ANALYSIS\"}"
fi
fi
sleep "$CHECK_INTERVAL"
done
Error Classification Pipeline
#!/bin/bash
# classify-errors.sh -- Classify errors from the last 24 hours
set -euo pipefail
# Extract errors from the last 24 hours
ERRORS=$(journalctl --since "24 hours ago" --priority=err --no-pager 2>/dev/null || \
grep -i "error\|exception\|fatal" /var/log/app.log | tail -500 || true)
if [ -z "$ERRORS" ]; then
echo "No errors found in the last 24 hours."
exit 0
fi
echo "$ERRORS" | claude -p \
"Classify these errors into categories and provide a summary report. \
For each category: \
1. Name the category \
2. Count occurrences \
3. Identify the root cause if possible \
4. Suggest a fix \
5. Rate severity (critical/high/medium/low) \
\
Sort by severity (critical first)." \
--output-format json \
--max-turns 2 | jq -r '.result'
Deployment Log Watcher
#!/bin/bash
# watch-deploy.sh -- Monitor deployment and alert on issues
set -euo pipefail
DEPLOY_LOG="${1:-/var/log/deploy.log}"
echo "Watching deployment log: $DEPLOY_LOG"
tail -f "$DEPLOY_LOG" | while IFS= read -r line; do
# Check for error patterns
if echo "$line" | grep -qi "error\|fail\|crash\|fatal\|panic"; then
# Send the error context to Claude for analysis
CONTEXT=$(tail -20 "$DEPLOY_LOG")
ANALYSIS=$(echo "$CONTEXT" | claude -p \
"A deployment error occurred. Analyze the context and provide: \
1. What went wrong \
2. Is this a blocking error or recoverable? \
3. Suggested immediate action" \
--max-turns 2 \
--output-format json 2>/dev/null | jq -r '.result // "Analysis failed"' || echo "Analysis failed")
echo ""
echo "=== DEPLOYMENT ALERT ==="
echo "Trigger: $line"
echo "Analysis: $ANALYSIS"
echo "========================"
fi
done
8. Safety and Sandboxing
Autonomous execution requires strong safety boundaries. Claude Code provides multiple layers of protection.
Permission Modes
Claude Code has five permission modes, each offering a different level of autonomy:
| Mode | Description | Use Case |
|---|---|---|
| `default` | Prompts for each tool use | Interactive development |
| `acceptEdits` | Auto-accepts file edits, prompts for bash | Semi-autonomous |
| `plan` | Read-only, no modifications | Analysis and planning |
| `dontAsk` | Auto-denies unless pre-approved | Constrained automation |
| `bypassPermissions` | Skips all prompts | Fully autonomous (containers only) |
Set the mode via CLI:
# Accept edits automatically
claude -p "Refactor this module" --permission-mode acceptEdits
# Only allow pre-approved tools
claude -p "Run analysis" --permission-mode dontAsk --allowedTools "Read,Grep,Glob"
# Bypass all permissions (containers only!)
claude -p "Build the project" --permission-mode bypassPermissions
--dangerously-skip-permissions
This flag bypasses ALL permission checks. It is the same as --permission-mode bypassPermissions:
claude --dangerously-skip-permissions -p "Implement the feature"
When to use it:
- Inside Docker containers with no network access
- Inside VMs that will be destroyed after use
- In CI/CD runners that are ephemeral
- Never on your local machine with access to your files and network
When NOT to use it:
- On your development machine
- In any environment with internet access
- In any environment with access to sensitive files
- In any environment that persists after the run
--allowedTools for Precise Scoping
Instead of bypassing all permissions, scope exactly which tools Claude can use:
# Read-only analysis
claude -p "Analyze this codebase" \
--allowedTools "Read,Grep,Glob"
# Edit with specific bash commands only
claude -p "Fix the tests" \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *)"
# Full git workflow but no arbitrary bash
claude -p "Create a commit" \
--allowedTools "Read,Edit,Write,Bash(git *)"
Sandboxing with /sandbox
Claude Code's native sandboxing provides OS-level filesystem and network isolation:
# Inside Claude Code interactive mode
/sandbox
This opens a menu where you choose:
- Auto-allow mode: Sandboxed commands run automatically; non-sandboxable commands use the normal permission flow
- Regular permissions mode: All commands go through permission flow but are sandboxed
How Sandboxing Works
Filesystem isolation:
- Write access restricted to the current working directory and subdirectories
- Read access to the broader filesystem (with deny rules respected)
- Cannot modify files outside the working directory
Network isolation:
- Only approved domains can be accessed
- New domain requests trigger permission prompts
- All child processes inherit the same restrictions
OS-level enforcement:
- macOS: Uses Seatbelt framework
- Linux: Uses bubblewrap
- WSL2: Uses bubblewrap
Docker Container Isolation
The safest pattern for fully autonomous execution is running Claude inside a Docker container:
# Dockerfile.claude-worker
FROM node:24-slim
# Install Claude Code
RUN npm install -g @anthropic-ai/claude-code
# Create workspace
WORKDIR /workspace
# Copy project files
COPY . .
# Install project dependencies
RUN npm install
# Run Claude with full permissions (safe inside container)
CMD ["claude", "-p", "--permission-mode", "bypassPermissions", \
"--max-turns", "30", \
"Implement the features defined in prd.json"]
The "Container Without Internet" Pattern
This is the gold standard for safe autonomous execution:
#!/bin/bash
# safe-autonomous.sh -- Run Claude in a network-isolated container
set -euo pipefail
PROJECT_DIR="$(pwd)"
TASK="${1:-Implement the features in prd.json}"
docker run --rm \
--network none \
-v "${PROJECT_DIR}:/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p '${TASK}' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
Wait -- --network none blocks API calls too. You need a more nuanced approach:
#!/bin/bash
# safe-autonomous-v2.sh -- Container with API-only network access
set -euo pipefail
# Create a dedicated bridge network (note: this alone does not restrict
# egress -- combine it with iptables rules or a filtering proxy)
docker network create --driver bridge claude-restricted 2>/dev/null || true
docker run --rm \
--network claude-restricted \
-v "$(pwd):/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p 'Implement the features defined in prd.json' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
For true network restriction with API access, use iptables rules or a proxy that only allows traffic to api.anthropic.com.
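One possible sketch of the iptables approach is below. It assumes you can run as root inside the container; the script name, the `APPLY` switch, and the rule set are illustrative, not a vetted firewall policy. The script builds an explicit allow-list (loopback, DNS, established connections, HTTPS to the IPs currently behind `api.anthropic.com`), prints it for review, and only installs it on request.

```shell
#!/bin/bash
# restrict-egress.sh -- hypothetical sketch: build iptables rules that allow
# outbound HTTPS only to api.anthropic.com (plus DNS and loopback) and drop
# everything else. Dry-run by default; set APPLY=1 to install the rules
# (requires root inside the container).
set -euo pipefail

RULES=()
rule() { RULES+=("iptables $*"); }

rule -A OUTPUT -o lo -j ACCEPT                # keep loopback working
rule -A OUTPUT -p udp --dport 53 -j ACCEPT    # allow DNS lookups
rule -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow HTTPS only to the IPs currently behind api.anthropic.com
for ip in $(getent ahostsv4 api.anthropic.com 2>/dev/null | awk '{print $1}' | sort -u || true); do
  rule -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
done
rule -P OUTPUT DROP                           # default-deny everything else

printf '%s\n' "${RULES[@]}"                   # dry-run: show the rules
if [ "${APPLY:-0}" = "1" ]; then
  for r in "${RULES[@]}"; do eval "$r"; done
fi
```

Because the resolved IPs can rotate behind a CDN, a forward proxy that filters by hostname is more robust for long-running jobs than a one-time IP snapshot.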
Network Restrictions via Sandbox Settings
Configure allowed domains in your settings file:
{
"sandbox": {
"network": {
"allowedDomains": [
"api.anthropic.com",
"registry.npmjs.org",
"github.com"
]
}
}
}
File System Restrictions
Use permission deny rules to protect sensitive areas:
{
"permissions": {
"deny": [
"Read(~/.ssh/**)",
"Read(~/.aws/**)",
"Read(//.env)",
"Edit(~/.bashrc)",
"Edit(~/.zshrc)",
"Bash(rm -rf *)",
"Bash(curl *)",
"Bash(wget *)"
]
}
}
Sandbox Configuration Reference
{
"sandbox": {
"mode": "auto-allow",
"network": {
"httpProxyPort": 8080,
"socksProxyPort": 8081,
"allowedDomains": ["api.anthropic.com"]
},
"excludedCommands": ["docker", "watchman"],
"allowUnsandboxedCommands": false,
"allowUnixSockets": false
}
}
Setting allowUnsandboxedCommands to false disables the escape hatch entirely -- all commands must run sandboxed or be in excludedCommands.
9. RALF vs GSD -- When to Use Which
RALF and GSD are two different patterns for autonomous Claude execution. Understanding when to use each is critical.
RALF: The Pure Executor
RALF works best when you have already defined exactly what needs to be done.
Characteristics:
- Input is a structured PRD with user stories and acceptance criteria
- Claude implements one story per iteration
- Each iteration is independently verifiable
- No planning phase -- Claude executes the plan you already wrote
- Works with `--max-turns` per iteration for tight control
Best for:
- Well-defined features with clear acceptance criteria
- Batch implementations following a known pattern
- Tasks where you have already made all the design decisions
- Repeatable automation (run the same PRD on different projects)
GSD: The Planner-Executor
GSD (Get Stuff Done) is a pattern where Claude first plans the work, then executes it. It handles ambiguity better than RALF because it includes a scoping phase.
Characteristics:
- Input is a high-level goal or problem description
- Claude first creates a plan (in Plan Mode or a separate planning phase)
- Then Claude executes the plan step by step
- Includes self-correction and plan adjustment
- Better for tasks where the path to completion is unclear
Best for:
- Larger projects where you have not defined all the stories
- Tasks requiring research before implementation
- Ambiguous requirements that need scoping
- One-off projects where writing a full PRD is overkill
Decision Matrix
| Factor | Use RALF | Use GSD |
|---|---|---|
| Requirements clarity | Well-defined stories | Vague or high-level |
| Task scope | Small to medium | Medium to large |
| Repeatability | High (same PRD, different projects) | Low (one-off) |
| Design decisions | Already made | Need Claude to make them |
| Verification | Clear acceptance criteria | Subjective quality |
| Control | Maximum (per-iteration limits) | Moderate (plan-level) |
| Cost predictability | High (bounded iterations) | Lower (planning adds cost) |
Can You Combine Them?
Yes. A common pattern is GSD for planning, RALF for execution:
#!/bin/bash
# Phase 1: GSD -- Claude creates the PRD
claude -p "Analyze this codebase and create a prd.json file for adding \
user authentication. Include user stories with acceptance criteria. \
Follow the format in prd-template.json." \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 15
# Phase 2: RALF -- Execute the PRD
./ralf.sh prd.json 20
This gives you the best of both worlds: Claude's planning ability for scoping, and RALF's structured execution for implementation.
10. Advanced Patterns
Multi-Stage Pipelines
Chain multiple Claude invocations in a sequence where each stage feeds the next:
#!/bin/bash
# multi-stage-pipeline.sh
set -euo pipefail
echo "=== Stage 1: Research ==="
claude -p "Analyze the codebase and identify all API endpoints. \
Write a report to ./pipeline/api-inventory.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 2: Plan ==="
claude -p "Read ./pipeline/api-inventory.md. Design a comprehensive \
test plan for all endpoints. Write the plan to ./pipeline/test-plan.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 3: Implement ==="
claude -p "Read ./pipeline/test-plan.md. Implement all the tests \
described in the plan. Place tests in src/__tests__/api/" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Glob,Grep" \
--max-turns 25
echo "=== Stage 4: Verify ==="
claude -p "Run the full test suite (npm test). If any tests fail, \
fix them. Report final results." \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *),Glob,Grep" \
--max-turns 15
echo "=== Stage 5: Report ==="
REPORT=$(claude -p "Read the test results and the code changes made. \
Generate a summary report of what was tested and the results." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result')
echo "$REPORT" > ./pipeline/final-report.md
echo "Pipeline complete. Report: ./pipeline/final-report.md"
Watchdog Scripts
A watchdog monitors Claude's execution and restarts on failure:
#!/bin/bash
# watchdog.sh -- Restart Claude on failure
set -euo pipefail
TASK="${1:-Implement the features in prd.json}"
MAX_RETRIES=3
RETRY_DELAY=30
for attempt in $(seq 1 $MAX_RETRIES); do
echo "Attempt $attempt / $MAX_RETRIES"
# Capture the exit code without tripping `set -e` on failure
EXIT_CODE=0
claude -p "$TASK" \
--allowedTools "Read,Write,Edit,Bash(npm *),Glob,Grep" \
--max-turns 20 \
--output-format json \
> "./watchdog-attempt-${attempt}.json" 2>&1 || EXIT_CODE=$?
IS_ERROR=$(jq -r '.is_error // false' "./watchdog-attempt-${attempt}.json" 2>/dev/null || echo "true")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
echo "Success on attempt $attempt"
exit 0
fi
echo "Attempt $attempt failed (exit: $EXIT_CODE, error: $IS_ERROR)"
if [ $attempt -lt $MAX_RETRIES ]; then
echo "Retrying in ${RETRY_DELAY}s..."
sleep $RETRY_DELAY
fi
done
echo "All $MAX_RETRIES attempts failed"
exit 1
Result Aggregation from Parallel Agents
#!/bin/bash
# parallel-review.sh -- Run multiple reviews in parallel, aggregate results
set -euo pipefail
RESULTS_DIR="./review-results"
mkdir -p "$RESULTS_DIR"
# Launch parallel reviews
claude -p "Review this codebase for security vulnerabilities. \
Focus on auth, input validation, and data exposure." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/security.json" &
PID_SECURITY=$!
claude -p "Review this codebase for performance issues. \
Focus on N+1 queries, memory leaks, and bundle size." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/performance.json" &
PID_PERF=$!
claude -p "Review this codebase for code quality issues. \
Focus on complexity, duplication, and naming." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/quality.json" &
PID_QUALITY=$!
# Wait for all to complete
wait $PID_SECURITY $PID_PERF $PID_QUALITY
# Aggregate results
SECURITY=$(jq -r '.result' "$RESULTS_DIR/security.json")
PERFORMANCE=$(jq -r '.result' "$RESULTS_DIR/performance.json")
QUALITY=$(jq -r '.result' "$RESULTS_DIR/quality.json")
# Feed aggregated results to a synthesizer
echo "Security Review:
$SECURITY
Performance Review:
$PERFORMANCE
Code Quality Review:
$QUALITY" | claude -p "Synthesize these three code reviews into a single \
prioritized report. Group findings by severity (Critical, High, Medium, Low). \
Deduplicate any overlapping findings." \
--max-turns 3 \
--output-format json | jq -r '.result' > "./review-results/final-report.md"
echo "Combined report: ./review-results/final-report.md"
Scheduled Claude Runs with Cron
# Add to crontab with: crontab -e
# Daily code quality check at 6 AM
0 6 * * * cd /path/to/project && /usr/local/bin/claude -p "Run a code quality analysis and write results to ./reports/quality-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
# Weekly dependency audit on Mondays at 8 AM
0 8 * * 1 cd /path/to/project && /usr/local/bin/claude -p "Audit dependencies for security vulnerabilities and outdated packages. Write report to ./reports/deps-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Bash(npm audit *),Bash(npx *),Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
Using Agent Teams in CI/CD
For complex CI/CD tasks, you can use agent teams where multiple Claude instances collaborate:
#!/bin/bash
# ci-agent-team.sh -- Multiple Claude instances working on a PR
set -euo pipefail
PR_DIFF=$(gh pr diff "$1")
# Security review agent
echo "$PR_DIFF" | claude -p "You are a security reviewer. Analyze this PR diff for vulnerabilities." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/security-review.json &
# Performance review agent
echo "$PR_DIFF" | claude -p "You are a performance reviewer. Analyze this PR diff for performance issues." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/perf-review.json &
# Test coverage agent
echo "$PR_DIFF" | claude -p "You are a test coverage analyst. Check if new code has adequate tests." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/test-review.json &
wait
# Aggregate and post comment
COMBINED=$(cat /tmp/security-review.json /tmp/perf-review.json /tmp/test-review.json | \
jq -sr '[.[].result] | join("\n---\n")')
gh pr comment "$1" --body "$(printf '## Automated Review\n\n%s' "$COMBINED")"
11. The Claude Agent SDK (Programmatic Access)
The Agent SDK lets you use Claude Code as a library in your TypeScript or Python applications. It provides the same tools, agent loop, and context management as the CLI, but with programmatic control.
Installation
# TypeScript
npm install @anthropic-ai/claude-agent-sdk
# Python
pip install claude-agent-sdk
TypeScript: Basic Usage
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.py",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
if ("result" in message) {
console.log(message.result);
}
}
Python: Basic Usage
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
Spawning Sessions and Resuming
import { query } from "@anthropic-ai/claude-agent-sdk";
let sessionId: string | undefined;
// First query: capture the session ID
for await (const message of query({
prompt: "Read the authentication module",
options: { allowedTools: ["Read", "Glob"] }
})) {
if (message.type === "system" && message.subtype === "init") {
sessionId = message.session_id;
}
}
// Resume with full context from the first query
for await (const message of query({
prompt: "Now find all places that call it",
options: { resume: sessionId }
})) {
if ("result" in message) console.log(message.result);
}
Custom Subagents via SDK
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Use the code-reviewer agent to review this codebase",
options: {
allowedTools: ["Read", "Glob", "Grep", "Task"],
agents: {
"code-reviewer": {
description: "Expert code reviewer for quality and security reviews.",
prompt: "Analyze code quality and suggest improvements.",
tools: ["Read", "Glob", "Grep"]
}
}
}
})) {
if ("result" in message) console.log(message.result);
}
SDK with Hooks
import asyncio
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher
async def log_file_change(input_data, tool_use_id, context):
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}

async def main():
    async for message in query(
        prompt="Refactor utils.py to improve readability",
        options=ClaudeAgentOptions(
            permission_mode="acceptEdits",
            hooks={
                "PostToolUse": [
                    HookMatcher(matcher="Edit|Write", hooks=[log_file_change])
                ]
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
SDK with MCP Servers
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Open example.com and describe what you see",
options: {
mcpServers: {
playwright: { command: "npx", args: ["@playwright/mcp@latest"] }
}
}
})) {
if ("result" in message) console.log(message.result);
}
When to Use the SDK vs CLI
| Use Case | Best Choice |
|---|---|
| Shell scripts and CI/CD | CLI (claude -p) |
| Custom applications | SDK |
| One-off automation | CLI |
| Production services | SDK |
| Prototyping pipelines | CLI |
| Building agents that spawn agents | SDK |
12. Exercises
Exercise 1: Basic Headless Pipeline
Write a bash script that:
- Takes a directory path as an argument
- Uses `claude -p` with `--output-format json` to analyze all `.ts` files for unused imports
- Collects results into a single JSON report file
- Prints a summary of total files scanned and issues found
Stretch goal: Use --json-schema to enforce a structured output format.
Exercise 2: Batch File Transformer
Write a batch processing script that:
- Finds all JavaScript files in a project
- Processes each file with Claude to convert `require()` statements to `import` syntax
- Runs in parallel (4 at a time) using `xargs -P`
- Generates a summary report
Stretch goal: Add a --dry-run flag that uses --permission-mode plan to preview changes without modifying files.
Exercise 3: Stop Hook Verification Gate
Create a Stop hook configuration that:
- Runs the project test suite
- Checks that no `TODO` comments remain in modified files
- Verifies that all new functions have JSDoc comments
- Uses the `stop_hook_active` guard to prevent infinite loops
Test it by starting Claude with a task that intentionally leaves TODOs, and verify that the hook catches them.
Exercise 4: RALF Loop Implementation
- Create a `prd.json` with 3 user stories for a simple feature (e.g., a REST API for a todo list)
- Implement the RALF loop script from Section 4
- Run it and observe how it progresses through stories
- Add a verification step that checks test results between iterations
- Observe what happens when a story fails verification
Exercise 5: GitHub Actions PR Review
Create a GitHub Actions workflow that:
- Triggers on PR open and synchronize events
- Uses `claude-code-action@v1` to review the PR
- Posts a review comment with security, performance, and quality findings
- Limits to 5 max turns to control cost
- Uses a custom system prompt for your project's specific review criteria
Exercise 6: Multi-Stage Pipeline
Build a 4-stage pipeline:
- Audit: Scan for security vulnerabilities in dependencies
- Analyze: Identify code quality issues
- Fix: Automatically fix the safe-to-fix issues
- Report: Generate a comprehensive report of changes made
Each stage should pass context to the next via files in a ./pipeline/ directory. The final report should include cost information from the JSON output of each stage.
Exercise 7: Log Monitor
Create a log monitoring script that:
- Tails a log file (create a fake one for testing)
- Every 60 seconds, sends new entries to Claude for analysis
- Classifies entries as normal, warning, or critical
- Writes alerts to a separate file
- Includes a mechanism to prevent duplicate alerts for the same issue
13. Pro Tips from Boris Cherny
Boris Cherny, an engineer at Anthropic who works on Claude Code, has shared several practices for autonomous execution:
Use --permission-mode dontAsk in Sandboxes
The dontAsk mode auto-denies any tool that is not explicitly pre-approved. This is safer than bypassPermissions because you define exactly what is allowed:
claude -p "Implement the feature" \
--permission-mode dontAsk \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx jest *),Grep,Glob"
If Claude tries to use a tool not in --allowedTools, it is automatically denied without prompting. This prevents unexpected behavior while still allowing Claude to work autonomously with the tools it needs.
Background Verification Agents
Run a separate Claude instance that periodically checks the work of the primary instance:
# Main worker
claude -p "Implement the auth system" \
--allowedTools "Read,Write,Edit,Bash(npm *),Grep,Glob" \
--max-turns 30 &
WORKER_PID=$!
# Background verifier (runs every 2 minutes)
while kill -0 $WORKER_PID 2>/dev/null; do
sleep 120
claude -p "Check the current state of the codebase. \
Run npm test and npm run lint. \
Report any issues found." \
--allowedTools "Read,Bash(npm test *),Bash(npm run lint *),Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result' >> ./verification-log.txt
done
wait $WORKER_PID
echo "Worker finished. Verification log: ./verification-log.txt"
"Give Claude a Way to Verify Its Work"
The single most important practice for autonomous execution: always give Claude the tools and commands to verify what it has done. If Claude can run tests, it will run tests. If it cannot, it will guess whether the code works.
# Bad: Claude cannot verify its work
claude -p "Add authentication" --allowedTools "Read,Write,Edit"
# Good: Claude can run tests to verify
claude -p "Add authentication" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx tsc --noEmit)"
Chrome Extension for Browser Testing
For projects with browser-based UIs, give Claude access to a browser for visual verification:
claude --chrome -p "Implement the login page and verify it renders correctly"
The --chrome flag enables browser automation, allowing Claude to visually verify UI changes.
14. Anti-Patterns
Anti-Pattern 1: No max_iterations / No --max-turns
Problem: Running a RALF loop or headless command without any iteration limit.
Consequence: If Claude gets stuck on a failing test or an impossible task, it will loop indefinitely, burning through your API budget.
Fix: Always set --max-turns in headless mode and MAX_ITERATIONS in RALF loops.
# Bad
claude -p "Fix all the bugs"
# Good
claude -p "Fix all the bugs" --max-turns 15
Anti-Pattern 2: --dangerously-skip-permissions Outside Containers
Problem: Using --dangerously-skip-permissions on your local machine.
Consequence: Claude has unrestricted access to your entire filesystem and network. A prompt injection attack or a simple misunderstanding could delete files, exfiltrate data, or modify system configuration.
Fix: Only use --dangerously-skip-permissions inside ephemeral containers or VMs. On your local machine, use --allowedTools to scope permissions precisely.
Anti-Pattern 3: Not Verifying Between Iterations
Problem: Running a RALF loop that marks stories as complete based on Claude's claim, without running actual tests.
Consequence: Broken code accumulates. By the time you discover the issues, 10 stories have been "completed" with cascading failures.
Fix: Always run the verification command between iterations and only mark stories complete when verification passes.
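A minimal sketch of such a gate, assuming a `prd.json` whose stories carry `id` and `status` fields (the function name and field names are illustrative): the story's status changes only when a real command succeeds, never on Claude's own claim.

```shell
#!/bin/bash
# verify-gate.sh -- hypothetical sketch: mark a story in prd.json as done only
# when an actual verification command passes, regardless of what Claude said.
# Assumes stories look like {"id": "S1", "status": "todo", ...}.
set -euo pipefail

# mark_done_if_verified <prd.json> <story-id> <verify-cmd>
mark_done_if_verified() {
  local prd="$1" story_id="$2" verify_cmd="$3"
  if bash -c "$verify_cmd"; then
    # Verification passed: flip this story's status to done
    local tmp
    tmp=$(mktemp)
    jq --arg id "$story_id" \
      '(.stories[] | select(.id == $id) | .status) = "done"' "$prd" > "$tmp"
    mv "$tmp" "$prd"
    echo "story $story_id verified: marked done"
  else
    echo "story $story_id failed verification: status unchanged" >&2
    return 1
  fi
}
```

In a RALF loop this would run after each iteration, e.g. `mark_done_if_verified prd.json S1 "npm test"`, so a failing suite leaves the story open for the next attempt.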
Anti-Pattern 4: Overly Broad --allowedTools
Problem: Using --allowedTools "Bash" which allows any bash command.
Consequence: Claude can run rm -rf, curl to external servers, or modify system files. No guardrails.
Fix: Scope bash permissions with prefix matching:
# Bad
--allowedTools "Bash"
# Good
--allowedTools "Bash(npm test *),Bash(npm run *),Bash(git diff *),Read,Edit"
Anti-Pattern 5: No Budget Limit
Problem: Running autonomous pipelines without cost controls.
Consequence: An overnight pipeline could consume hundreds of dollars if it enters a retry loop.
Fix: Use --max-budget-usd to cap spending:
claude -p "Run the analysis" --max-budget-usd 5.00 --max-turns 20
Anti-Pattern 6: Ignoring Exit Codes
Problem: Not checking the exit code from claude -p in scripts.
Consequence: A failed Claude run is treated as a success. Downstream steps execute on broken state.
Fix: Always check exit codes:
if ! claude -p "Run tests" --max-turns 10; then
echo "Claude execution failed"
exit 1
fi
Anti-Pattern 7: No Logging
Problem: Running autonomous pipelines without saving Claude's output.
Consequence: When something goes wrong, you have no way to diagnose what happened.
Fix: Always redirect output to log files:
claude -p "Implement feature" \
--output-format json \
--verbose \
> "./logs/run-$(date +%Y%m%d-%H%M%S).json" 2>&1
Anti-Pattern 8: Single Massive Prompt
Problem: Putting an entire project specification into a single prompt and hoping Claude executes it all in one go.
Consequence: Claude loses track of requirements as the context window fills with code it has written. Quality degrades sharply.
Fix: Break the work into stages (multi-stage pipeline) or stories (RALF loop). Each invocation gets a focused task with fresh context.
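The splitting step itself can be sketched with jq, assuming a `prd.json` whose `stories` array carries `title` and `acceptance_criteria` fields (all names here are illustrative): each story becomes one focused prompt, intended for its own fresh invocation.

```shell
#!/bin/bash
# split-stories.sh -- hypothetical sketch: turn one big spec (prd.json) into
# focused per-story prompts, each meant for its own fresh `claude -p` run.
set -euo pipefail

# story_prompts <prd.json> -- print one prompt line per story
story_prompts() {
  jq -r '.stories[] |
    "Implement exactly one story: \(.title). " +
    "Acceptance criteria: \(.acceptance_criteria | join("; "))."' "$1"
}

# Usage in a loop (the claude call is illustrative and commented out):
# story_prompts prd.json | while IFS= read -r prompt; do
#   claude -p "$prompt" --max-turns 15 \
#     --allowedTools "Read,Write,Edit,Bash(npm test *)"
# done
```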
15. Official Documentation Links
Core References
- CLI Reference -- All CLI flags, options, and usage
- Headless Mode / Agent SDK CLI -- Programmatic usage via `-p`
- Permissions -- Permission modes and rule syntax
- Hooks Reference -- Hook events, matchers, exit codes
- Hooks Guide -- Practical hook examples
CI/CD Integration
- GitHub Actions -- GitHub Actions setup and workflows
- GitLab CI/CD -- GitLab pipeline integration
- Claude Code Action Repository -- The official GitHub Action
Agent SDK
- Agent SDK Overview -- SDK capabilities and examples
- TypeScript SDK -- TypeScript API reference
- Python SDK -- Python API reference
- Streaming Output -- Real-time streaming
- Structured Outputs -- JSON Schema validation
Related Guides
- Settings Reference -- All configuration options
- Sub-Agents -- Creating specialized agents
- Common Workflows -- Workflow patterns
- Best Practices -- Planning, prompting, verification
- Memory (CLAUDE.md) -- Project context configuration
Sandbox Runtime (Open Source)
The sandbox runtime is available as an open source npm package:
npx @anthropic-ai/sandbox-runtime <command-to-sandbox>
Source: github.com/anthropic-experimental/sandbox-runtime
Community Resources
- How Boris Uses Claude Code -- Boris Cherny's workflow
- Agent SDK Demo Repository -- Example agents
Last Updated: 2026-02-20
Compiled from official Anthropic documentation, Claude Agent SDK docs, and community best practices