The capstone level. Set it up, walk away, come back to finished work. This guide covers headless mode, batch processing, autonomous execution loops, CI/CD integration, safety sandboxing, and the Agent SDK for building production-grade pipelines with Claude Code.
Everything up to this point has been interactive. You type, Claude responds. You approve, Claude acts. Level 7 removes you from the loop.
Autonomous pipelines are workflows where Claude Code executes without human intervention from start to finish. You define the task, the boundaries, the verification criteria, and the safety constraints. Then you walk away. When you come back, the work is done -- or it stopped safely at a verification gate that needs your attention.
| Level | Description | Human Involvement |
|---|---|---|
| Interactive | You type every prompt | Every turn |
| Semi-autonomous | You approve each tool use | Frequent |
| Permission-scoped | You pre-approve certain tools | Occasional |
| Fully autonomous | Claude runs start to finish | None until completion |
Autonomous pipelines are built from these primitives:
- Headless mode (-p flag) -- Run Claude without a terminal UI
- Permission scoping (--allowedTools, --permission-mode) -- Define what Claude can do
- Structured output (--output-format json) -- Parse results programmatically
- Turn limits (--max-turns) -- Prevent runaway execution
- Sandboxing (/sandbox, containers) -- Enforce filesystem and network boundaries
Headless mode is the foundation of every autonomous pipeline. The -p (or --print) flag tells Claude Code to accept a prompt, execute it, print the result, and exit -- no interactive UI, no permission dialogs blocking execution.
# Basic headless execution
claude -p "What files are in this project?"
# With tool permissions pre-approved
claude -p "Run the test suite and report failures" \
--allowedTools "Bash(npm test *),Read"
# With structured output
claude -p "Summarize this project" --output-format json
The -p flag changes Claude Code from an interactive REPL to a one-shot command-line tool. It processes the prompt, uses whatever tools are needed (subject to permissions), and writes the result to stdout.
There are three ways to feed input to headless Claude:
claude -p "Explain the authentication flow in this codebase"
# Pipe file contents
cat src/auth.py | claude -p "Review this code for security issues"
# Pipe command output
git diff HEAD~5 | claude -p "Summarize these changes"
# Pipe log output
tail -100 /var/log/app.log | claude -p "Identify any errors or anomalies"
# Using system prompt from a file
claude -p "Review the code" --system-prompt-file ./prompts/security-review.txt
# Appending instructions from a file while keeping defaults
claude -p "Review this PR" --append-system-prompt-file ./prompts/style-rules.txt
Claude Code supports three output formats in headless mode. Each serves a different use case.
Plain text, suitable for human reading or simple piping.
claude -p "What does the main function do?"
# Output: The main function initializes the application...
Structured JSON with metadata. The result text is in the result field.
claude -p "Summarize this project" --output-format json
Output structure:
{
"type": "result",
"subtype": "success",
"cost_usd": 0.003,
"is_error": false,
"duration_ms": 4521,
"duration_api_ms": 3200,
"num_turns": 2,
"result": "This project is a REST API built with Express.js...",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"total_cost_usd": 0.003
}
Extract the result with jq:
claude -p "Summarize this project" --output-format json | jq -r '.result'
Get validated structured output conforming to a specific schema:
claude -p "Extract the main function names from auth.py" \
--output-format json \
--json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}},"required":["functions"]}'
The structured data appears in the structured_output field:
claude -p "Extract function names" \
--output-format json \
--json-schema '...' \
| jq '.structured_output'
Newline-delimited JSON for real-time streaming. Each line is a separate event.
claude -p "Explain recursion" \
--output-format stream-json \
--verbose \
--include-partial-messages
Filter for just the streaming text:
claude -p "Write a poem" \
--output-format stream-json \
--verbose \
--include-partial-messages | \
jq -rj 'select(.type == "stream_event" and .event.delta.type? == "text_delta") | .event.delta.text'
Claude Code returns meaningful exit codes in headless mode:
| Exit Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (general failure) |
| 2 | Max turns reached (when using --max-turns) |
Use exit codes in scripts for conditional logic:
claude -p "Run tests and fix failures" \
--allowedTools "Bash,Read,Edit" \
--max-turns 10
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "All tasks completed successfully"
elif [ $EXIT_CODE -eq 2 ]; then
echo "Hit max turns limit -- may need more iterations"
else
echo "Error occurred"
fi
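If exit code 2 means the turn limit was reached, as the table above states, a wrapper can retry only in that case and hand every other failure straight back to the caller. A minimal sketch under that assumption; run_with_retry and its arguments are hypothetical names, not part of Claude Code:

```shell
#!/usr/bin/env bash
# run_with_retry MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs COMMAND only while it exits with code 2 (assumed here to mean
# "max turns reached", per the exit-code table). Any other exit code,
# including 0, is returned to the caller immediately.
run_with_retry() {
  local max_attempts="$1"; shift
  local attempt=1 code=0
  while [ "$attempt" -le "$max_attempts" ]; do
    code=0
    "$@" || code=$?
    if [ "$code" -ne 2 ]; then
      return "$code"
    fi
    echo "Attempt $attempt hit the turn limit" >&2
    attempt=$((attempt + 1))
  done
  # Every attempt ended at the turn limit.
  return 2
}

# Example (not executed here):
# run_with_retry 3 claude -p "Run tests and fix failures" \
#   --allowedTools "Bash,Read,Edit" --max-turns 10
```

Because edits made in an earlier attempt stay on disk, a retry with the same prompt picks up from the current state of the files rather than starting over.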
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | API key for authentication |
| CLAUDE_CODE_USE_BEDROCK | Set to 1 to use AWS Bedrock |
| CLAUDE_CODE_USE_VERTEX | Set to 1 to use Google Vertex AI |
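A long batch is easier to debug when a missing credential is caught before the first call rather than in every result file. The following is a rough preflight sketch that checks only the three variables listed above; check_claude_auth is a hypothetical helper, and other ways of authenticating are outside its scope:

```shell
#!/usr/bin/env bash
# check_claude_auth
# Returns 0 when at least one variable from the table is set the way the
# table describes, 1 otherwise. Hypothetical helper for illustration only.
check_claude_auth() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then return 0; fi
  if [ "${CLAUDE_CODE_USE_BEDROCK:-}" = "1" ]; then return 0; fi
  if [ "${CLAUDE_CODE_USE_VERTEX:-}" = "1" ]; then return 0; fi
  echo "No Claude credentials configured in the environment" >&2
  return 1
}

# Example (not executed here):
# check_claude_auth || exit 1
```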
These flags are the most relevant for headless/autonomous operation:
| Flag | Description |
|---|---|
| -p, --print | Run in headless mode (required) |
| --output-format | text, json, or stream-json |
| --json-schema | JSON Schema for structured output |
| --input-format | Input format: text or stream-json |
| --include-partial-messages | Include streaming events (requires stream-json) |
| --allowedTools | Tools that execute without permission prompts |
| --disallowedTools | Tools removed from the model entirely |
| --tools | Restrict which tools are available ("Bash,Edit,Read") |
| --permission-mode | default, acceptEdits, plan, dontAsk, bypassPermissions |
| --dangerously-skip-permissions | Skip ALL permission prompts |
| --max-turns | Limit agentic turns (exits with code 2 when reached) |
| --max-budget-usd | Maximum dollar spend before stopping |
| --model | Model selection: sonnet, opus, or full model ID |
| --fallback-model | Fallback when primary model is overloaded |
| --system-prompt | Replace entire system prompt |
| --system-prompt-file | Replace system prompt from file |
| --append-system-prompt | Append to default system prompt |
| --append-system-prompt-file | Append from file to system prompt |
| --continue, -c | Continue most recent conversation |
| --resume, -r | Resume specific session by ID |
| --no-session-persistence | Do not save session to disk |
| --verbose | Show full turn-by-turn output |
| --debug | Enable debug logging |
| --mcp-config | Load MCP servers from JSON config |
# First run
claude -p "Review this codebase for performance issues" --output-format json > first_pass.json
# Extract session ID
SESSION_ID=$(jq -r '.session_id' first_pass.json)
# Continue the same conversation
claude -p "Now focus on the database queries" --resume "$SESSION_ID"
# Or just continue the most recent conversation
claude -p "Generate a summary of all issues found" --continue
Batch processing is where headless mode pays off. Instead of running Claude once, you run it across hundreds of files, each invocation independent and parallelizable.
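# Process each file in a loop
shopt -s globstar  # bash 4+: lets ** match nested directories
for file in src/**/*.ts; do
  # Name the file in the prompt so Claude knows which path to read and edit
  claude -p "Add JSDoc comments to all exported functions in $file" \
    --allowedTools "Read,Edit"
done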
# Process 4 files at a time in parallel
find src -name "*.ts" | xargs -P 4 -I {} \
claude -p "Add JSDoc comments to all exported functions in {}" \
--allowedTools "Read,Edit"
# Process with GNU parallel (better job control)
# The whole command is single-quoted so the inner quotes survive parallel's shell invocation
find src -name "*.ts" | parallel -j 4 \
  'claude -p "Add JSDoc comments to all exported functions in {}" --allowedTools "Read,Edit"'
The --allowedTools flag is critical for batch scripts. It defines exactly which tools Claude can use without prompting. This follows the permission rule syntax:
# Read-only analysis (safest)
claude -p "Review this code" --allowedTools "Read,Grep,Glob"
# Read and edit (for transformations)
claude -p "Add types" --allowedTools "Read,Edit"
# With specific bash commands
claude -p "Run tests" --allowedTools "Bash(npm test *),Read"
# Wildcard bash with prefix matching
claude -p "Git operations" --allowedTools "Bash(git diff *),Bash(git log *),Bash(git status *)"
The space before * matters. Bash(git diff *) matches git diff HEAD but not git diff-index. Without the space, Bash(git diff*) would match both.
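The effect of that space can be reproduced with ordinary shell glob matching. This is only an analogy, assuming the rule behaves like a glob over the command string; Claude Code's own matcher is not shown in this guide and may differ in edge cases. rule_matches is a hypothetical helper:

```shell
#!/usr/bin/env bash
# rule_matches PATTERN COMMAND
# Succeeds when COMMAND matches PATTERN under shell glob rules.
rule_matches() {
  local pattern="$1" cmd="$2"
  # The unquoted right-hand side makes [[ ]] treat it as a glob pattern.
  [[ "$cmd" == $pattern ]]
}

# Compare the two rule spellings against the two commands from the text.
for rule in "git diff *" "git diff*"; do
  for cmd in "git diff HEAD" "git diff-index HEAD"; do
    if rule_matches "$rule" "$cmd"; then verdict="match"; else verdict="no match"; fi
    printf '%-12s vs %-22s -> %s\n' "$rule" "$cmd" "$verdict"
  done
done
```

With the space, only "git diff HEAD" matches; without it, both commands do.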
#!/bin/bash
set -euo pipefail
RESULTS_DIR="./batch-results"
mkdir -p "$RESULTS_DIR"
FAILED=0
SUCCEEDED=0
TOTAL=0
for file in src/components/*.tsx; do
TOTAL=$((TOTAL + 1))
BASENAME=$(basename "$file" .tsx)
echo "Processing: $file"
if claude -p "Add comprehensive prop type definitions to this React component. \
Read the file at $file, add TypeScript interfaces for all props, \
and ensure all props are properly typed." \
--allowedTools "Read,Edit" \
--max-turns 5 \
--output-format json \
> "$RESULTS_DIR/${BASENAME}.json" 2>&1; then
SUCCEEDED=$((SUCCEEDED + 1))
echo " OK: $file"
else
  EXIT_CODE=$?  # capture immediately; any later command overwrites $?
  FAILED=$((FAILED + 1))
  echo " FAIL: $file (exit code: $EXIT_CODE)"
fi
done
echo ""
echo "=== Batch Results ==="
echo "Total: $TOTAL"
echo "Succeeded: $SUCCEEDED"
echo "Failed: $FAILED"
#!/bin/bash
# Collect results from multiple Claude runs into a single report
REPORT_FILE="./batch-report.md"
echo "# Batch Processing Report" > "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "Generated: $(date)" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
for result_file in ./batch-results/*.json; do
FILENAME=$(basename "$result_file" .json)
RESULT=$(jq -r '.result // "No result"' "$result_file")
COST=$(jq -r '.cost_usd // "unknown"' "$result_file")
IS_ERROR=$(jq -r '.is_error // false' "$result_file")
echo "## $FILENAME" >> "$REPORT_FILE"
echo "- Cost: \$${COST}" >> "$REPORT_FILE"
echo "- Error: ${IS_ERROR}" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "$RESULT" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "---" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
done
echo "Report written to $REPORT_FILE"
#!/bin/bash
# migrate-to-hooks.sh
# Migrate React class components to functional components with hooks
set -euo pipefail
MIGRATION_PROMPT="Read this file. If it contains a React class component, \
convert it to a functional component using hooks. Preserve all functionality: \
- Convert state to useState \
- Convert lifecycle methods to useEffect \
- Convert class methods to regular functions or useCallback \
- Preserve all props and their types \
- Keep all existing tests passing \
If the file is already a functional component, make no changes."
LOG_FILE="./migration-log.txt"
echo "Migration started: $(date)" > "$LOG_FILE"
find src/components \( -name "*.tsx" -o -name "*.jsx" \) | while read -r file; do
  echo "Migrating: $file"
  # < /dev/null keeps claude from consuming the file list that feeds this loop
  claude -p "$MIGRATION_PROMPT" \
    --allowedTools "Read,Edit" \
    --append-system-prompt "You are migrating the file at: $file" \
    --max-turns 8 \
    --output-format json \
    < /dev/null 2>&1 | tee -a "$LOG_FILE" | jq -r '.result // "ERROR"' || true
  echo "---" >> "$LOG_FILE"
done
echo "Migration completed: $(date)" >> "$LOG_FILE"
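One pitfall in loops of the form find ... | while read: any command in the loop body that reads stdin will swallow the rest of the file list. Since claude -p accepts piped input (the piping examples earlier rely on this), it is worth detaching its stdin inside such loops. A small demonstration, where cat is a stand-in for any stdin-reading command and the function names are made up for this example:

```shell
#!/usr/bin/env bash
# Stand-in for a command that reads everything available on stdin.
consume_stdin() { cat > /dev/null; }

# Unsafe: the loop body drains the pipe, so only the first file is processed.
loop_unsafe() {
  printf 'a.tsx\nb.tsx\nc.tsx\n' | while read -r file; do
    consume_stdin
    echo "processed $file"
  done
}

# Safe: stdin of the inner command is detached, so the list survives.
loop_safe() {
  printf 'a.tsx\nb.tsx\nc.tsx\n' | while read -r file; do
    consume_stdin < /dev/null
    echo "processed $file"
  done
}

loop_unsafe   # prints one "processed" line
loop_safe     # prints three "processed" lines
```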
#!/bin/bash
# add-license-headers.sh
# Add license headers to all source files missing them
set -euo pipefail
LICENSE_HEADER="Copyright (c) 2026 Acme Corp. All rights reserved.
Licensed under the MIT License."
find src -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" | while read -r file; do
# Check if file already has a license header
if head -5 "$file" | grep -q "Copyright"; then
echo "SKIP (already has header): $file"
continue
fi
echo "Adding header to: $file"
  # < /dev/null keeps claude from reading the file list that feeds this loop
  claude -p "Add this license header as a comment block at the very top of \
the file at $file (before any imports). Use the appropriate comment \
syntax for the file type. The license text is: ${LICENSE_HEADER}" \
    --allowedTools "Read,Edit" \
    --max-turns 3 \
    --output-format json < /dev/null | jq -r '.is_error' || true
done
#!/bin/bash
# generate-tests.sh
# Generate unit tests for every module that lacks them
set -euo pipefail
RESULTS_DIR="./test-generation-results"
mkdir -p "$RESULTS_DIR"
TEST_PROMPT="Read the source file and generate comprehensive unit tests for it. \
Requirements: \
- Use the existing test framework (Jest/Vitest) found in the project \
- Cover all exported functions and classes \
- Include edge cases and error conditions \
- Follow existing test patterns in the project \
- Place the test file next to the source file as __tests__/<filename>.test.ts \
- Run the tests to verify they pass"
find src -name "*.ts" ! -name "*.test.ts" ! -name "*.spec.ts" ! -path "*__tests__*" | while read -r file; do
BASENAME=$(basename "$file" .ts)
TEST_FILE="$(dirname "$file")/__tests__/${BASENAME}.test.ts"
# Skip if test already exists
if [ -f "$TEST_FILE" ]; then
echo "SKIP (test exists): $file"
continue
fi
echo "Generating tests for: $file"
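  # < /dev/null protects the file list; || true keeps set -e from aborting the batch on one failure
  claude -p "$TEST_PROMPT for the file at $file" \
    --allowedTools "Read,Write,Edit,Bash(npx jest *),Bash(npm test *),Glob,Grep" \
    --max-turns 15 \
    --output-format json \
    < /dev/null > "$RESULTS_DIR/${BASENAME}.json" 2>&1 || true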
IS_ERROR=$(jq -r '.is_error' "$RESULTS_DIR/${BASENAME}.json")
if [ "$IS_ERROR" = "true" ]; then
echo " FAIL: $file"
else
echo " OK: $file"
fi
done
#!/bin/bash
# review-prs.sh
# Review all open PRs in a repository
set -euo pipefail
REVIEW_PROMPT="You are a senior code reviewer. Review the PR diff provided via stdin. \
Focus on: \
1. Security vulnerabilities (SQL injection, XSS, auth bypass) \
2. Performance issues (N+1 queries, memory leaks, unnecessary re-renders) \
3. Code quality (naming, complexity, duplication) \
4. Test coverage (are new features tested?) \
5. Breaking changes (API contract, database schema) \
Output a structured review with severity levels: CRITICAL, WARNING, INFO."
# Make sure the output directory exists before redirecting into it
mkdir -p ./reviews
# Get list of open PRs
PR_NUMBERS=$(gh pr list --state open --json number --jq '.[].number')
for pr in $PR_NUMBERS; do
echo "Reviewing PR #${pr}..."
gh pr diff "$pr" | claude -p "$REVIEW_PROMPT" \
--append-system-prompt "You are reviewing PR #${pr}." \
--output-format json \
--max-turns 3 \
> "./reviews/pr-${pr}-review.json" 2>&1
RESULT=$(jq -r '.result' "./reviews/pr-${pr}-review.json")
echo "PR #${pr} Review:"
echo "$RESULT"
echo "---"
done
RALF stands for Read-Act-Loop-Finish. It is a pattern for autonomous, multi-iteration execution where Claude reads a specification, works through it story by story, verifies after each step, and loops until all work is done or a safety limit is hit.
RALF is not a built-in Claude Code feature. It is a scripting pattern that wraps Claude Code's headless mode in a loop with:
- A specification file (prd.json)
- A verification step after each story
- A hard stop once max_iterations is reached
The key insight is that each iteration starts a fresh Claude context (or continues a session), preventing the context window from degrading over long-running tasks.
The PRD (Product Requirements Document) file defines what Claude should build. It uses a structured format with user stories and acceptance criteria so progress can be verified programmatically.
{
"project": "User Authentication System",
"description": "Implement a complete authentication system with login, signup, and session management",
"tech_stack": {
"language": "TypeScript",
"framework": "Express.js",
"database": "PostgreSQL with Prisma ORM",
"testing": "Jest"
},
"stories": [
{
"id": "AUTH-001",
"title": "User Registration",
"description": "As a new user, I want to create an account so I can access the application",
"acceptance_criteria": [
"POST /api/auth/register endpoint accepts email and password",
"Password is hashed with bcrypt before storage",
"Email uniqueness is enforced at the database level",
"Returns 201 with user object (without password) on success",
"Returns 409 if email already exists",
"Input validation rejects invalid email formats",
"Unit tests pass for all success and error cases"
],
"files": ["src/routes/auth.ts", "src/models/user.ts", "src/middleware/validation.ts"],
"status": "pending"
},
{
"id": "AUTH-002",
"title": "User Login",
"description": "As a registered user, I want to log in to receive a session token",
"acceptance_criteria": [
"POST /api/auth/login endpoint accepts email and password",
"Returns JWT token on successful authentication",
"Returns 401 on invalid credentials",
"Token contains user ID and expiration time",
"Unit tests pass for all cases"
],
"files": ["src/routes/auth.ts", "src/utils/jwt.ts"],
"status": "pending",
"depends_on": ["AUTH-001"]
},
{
"id": "AUTH-003",
"title": "Protected Routes Middleware",
"description": "As a developer, I want middleware to protect routes that require authentication",
"acceptance_criteria": [
"Middleware extracts JWT from Authorization header",
"Middleware verifies token validity and expiration",
"Middleware attaches user object to request",
"Returns 401 for missing or invalid tokens",
"Returns 403 for expired tokens",
"Unit tests pass for all cases"
],
"files": ["src/middleware/auth.ts"],
"status": "pending",
"depends_on": ["AUTH-002"]
},
{
"id": "AUTH-004",
"title": "Integration Tests",
"description": "Complete integration test suite for the auth system",
"acceptance_criteria": [
"Full registration-login-access flow works end-to-end",
"All edge cases are covered",
"Tests use a test database, not production",
"All tests pass"
],
"files": ["tests/integration/auth.test.ts"],
"status": "pending",
"depends_on": ["AUTH-003"]
}
],
"constraints": [
"Do not modify files outside of src/ and tests/",
"Follow existing code style and patterns",
"All new code must have TypeScript strict mode enabled",
"No console.log statements in production code"
],
"verification_command": "npm test"
}
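The depends_on fields define the order in which stories can be attempted: a story is runnable only when it is pending and every story it depends on is completed. Here is a minimal sketch of that selection rule. The flat "id|status|deps" record format is an assumption made for this example (the PRD itself is JSON), and next_story is a hypothetical helper:

```shell
#!/usr/bin/env bash
# next_story RECORD...
# Each RECORD is "id|status|comma,separated,deps" (format assumed for this
# sketch). Prints the first pending story whose dependencies are all
# completed; returns 1 when no story is runnable.
next_story() {
  local records=("$@") done_ids=" " rec id status deps dep ok
  # First pass: collect the ids of completed stories.
  for rec in "${records[@]}"; do
    IFS='|' read -r id status deps <<< "$rec"
    if [ "$status" = "completed" ]; then
      done_ids+="$id "
    fi
  done
  # Second pass: find the first pending story with no unmet dependency.
  for rec in "${records[@]}"; do
    IFS='|' read -r id status deps <<< "$rec"
    [ "$status" = "pending" ] || continue
    ok=1
    for dep in ${deps//,/ }; do
      case "$done_ids" in
        *" $dep "*) ;;      # dependency satisfied
        *) ok=0 ;;          # dependency still open
      esac
    done
    if [ "$ok" -eq 1 ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# Example, mirroring the first three stories of the PRD above:
next_story "AUTH-001|completed|" "AUTH-002|pending|AUTH-001" "AUTH-003|pending|AUTH-002"
```

In the example call, AUTH-002 is selected because AUTH-001 is completed, while AUTH-003 still waits on AUTH-002.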
This is the core RALF implementation. It iterates through the PRD stories, sends each to Claude, verifies the result, and updates the status.
#!/bin/bash
# ralf.sh -- Read-Act-Loop-Finish autonomous execution
set -euo pipefail
PRD_FILE="${1:-prd.json}"
MAX_ITERATIONS="${2:-20}"
LOG_DIR="./ralf-logs"
mkdir -p "$LOG_DIR"
ITERATION=0
echo "=== RALF Loop Starting ==="
echo "PRD: $PRD_FILE"
echo "Max iterations: $MAX_ITERATIONS"
echo ""
while [ $ITERATION -lt $MAX_ITERATIONS ]; do
ITERATION=$((ITERATION + 1))
echo "--- Iteration $ITERATION / $MAX_ITERATIONS ---"
# Find the next pending story whose dependencies are all completed
NEXT_STORY=$(jq -r '
  [.stories[] | select(.status == "completed") | .id] as $done
  | [ .stories[]
      | select(.status == "pending")
      | select((.depends_on // []) | all(.[]; IN($done[])))
      | .id ]
  | .[0] // empty
' "$PRD_FILE")
# Simpler fallback: just get the first pending story
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
NEXT_STORY=$(jq -r '.stories[] | select(.status == "pending") | .id' "$PRD_FILE" | head -1)
fi
# Check if all stories are done
if [ -z "$NEXT_STORY" ] || [ "$NEXT_STORY" = "null" ]; then
echo ""
echo "=== All stories completed! ==="
break
fi
# Extract story details
STORY_TITLE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .title" "$PRD_FILE")
STORY_DESC=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .description" "$PRD_FILE")
ACCEPTANCE=$(jq -r ".stories[] | select(.id == \"$NEXT_STORY\") | .acceptance_criteria | join(\"\n- \")" "$PRD_FILE")
CONSTRAINTS=$(jq -r '.constraints | join("\n- ")' "$PRD_FILE")
VERIFY_CMD=$(jq -r '.verification_command // "echo No verification command"' "$PRD_FILE")
echo "Working on: [$NEXT_STORY] $STORY_TITLE"
# Build the prompt for this iteration
PROMPT="You are implementing a software project defined in $PRD_FILE.
Current task: [$NEXT_STORY] $STORY_TITLE
Description: $STORY_DESC
Acceptance Criteria:
- $ACCEPTANCE
Project Constraints:
- $CONSTRAINTS
Instructions:
1. Read the PRD file and any existing code to understand the full context
2. Implement this specific story ($NEXT_STORY)
3. Write the code that satisfies ALL acceptance criteria
4. Run the verification command: $VERIFY_CMD
5. Fix any test failures
6. When all acceptance criteria are met, report SUCCESS
Do NOT implement other stories. Focus only on $NEXT_STORY."
# Execute Claude (the || keeps set -e from aborting the loop on a non-zero exit)
EXIT_CODE=0
claude -p "$PROMPT" \
  --allowedTools "Read,Write,Edit,Bash(npm *),Bash(npx *),Bash(git diff *),Bash(git status *),Glob,Grep" \
  --max-turns 25 \
  --output-format json \
  > "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json" 2>&1 || EXIT_CODE=$?
# Check result
IS_ERROR=$(jq -r '.is_error // false' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
RESULT=$(jq -r '.result // "No result"' "$LOG_DIR/iteration-${ITERATION}-${NEXT_STORY}.json")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
# Run verification
echo " Running verification: $VERIFY_CMD"
if eval "$VERIFY_CMD" > "$LOG_DIR/verify-${ITERATION}.log" 2>&1; then
echo " Verification PASSED"
# Update the story status in the PRD
jq "(.stories[] | select(.id == \"$NEXT_STORY\") | .status) = \"completed\"" \
"$PRD_FILE" > "${PRD_FILE}.tmp" && mv "${PRD_FILE}.tmp" "$PRD_FILE"
echo " Marked $NEXT_STORY as completed"
else
echo " Verification FAILED -- will retry next iteration"
fi
else
echo " Claude reported an error, will retry"
fi
echo ""
done
# Final status report
COMPLETED=$(jq '[.stories[] | select(.status == "completed")] | length' "$PRD_FILE")
TOTAL=$(jq '.stories | length' "$PRD_FILE")
echo "=== RALF Loop Complete ==="
echo "Completed: $COMPLETED / $TOTAL stories"
echo "Iterations used: $ITERATION / $MAX_ITERATIONS"
echo "Logs: $LOG_DIR"
The MAX_ITERATIONS variable prevents runaway execution. Without it, a failing story could cause the loop to retry indefinitely. Rules of thumb:
| Task Complexity | Suggested max_iterations |
|---|---|
| Simple transformations | 5-10 |
| Feature implementation | 15-25 |
| Full project build | 30-50 |
| Complex refactors | 20-40 |
Always set this value. An infinite loop with API calls will drain your budget.
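Each call to claude -p starts a fresh conversation. This is intentional -- it prevents context window degradation. Long conversations cause Claude to lose track of earlier details, producing lower-quality output. Restarting on every iteration means each story begins with a clean context, while progress carries over through the code on disk and the status fields in the PRD file.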
If you need continuity between iterations (rare), use --resume with the session ID from the previous run.
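The verification step between iterations is what makes RALF reliable. After each implementation, the loop runs the verification command itself. If it passes, the story is marked completed in the PRD; if it fails, the story stays pending and is retried on the next iteration.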
This is the difference between RALF and a naive loop that just sends prompts. RALF verifies actual outcomes, not just Claude's claim of completion.
Stop hooks are the mechanism for preventing Claude from claiming it is done before the work actually passes quality checks. When Claude finishes responding, the Stop hook fires. If the hook returns a blocking decision, Claude continues working instead of stopping.
The Stop hook fires when Claude finishes responding (but not on user interrupts). The hook receives context about the session including the last assistant message, and can:
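- Exit 0: Allow Claude to stop
- Exit 2: Block stopping, with stderr sent to Claude as feedback
- Return JSON with decision: "block": Block with a reason sent to Claude
The hook input includes a stop_hook_active field that is true when Claude is already continuing because of a previous Stop hook. Always check this to prevent infinite loops: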
#!/bin/bash
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop: only run the check once
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Your verification logic here
The simplest verification gate -- a shell script that runs tests:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/verify-before-stop.sh",
"timeout": 120,
"statusMessage": "Running verification checks..."
}
]
}
]
}
}
The verification script:
#!/bin/bash
# .claude/hooks/verify-before-stop.sh
INPUT=$(cat)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
# Prevent infinite loop
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
# Run the test suite
echo "Running tests..." >&2
if ! npm test 2>&1; then
echo "Tests are failing. Fix the failing tests before stopping." >&2
exit 2 # Exit code 2 = block Claude from stopping
fi
# Run the linter
echo "Running linter..." >&2
if ! npm run lint 2>&1; then
echo "Linting errors found. Fix them before stopping." >&2
exit 2
fi
# Run type checking
echo "Running type check..." >&2
if ! npx tsc --noEmit 2>&1; then
echo "Type errors found. Fix them before stopping." >&2
exit 2
fi
# All checks passed
exit 0
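The three checks in that script share one shape: announce the check, run it, and exit with code 2 plus a message on stderr if it fails. That shape can be factored into a helper, sketched below. run_gate is a hypothetical name, and sending the check's own output to stderr is a design choice made here so that the failure details reach Claude along with the message:

```shell
#!/usr/bin/env bash
# run_gate NAME COMMAND [ARGS...]
# Runs one check. On failure it prints a message on stderr and returns 2,
# the exit code the hook script above uses to block Claude from stopping.
run_gate() {
  local name="$1"; shift
  echo "Running ${name}..." >&2
  # Route the check's stdout to stderr so its output travels with the failure.
  if ! "$@" >&2; then
    echo "${name} failed. Fix it before stopping." >&2
    return 2
  fi
}

# Hypothetical wiring inside a Stop hook, using the commands from the script above:
# run_gate "tests"      npm test         || exit 2
# run_gate "linter"     npm run lint     || exit 2
# run_gate "type check" npx tsc --noEmit || exit 2
# exit 0
```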
Use a prompt hook to have a fast model (Haiku by default) evaluate whether Claude should stop. This is useful for subjective quality checks:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "prompt",
"prompt": "You are evaluating whether Claude should stop working. Context: $ARGUMENTS\n\nAnalyze the conversation and determine if:\n1. All user-requested tasks are complete\n2. Any errors remain unaddressed\n3. Tests have been run and pass\n4. Code follows the project conventions described in CLAUDE.md\n\nRespond with JSON: {\"ok\": true} to allow stopping, or {\"ok\": false, \"reason\": \"your explanation\"} to continue working.",
"timeout": 30
}
]
}
]
}
}
The LLM returns {"ok": true} or {"ok": false, "reason": "..."}. If ok is false, Claude receives the reason as feedback and continues working.
Agent hooks are the most powerful option. They spawn a subagent that can read files, search code, and run commands to verify completion:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "agent",
"prompt": "Verify that the implementation is complete. Check:\n1. All files mentioned in the task exist\n2. Unit tests exist and pass (run: npm test)\n3. No TODO comments remain in modified files\n4. No console.log statements in production code\n\nContext: $ARGUMENTS",
"timeout": 120
}
]
}
]
}
}
The agent can use Read, Grep, Glob, and other tools to investigate the codebase. It returns the same {"ok": true/false} decision format.
You can chain multiple hooks. All matching hooks run in parallel:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/run-tests.sh",
"timeout": 120,
"statusMessage": "Running test suite..."
},
{
"type": "command",
"command": ".claude/hooks/check-lint.sh",
"timeout": 60,
"statusMessage": "Checking code style..."
},
{
"type": "prompt",
"prompt": "Review whether all acceptance criteria from the original task are satisfied. Context: $ARGUMENTS",
"timeout": 30
}
]
}
]
}
}
If any hook blocks, Claude continues working. The stderr or reason from the blocking hook tells Claude what to fix.
Claude Code integrates directly into GitHub Actions and GitLab CI/CD pipelines. This is the most common production use case for autonomous execution.
The fastest way to set up GitHub Actions integration:
# Inside Claude Code interactive mode
/install-github-app
This guides you through installing the Claude GitHub App and configuring secrets.
- Add ANTHROPIC_API_KEY to your repository secrets
This is the core workflow. It triggers when someone mentions @claude in a PR or issue comment:
# .github/workflows/claude.yml
name: Claude Code
on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
Usage in PR comments:
@claude implement this feature based on the issue description
@claude fix the TypeError in the user dashboard component
@claude review this PR for security issues
Automatically review every PR when it is opened or updated:
# .github/workflows/claude-review.yml
name: Claude PR Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "/review"
          claude_args: "--max-turns 5"
When an issue is labeled with claude-implement, Claude creates a PR with the implementation:
# .github/workflows/claude-implement.yml
name: Claude Auto-Implement
on:
  issues:
    types: [labeled]
jobs:
  implement:
    if: github.event.label.name == 'claude-implement'
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Read the issue description and implement the requested feature.
            Create a new branch, implement the changes, and open a PR.
            Follow the project's CLAUDE.md guidelines.
          claude_args: |
            --max-turns 25
            --model claude-sonnet-4-6
            --allowedTools "Read,Write,Edit,Bash,Glob,Grep"
Run a scheduled code quality analysis:
# .github/workflows/claude-quality.yml
name: Daily Code Quality
on:
  schedule:
    - cron: "0 9 * * 1-5" # 9 AM weekdays
jobs:
  quality-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Analyze the codebase for quality issues:
            1. Find dead code and unused exports
            2. Identify overly complex functions (cyclomatic complexity)
            3. Check for missing error handling
            4. Look for potential performance issues
            5. Summarize findings as a GitHub issue
          claude_args: "--max-turns 10 --model sonnet"
The claude-code-action@v1 accepts these parameters:
| Parameter | Description | Required |
|---|---|---|
| anthropic_api_key | Claude API key | Yes (unless Bedrock/Vertex) |
| prompt | Instructions for Claude | No |
| claude_args | CLI arguments passed to Claude | No |
| github_token | GitHub token for API access | No |
| trigger_phrase | Custom trigger phrase (default: @claude) | No |
| use_bedrock | Use AWS Bedrock instead of Claude API | No |
| use_vertex | Use Google Vertex AI instead of Claude API | No |
Pass CLI arguments via claude_args:
claude_args: "--max-turns 5 --model claude-sonnet-4-6 --allowedTools 'Read,Edit,Bash'"
GitLab CI/CD integration works similarly but uses .gitlab-ci.yml instead:
# .gitlab-ci.yml
stages:
  - ai
claude:
  stage: ai
  image: node:24-alpine3.21
  rules:
    - if: '$CI_PIPELINE_SOURCE == "web"'
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  variables:
    GIT_STRATEGY: fetch
  before_script:
    - apk update
    - apk add --no-cache git curl bash
    - curl -fsSL https://claude.ai/install.sh | bash
  script:
    - >
      claude
      -p "${AI_FLOW_INPUT:-'Review this MR and suggest improvements'}"
      --permission-mode acceptEdits
      --allowedTools "Bash Read Edit Write"
      --debug
claude-bedrock:
  stage: ai
  image: node:24-alpine3.21
  rules:
    - if: '$CI_PIPELINE_SOURCE == "web"'
  before_script:
    - apk add --no-cache bash curl jq git python3 py3-pip
    - pip install --no-cache-dir awscli
    - curl -fsSL https://claude.ai/install.sh | bash
    - export AWS_WEB_IDENTITY_TOKEN_FILE="${CI_JOB_JWT_FILE:-/tmp/oidc_token}"
    - if [ -n "${CI_JOB_JWT_V2}" ]; then printf "%s" "$CI_JOB_JWT_V2" > "$AWS_WEB_IDENTITY_TOKEN_FILE"; fi
    - >
      aws sts assume-role-with-web-identity
      --role-arn "$AWS_ROLE_TO_ASSUME"
      --role-session-name "gitlab-claude-$(date +%s)"
      --web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE"
      --duration-seconds 3600 > /tmp/aws_creds.json
    - export AWS_ACCESS_KEY_ID="$(jq -r .Credentials.AccessKeyId /tmp/aws_creds.json)"
    - export AWS_SECRET_ACCESS_KEY="$(jq -r .Credentials.SecretAccessKey /tmp/aws_creds.json)"
    - export AWS_SESSION_TOKEN="$(jq -r .Credentials.SessionToken /tmp/aws_creds.json)"
  script:
    - >
      claude
      -p "${AI_FLOW_INPUT:-'Implement the requested changes and open an MR'}"
      --permission-mode acceptEdits
      --allowedTools "Bash Read Edit Write"
      --debug
  variables:
    AWS_REGION: "us-west-2"
One of the most practical autonomous uses of Claude is piping live logs through it for analysis.
# Pipe last 100 lines for analysis
tail -100 /var/log/app.log | claude -p "Identify errors and anomalies in these logs"
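# Live log sampling: tail -f never closes the pipe on its own, so bound it
# (here with timeout, which ends the stream after 60 seconds)
timeout 60 tail -f /var/log/app.log | claude -p "Identify and report any errors in this live log sample"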
#!/bin/bash
# log-monitor.sh -- Monitor logs and alert on anomalies
set -euo pipefail
LOG_FILE="${1:-/var/log/app.log}"
CHECK_INTERVAL="${2:-300}" # seconds between checks
ALERT_FILE="./alerts.log"
echo "Monitoring: $LOG_FILE (checking every ${CHECK_INTERVAL}s)"
while true; do
# Take the most recent 200 lines (consecutive checks may overlap)
NEW_LINES=$(tail -200 "$LOG_FILE")
if [ -n "$NEW_LINES" ]; then
ANALYSIS=$(echo "$NEW_LINES" | claude -p \
"Analyze these log entries for anomalies. Look for: \
1. Error patterns (stack traces, HTTP 5xx, timeout errors) \
2. Performance degradation (slow queries, high latency) \
3. Security concerns (auth failures, unusual access patterns) \
4. Resource issues (memory warnings, disk space, connection pool) \
\
Output format: \
- SEVERITY: CRITICAL/WARNING/INFO \
- CATEGORY: error/performance/security/resource \
- SUMMARY: one-line description \
- DETAILS: relevant log entries \
\
If no anomalies found, output: NO_ANOMALIES" \
--output-format json \
--max-turns 2 2>/dev/null | jq -r '.result // "ERROR"') || ANALYSIS="ERROR"
if [ "$ANALYSIS" != "NO_ANOMALIES" ] && [ "$ANALYSIS" != "ERROR" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$TIMESTAMP] ALERT:" >> "$ALERT_FILE"
echo "$ANALYSIS" >> "$ALERT_FILE"
echo "---" >> "$ALERT_FILE"
# Optionally send notification
# curl -X POST "$SLACK_WEBHOOK" -d "{\"text\": \"Log Alert: $ANALYSIS\"}"
fi
fi
sleep "$CHECK_INTERVAL"
done
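A monitor like this needs to know which entries are new since the last check. One way to track that is to remember a byte offset between passes. The sketch below is an illustration in Python, not part of the original script; `read_new_lines` and its rotation heuristic (a file that shrank is read again from the start) are assumptions made for the example.

```python
from __future__ import annotations

import os
import tempfile


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return the complete lines appended after byte `offset`, plus the new offset."""
    if os.path.getsize(path) < offset:
        # The file shrank, so it was probably rotated or truncated: start over.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial trailing line waits for the next check.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(b"INFO boot\nERROR db timeout\n")
    first, offset = read_new_lines(tmp.name, 0)
    with open(tmp.name, "ab") as f:
        f.write(b"WARN slow query\n")
    second, offset = read_new_lines(tmp.name, offset)
    print(first)   # the two initial lines
    print(second)  # only the appended line
    os.unlink(tmp.name)
```

Persisting the returned offset between checks (in a variable or a small state file) keeps each analysis pass limited to entries Claude has not seen yet, which also keeps the cost per check predictable.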
#!/bin/bash
# classify-errors.sh -- Classify errors from the last 24 hours
set -euo pipefail
# Extract errors from the last 24 hours
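# The trailing "|| true" keeps set -e and pipefail from aborting when grep finds no matches
ERRORS=$(journalctl --since "24 hours ago" --priority=err --no-pager 2>/dev/null || \
grep -i "error\|exception\|fatal" /var/log/app.log | tail -500 || true)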
if [ -z "$ERRORS" ]; then
echo "No errors found in the last 24 hours."
exit 0
fi
echo "$ERRORS" | claude -p \
"Classify these errors into categories and provide a summary report. \
For each category: \
1. Name the category \
2. Count occurrences \
3. Identify the root cause if possible \
4. Suggest a fix \
5. Rate severity (critical/high/medium/low) \
\
Sort by severity (critical first)." \
--output-format json \
--max-turns 2 | jq -r '.result'
#!/bin/bash
# watch-deploy.sh -- Monitor deployment and alert on issues
set -euo pipefail
DEPLOY_LOG="${1:-/var/log/deploy.log}"
echo "Watching deployment log: $DEPLOY_LOG"
tail -f "$DEPLOY_LOG" | while IFS= read -r line; do
# Check for error patterns
if echo "$line" | grep -qi "error\|fail\|crash\|fatal\|panic"; then
# Send the error context to Claude for analysis
CONTEXT=$(tail -20 "$DEPLOY_LOG")
ANALYSIS=$(echo "$CONTEXT" | claude -p \
"A deployment error occurred. Analyze the context and provide: \
1. What went wrong \
2. Is this a blocking error or recoverable? \
3. Suggested immediate action" \
--max-turns 2 \
--output-format json 2>/dev/null | jq -r '.result // "Analysis failed"') || ANALYSIS="Analysis failed"
echo ""
echo "=== DEPLOYMENT ALERT ==="
echo "Trigger: $line"
echo "Analysis: $ANALYSIS"
echo "========================"
fi
done
Autonomous execution requires strong safety boundaries. Claude Code provides multiple layers of protection.
Claude Code has five permission modes, each offering a different level of autonomy:
| Mode | Description | Use Case |
|---|---|---|
| default | Prompts for each tool use | Interactive development |
| acceptEdits | Auto-accepts file edits, prompts for bash | Semi-autonomous |
| plan | Read-only, no modifications | Analysis and planning |
| dontAsk | Auto-denies unless pre-approved | Constrained automation |
| bypassPermissions | Skips all prompts | Fully autonomous (containers only) |
Set the mode via CLI:
# Accept edits automatically
claude -p "Refactor this module" --permission-mode acceptEdits
# Only allow pre-approved tools
claude -p "Run analysis" --permission-mode dontAsk --allowedTools "Read,Grep,Glob"
# Bypass all permissions (containers only!)
claude -p "Build the project" --permission-mode bypassPermissions
The --dangerously-skip-permissions flag bypasses ALL permission checks. It is equivalent to --permission-mode bypassPermissions:
claude --dangerously-skip-permissions -p "Implement the feature"
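When to use it: only inside ephemeral containers or VMs, where nothing outside the workspace can be damaged.
When NOT to use it: on your local machine, where Claude would have unrestricted access to your filesystem and network.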
Instead of bypassing all permissions, scope exactly which tools Claude can use:
# Read-only analysis
claude -p "Analyze this codebase" \
--allowedTools "Read,Grep,Glob"
# Edit with specific bash commands only
claude -p "Fix the tests" \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *)"
# Full git workflow but no arbitrary bash
claude -p "Create a commit" \
--allowedTools "Read,Edit,Write,Bash(git *)"
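When these invocations are generated by a script rather than typed by hand, it helps to build the argument list in one place so that every run carries an allowlist and a turn limit. The following is a minimal sketch, assuming a hypothetical helper named `build_claude_command`; it uses only the flags shown in this guide and deliberately rejects a bare Bash entry.

```python
from __future__ import annotations

import shlex


def build_claude_command(
    prompt: str,
    allowed_tools: list[str],
    max_turns: int = 10,
    permission_mode: str | None = None,
) -> list[str]:
    """Build an argv list for a scoped, bounded headless run."""
    if not allowed_tools:
        raise ValueError("an explicit tool allowlist is required")
    if "Bash" in allowed_tools:
        # Bare "Bash" permits any command; require a scoped form instead.
        raise ValueError('scope Bash, e.g. "Bash(npm test *)"')
    cmd = [
        "claude", "-p", prompt,
        "--allowedTools", ",".join(allowed_tools),
        "--max-turns", str(max_turns),
    ]
    if permission_mode is not None:
        cmd += ["--permission-mode", permission_mode]
    return cmd


if __name__ == "__main__":
    cmd = build_claude_command(
        "Fix the tests",
        ["Read", "Edit", "Bash(npm test *)", "Bash(npx jest *)"],
        max_turns=15,
    )
    # Pass `cmd` to subprocess.run(cmd) to execute it; here it is only printed.
    print(shlex.join(cmd))
```

Passing the list straight to `subprocess.run` avoids a layer of shell quoting, which matters when prompts contain quotes or tool patterns contain spaces and asterisks.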
Claude Code's native sandboxing provides OS-level filesystem and network isolation:
# Inside Claude Code interactive mode
/sandbox
This opens a menu where you choose the sandbox mode. The sandbox covers three areas: filesystem isolation, network isolation, and OS-level enforcement.
The safest pattern for fully autonomous execution is running Claude inside a Docker container:
# Dockerfile.claude-worker
FROM node:24-slim
# Install Claude Code
RUN npm install -g @anthropic-ai/claude-code
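# Create workspace owned by the image's non-root "node" user.
# Claude Code refuses to bypass permissions when it runs as root.
WORKDIR /workspace
RUN chown node:node /workspace
USER node
# Copy project files
COPY --chown=node:node . .
# Install project dependencies
RUN npm install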
# Run Claude with full permissions (safe inside container)
CMD ["claude", "-p", "--permission-mode", "bypassPermissions", \
"--max-turns", "30", \
"Implement the features defined in prd.json"]
In principle, a container with no network access is the gold standard for safe autonomous execution:
#!/bin/bash
# safe-autonomous.sh -- Run Claude in a network-isolated container
set -euo pipefail
PROJECT_DIR="$(pwd)"
TASK="${1:-Implement the features in prd.json}"
docker run --rm \
--network none \
-v "${PROJECT_DIR}:/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p '${TASK}' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
In practice, --network none also blocks the Anthropic API calls (and the npm install) that the script depends on, so it fails as written. You need a more nuanced approach:
#!/bin/bash
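# safe-autonomous-v2.sh -- Container on a dedicated network (egress rules added separately)
set -euo pipefail
# Create a dedicated bridge network. A plain bridge does NOT restrict outbound
# traffic on its own; enforce the api.anthropic.com-only rule with iptables or a proxy.
docker network create --driver bridge claude-restricted 2>/dev/null || true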
docker run --rm \
--network claude-restricted \
-v "$(pwd):/workspace" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-w /workspace \
node:24-slim \
bash -c "
npm install -g @anthropic-ai/claude-code && \
claude -p 'Implement the features defined in prd.json' \
--permission-mode bypassPermissions \
--max-turns 30 \
--output-format json
"
For true network restriction with API access, use iptables rules or a proxy that only allows traffic to api.anthropic.com.
Configure allowed domains in your settings file:
{
"sandbox": {
"network": {
"allowedDomains": [
"api.anthropic.com",
"registry.npmjs.org",
"github.com"
]
}
}
}
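To make the allowlist idea concrete, here is a rough sketch of the check a filtering proxy might apply to each outbound request. It is illustrative only: `is_allowed` is a hypothetical function, it uses exact hostname matching, and Claude Code's own matching rules (for example, any wildcard support) may differ.

```python
from urllib.parse import urlsplit

# Same domains as the settings example above.
ALLOWED_DOMAINS = frozenset({"api.anthropic.com", "registry.npmjs.org", "github.com"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True only when the URL's hostname exactly matches an allowed domain."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host in allowed


if __name__ == "__main__":
    for url in (
        "https://api.anthropic.com/v1/messages",
        "https://api.anthropic.com.evil.example/v1/messages",
        "https://evil.example/?redirect=github.com",
    ):
        print(url, "->", "allow" if is_allowed(url) else "deny")
```

Matching on the parsed hostname rather than on a substring of the URL is the important design choice: the second and third URLs contain an allowed domain as text but resolve to a different host.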
Use permission deny rules to protect sensitive areas:
{
  "permissions": {
    "deny": [
      "Read(~/.ssh/**)",
      "Read(~/.aws/**)",
      "Read(./.env)",
      "Edit(~/.bashrc)",
      "Edit(~/.zshrc)",
      "Bash(rm -rf *)",
      "Bash(curl *)",
      "Bash(wget *)"
    ]
  }
}
{
"sandbox": {
"mode": "auto-allow",
"network": {
"httpProxyPort": 8080,
"socksProxyPort": 8081,
"allowedDomains": ["api.anthropic.com"]
},
"excludedCommands": ["docker", "watchman"],
"allowUnsandboxedCommands": false,
"allowUnixSockets": false
}
}
Setting allowUnsandboxedCommands to false disables the escape hatch entirely -- all commands must run sandboxed or be in excludedCommands.
RALF and GSD are two different patterns for autonomous Claude execution. Understanding when to use each is critical.
RALF works best when you have already defined exactly what needs to be done.
Characteristics:
- Uses --max-turns per iteration for tight control
Best for:
GSD (Get Stuff Done) is a pattern where Claude first plans the work, then executes it. It handles ambiguity better than RALF because it includes a scoping phase.
Characteristics:
Best for:
| Factor | Use RALF | Use GSD |
|---|---|---|
| Requirements clarity | Well-defined stories | Vague or high-level |
| Task scope | Small to medium | Medium to large |
| Repeatability | High (same PRD, different projects) | Low (one-off) |
| Design decisions | Already made | Need Claude to make them |
| Verification | Clear acceptance criteria | Subjective quality |
| Control | Maximum (per-iteration limits) | Moderate (plan-level) |
| Cost predictability | High (bounded iterations) | Lower (planning adds cost) |
The two patterns combine well. A common approach is GSD for planning, RALF for execution:
#!/bin/bash
# Phase 1: GSD -- Claude creates the PRD
claude -p "Analyze this codebase and create a prd.json file for adding \
user authentication. Include user stories with acceptance criteria. \
Follow the format in prd-template.json." \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 15
# Phase 2: RALF -- Execute the PRD
./ralf.sh prd.json 20
This gives you the best of both worlds: Claude's planning ability for scoping, and RALF's structured execution for implementation.
Chain multiple Claude invocations in a sequence where each stage feeds the next:
#!/bin/bash
# multi-stage-pipeline.sh
set -euo pipefail
# Stages exchange context through files in this directory
mkdir -p ./pipeline
echo "=== Stage 1: Research ==="
claude -p "Analyze the codebase and identify all API endpoints. \
Write a report to ./pipeline/api-inventory.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 2: Plan ==="
claude -p "Read ./pipeline/api-inventory.md. Design a comprehensive \
test plan for all endpoints. Write the plan to ./pipeline/test-plan.md" \
--allowedTools "Read,Write,Grep,Glob" \
--max-turns 10
echo "=== Stage 3: Implement ==="
claude -p "Read ./pipeline/test-plan.md. Implement all the tests \
described in the plan. Place tests in src/__tests__/api/" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Glob,Grep" \
--max-turns 25
echo "=== Stage 4: Verify ==="
claude -p "Run the full test suite (npm test). If any tests fail, \
fix them. Report final results." \
--allowedTools "Read,Edit,Bash(npm test *),Bash(npx jest *),Glob,Grep" \
--max-turns 15
echo "=== Stage 5: Report ==="
REPORT=$(claude -p "Read the test results and the code changes made. \
Generate a summary report of what was tested and the results." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result')
echo "$REPORT" > ./pipeline/final-report.md
echo "Pipeline complete. Report: ./pipeline/final-report.md"
A watchdog monitors Claude's execution and restarts on failure:
#!/bin/bash
# watchdog.sh -- Restart Claude on failure
set -euo pipefail
TASK="${1:-Implement the features in prd.json}"
MAX_RETRIES=3
RETRY_DELAY=30
for attempt in $(seq 1 $MAX_RETRIES); do
echo "Attempt $attempt / $MAX_RETRIES"
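# Capture the exit code with "||" so that set -e does not abort before the retry.
# stderr goes to its own file so the JSON output stays parseable.
EXIT_CODE=0
claude -p "$TASK" \
--allowedTools "Read,Write,Edit,Bash(npm *),Glob,Grep" \
--max-turns 20 \
--output-format json \
> "./watchdog-attempt-${attempt}.json" \
2> "./watchdog-attempt-${attempt}.err" || EXIT_CODE=$?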
IS_ERROR=$(jq -r '.is_error // false' "./watchdog-attempt-${attempt}.json" 2>/dev/null || echo "true")
if [ "$EXIT_CODE" -eq 0 ] && [ "$IS_ERROR" = "false" ]; then
echo "Success on attempt $attempt"
exit 0
fi
echo "Attempt $attempt failed (exit: $EXIT_CODE, error: $IS_ERROR)"
if [ $attempt -lt $MAX_RETRIES ]; then
echo "Retrying in ${RETRY_DELAY}s..."
sleep $RETRY_DELAY
fi
done
echo "All $MAX_RETRIES attempts failed"
exit 1
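The same retry logic can be expressed in Python when the watchdog needs to live inside a larger program. This is a sketch under stated assumptions: `run_with_retries` is a hypothetical helper, and `run_once` stands in for whatever launches `claude -p ... --output-format json` and returns its exit code and stdout. Only the `is_error` and `result` fields used elsewhere in this guide are read.

```python
from __future__ import annotations

import json
import time
from typing import Callable


def run_with_retries(
    run_once: Callable[[], tuple[int, str]],
    max_retries: int = 3,
    retry_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict | None:
    """Call run_once until it exits 0 with a JSON object whose is_error is false."""
    for attempt in range(1, max_retries + 1):
        exit_code, stdout = run_once()
        try:
            payload = json.loads(stdout)
        except json.JSONDecodeError:
            payload = None  # unparseable output counts as a failed attempt
        if exit_code == 0 and isinstance(payload, dict) and not payload.get("is_error", False):
            return payload
        if attempt < max_retries:
            sleep(retry_delay)
    return None  # every attempt failed


if __name__ == "__main__":
    # Fake runner standing in for a real headless call: fails once, then succeeds.
    outputs = iter([(1, ""), (0, '{"is_error": false, "result": "done"}')])
    result = run_with_retries(lambda: next(outputs), retry_delay=0)
    print(result["result"] if result else "all attempts failed")
```

Injecting `run_once` and `sleep` keeps the retry policy testable without spending API budget; a real runner would wrap `subprocess.run(..., capture_output=True, text=True)` and return `(proc.returncode, proc.stdout)`.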
#!/bin/bash
# parallel-review.sh -- Run multiple reviews in parallel, aggregate results
set -euo pipefail
RESULTS_DIR="./review-results"
mkdir -p "$RESULTS_DIR"
# Launch parallel reviews
claude -p "Review this codebase for security vulnerabilities. \
Focus on auth, input validation, and data exposure." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/security.json" &
PID_SECURITY=$!
claude -p "Review this codebase for performance issues. \
Focus on N+1 queries, memory leaks, and bundle size." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/performance.json" &
PID_PERF=$!
claude -p "Review this codebase for code quality issues. \
Focus on complexity, duplication, and naming." \
--allowedTools "Read,Grep,Glob" \
--max-turns 10 \
--output-format json \
> "$RESULTS_DIR/quality.json" &
PID_QUALITY=$!
# Wait for all to complete
wait $PID_SECURITY $PID_PERF $PID_QUALITY
# Aggregate results
SECURITY=$(jq -r '.result' "$RESULTS_DIR/security.json")
PERFORMANCE=$(jq -r '.result' "$RESULTS_DIR/performance.json")
QUALITY=$(jq -r '.result' "$RESULTS_DIR/quality.json")
# Feed aggregated results to a synthesizer
echo "Security Review:
$SECURITY
Performance Review:
$PERFORMANCE
Code Quality Review:
$QUALITY" | claude -p "Synthesize these three code reviews into a single \
prioritized report. Group findings by severity (Critical, High, Medium, Low). \
Deduplicate any overlapping findings." \
--max-turns 3 \
--output-format json | jq -r '.result' > "./review-results/final-report.md"
echo "Combined report: ./review-results/final-report.md"
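The fan-out and aggregate steps translate directly to a thread pool when the orchestration is written in Python. The sketch below assumes a hypothetical `review_fn` that takes a prompt and returns the review text (in practice it would invoke `claude -p` with read-only tools); here a fake stands in so the example runs on its own.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_reviews(reviews: dict[str, str], review_fn: Callable[[str], str]) -> dict[str, str]:
    """Run review_fn on every prompt concurrently; keep the input order in the result."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviews))) as pool:
        futures = {name: pool.submit(review_fn, prompt) for name, prompt in reviews.items()}
        # .result() re-raises any exception from a worker, so failures are not silent.
        return {name: future.result() for name, future in futures.items()}


def combine(results: dict[str, str]) -> str:
    """Join the per-review findings into one document for the synthesizer."""
    return "\n\n".join(f"{name} review:\n{text}" for name, text in results.items())


if __name__ == "__main__":
    reviews = {
        "Security": "Review this codebase for security vulnerabilities.",
        "Performance": "Review this codebase for performance issues.",
        "Code quality": "Review this codebase for code quality issues.",
    }

    def fake_review(prompt: str) -> str:
        return f"(findings for: {prompt})"

    print(combine(run_reviews(reviews, fake_review)))
```

Unlike a bare `wait` in bash, collecting each future's result surfaces a failed review immediately instead of passing an empty section to the synthesizer.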
# Add to crontab with: crontab -e
# Daily code quality check at 6 AM
0 6 * * * cd /path/to/project && /usr/local/bin/claude -p "Run a code quality analysis and write results to ./reports/quality-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
# Weekly dependency audit on Mondays at 8 AM
0 8 * * 1 cd /path/to/project && /usr/local/bin/claude -p "Audit dependencies for security vulnerabilities and outdated packages. Write report to ./reports/deps-$(date +\%Y\%m\%d).md" --allowedTools "Read,Write,Bash(npm audit *),Bash(npx *),Grep,Glob" --max-turns 10 >> /var/log/claude-cron.log 2>&1
For complex CI/CD tasks, you can use agent teams where multiple Claude instances collaborate:
#!/bin/bash
# ci-agent-team.sh -- Multiple Claude instances working on a PR
set -euo pipefail
PR_DIFF=$(gh pr diff "$1")
# Security review agent
echo "$PR_DIFF" | claude -p "You are a security reviewer. Analyze this PR diff for vulnerabilities." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/security-review.json &
# Performance review agent
echo "$PR_DIFF" | claude -p "You are a performance reviewer. Analyze this PR diff for performance issues." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/perf-review.json &
# Test coverage agent
echo "$PR_DIFF" | claude -p "You are a test coverage analyst. Check if new code has adequate tests." \
--allowedTools "Read,Grep,Glob" \
--max-turns 5 \
--output-format json > /tmp/test-review.json &
wait
# Aggregate and post comment
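# -r emits raw text (not a JSON-quoted string); -s slurps the three files into one array
COMBINED=$(jq -rs '[.[].result] | join("\n---\n")' \
/tmp/security-review.json /tmp/perf-review.json /tmp/test-review.json)
# printf expands the \n escapes that a plain double-quoted string would leave literal
gh pr comment "$1" --body "$(printf '## Automated Review\n\n%s' "$COMBINED")"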
The Agent SDK lets you use Claude Code as a library in your TypeScript or Python applications. It provides the same tools, agent loop, and context management as the CLI, but with programmatic control.
# TypeScript
npm install @anthropic-ai/claude-agent-sdk
# Python
pip install claude-agent-sdk
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.py",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
if ("result" in message) {
console.log(message.result);
}
}
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
import { query } from "@anthropic-ai/claude-agent-sdk";
let sessionId: string | undefined;
// First query: capture the session ID
for await (const message of query({
prompt: "Read the authentication module",
options: { allowedTools: ["Read", "Glob"] }
})) {
if (message.type === "system" && message.subtype === "init") {
sessionId = message.session_id;
}
}
// Resume with full context from the first query
for await (const message of query({
prompt: "Now find all places that call it",
options: { resume: sessionId }
})) {
if ("result" in message) console.log(message.result);
}
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Use the code-reviewer agent to review this codebase",
options: {
allowedTools: ["Read", "Glob", "Grep", "Task"],
agents: {
"code-reviewer": {
description: "Expert code reviewer for quality and security reviews.",
prompt: "Analyze code quality and suggest improvements.",
tools: ["Read", "Glob", "Grep"]
}
}
}
})) {
if ("result" in message) console.log(message.result);
}
import asyncio
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def log_file_change(input_data, tool_use_id, context):
    # Append one audit line per file that Edit or Write touched
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}

async def main():
    async for message in query(
        prompt="Refactor utils.py to improve readability",
        options=ClaudeAgentOptions(
            permission_mode="acceptEdits",
            hooks={
                "PostToolUse": [
                    HookMatcher(matcher="Edit|Write", hooks=[log_file_change])
                ]
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Open example.com and describe what you see",
options: {
mcpServers: {
playwright: { command: "npx", args: ["@playwright/mcp@latest"] }
}
}
})) {
if ("result" in message) console.log(message.result);
}
| Use Case | Best Choice |
|---|---|
| Shell scripts and CI/CD | CLI (claude -p) |
| Custom applications | SDK |
| One-off automation | CLI |
| Production services | SDK |
| Prototyping pipelines | CLI |
| Building agents that spawn agents | SDK |
Write a bash script that:
- Uses claude -p with --output-format json to analyze all .ts files for unused imports
Stretch goal: Use --json-schema to enforce a structured output format.
Write a batch processing script that:
- Converts require() statements to import syntax
- Runs the conversions in parallel with xargs -P
Stretch goal: Add a --dry-run flag that uses --permission-mode plan to preview changes without modifying files.
Create a Stop hook configuration that:
- Checks whether TODO comments remain in modified files
- Uses a stop_hook_active guard to prevent infinite loops
Test it by starting Claude with a task that intentionally leaves TODOs, and verify that the hook catches them.
- Create a prd.json with 3 user stories for a simple feature (e.g., a REST API for a todo list)
Create a GitHub Actions workflow that:
- Uses claude-code-action@v1 to review the PR
Build a 4-stage pipeline:
Each stage should pass context to the next via files in a ./pipeline/ directory. The final report should include cost information from the JSON output of each stage.
Create a log monitoring script that:
Boris Cherny, an engineer at Anthropic who works on Claude Code, has shared several practices for autonomous execution:
The dontAsk mode auto-denies any tool that is not explicitly pre-approved. This is safer than bypassPermissions because you define exactly what is allowed:
claude -p "Implement the feature" \
--permission-mode dontAsk \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx jest *),Grep,Glob"
If Claude tries to use a tool not in --allowedTools, it is automatically denied without prompting. This prevents unexpected behavior while still allowing Claude to work autonomously with the tools it needs.
Run a separate Claude instance that periodically checks the work of the primary instance:
# Main worker
claude -p "Implement the auth system" \
--allowedTools "Read,Write,Edit,Bash(npm *),Grep,Glob" \
--max-turns 30 &
WORKER_PID=$!
# Background verifier (runs every 2 minutes)
while kill -0 $WORKER_PID 2>/dev/null; do
sleep 120
claude -p "Check the current state of the codebase. \
Run npm test and npm run lint. \
Report any issues found." \
--allowedTools "Read,Bash(npm test *),Bash(npm run lint *),Grep,Glob" \
--max-turns 5 \
--output-format json | jq -r '.result' >> ./verification-log.txt
done
wait $WORKER_PID
echo "Worker finished. Verification log: ./verification-log.txt"
The single most important practice for autonomous execution: always give Claude the tools and commands to verify what it has done. If Claude can run tests, it will run tests. If it cannot, it will guess whether the code works.
# Bad: Claude cannot verify its work
claude -p "Add authentication" --allowedTools "Read,Write,Edit"
# Good: Claude can run tests to verify
claude -p "Add authentication" \
--allowedTools "Read,Write,Edit,Bash(npm test *),Bash(npx tsc --noEmit)"
For projects with browser-based UIs, give Claude access to a browser for visual verification:
claude --chrome -p "Implement the login page and verify it renders correctly"
The --chrome flag enables browser automation, allowing Claude to visually verify UI changes.
Problem: Running a RALF loop or headless command without any iteration limit.
Consequence: If Claude gets stuck on a failing test or an impossible task, it will loop indefinitely, burning through your API budget.
Fix: Always set --max-turns in headless mode and MAX_ITERATIONS in RALF loops.
# Bad
claude -p "Fix all the bugs"
# Good
claude -p "Fix all the bugs" --max-turns 15
Problem: Using --dangerously-skip-permissions on your local machine.
Consequence: Claude has unrestricted access to your entire filesystem and network. A prompt injection attack or a simple misunderstanding could delete files, exfiltrate data, or modify system configuration.
Fix: Only use --dangerously-skip-permissions inside ephemeral containers or VMs. On your local machine, use --allowedTools to scope permissions precisely.
Problem: Running a RALF loop that marks stories as complete based on Claude's claim, without running actual tests.
Consequence: Broken code accumulates. By the time you discover the issues, 10 stories have been "completed" with cascading failures.
Fix: Always run the verification command between iterations and only mark stories complete when verification passes.
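A verification gate can be as small as the sketch below: a story's status changes to complete only when the verification command exits with code 0. The story fields (`id`, `status`) and the helper names are hypothetical, chosen for illustration; substitute whatever structure your prd.json actually uses.

```python
from __future__ import annotations

import subprocess
import sys


def verify(command: list[str], timeout: int = 600) -> bool:
    """Run the verification command; only a zero exit code counts as a pass."""
    try:
        return subprocess.run(command, capture_output=True, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing verifier is treated as a failure, never as a pass.
        return False


def complete_story(story: dict, verify_command: list[str]) -> dict:
    """Return a copy of the story, marked complete only if verification passes."""
    updated = dict(story)
    updated["status"] = "complete" if verify(verify_command) else "needs_work"
    return updated


if __name__ == "__main__":
    story = {"id": "US-1", "status": "in_progress"}
    # Stand-ins for a real command such as ["npm", "test"].
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(complete_story(story, passing)["status"])
    print(complete_story(story, failing)["status"])
```

The point of the design is that Claude's own claim of success never enters the decision; only the exit code of an independent command does.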
Problem: Using --allowedTools "Bash" which allows any bash command.
Consequence: Claude can run rm -rf, curl to external servers, or modify system files. No guardrails.
Fix: Scope bash permissions with prefix matching:
# Bad
--allowedTools "Bash"
# Good
--allowedTools "Bash(npm test *),Bash(npm run *),Bash(git diff *),Read,Edit"
Problem: Running autonomous pipelines without cost controls.
Consequence: An overnight pipeline could consume hundreds of dollars if it enters a retry loop.
Fix: Use --max-budget-usd to cap spending:
claude -p "Run the analysis" --max-budget-usd 5.00 --max-turns 20
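For pipelines made of several invocations, a per-run cap is usefully paired with a running total across runs. The sketch below sums a cost field from saved JSON outputs. It assumes each output is a JSON object carrying a numeric `total_cost_usd` field; that field name is an assumption about the output shape, so check it against the JSON your version of Claude Code actually emits.

```python
from __future__ import annotations

import json


def total_cost(raw_outputs: list[str], field: str = "total_cost_usd") -> float:
    """Sum the cost field across JSON outputs, skipping anything unparseable."""
    total = 0.0
    for raw in raw_outputs:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a crashed run may leave partial or empty output
        value = payload.get(field) if isinstance(payload, dict) else None
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            total += value
    return total


def within_budget(raw_outputs: list[str], budget_usd: float) -> bool:
    """True while the combined spend of all runs stays at or under the budget."""
    return total_cost(raw_outputs) <= budget_usd


if __name__ == "__main__":
    runs = [
        '{"result": "stage 1 done", "total_cost_usd": 0.5}',
        "",  # a failed run with no output
        '{"result": "stage 2 done", "total_cost_usd": 1.25}',
    ]
    print(f"spent so far: ${total_cost(runs):.2f}")
    print("continue" if within_budget(runs, 5.00) else "stop: budget exceeded")
```

Checking the running total between stages lets an overnight pipeline stop itself before the next invocation rather than after the budget is already gone.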
Problem: Not checking the exit code from claude -p in scripts.
Consequence: A failed Claude run is treated as a success. Downstream steps execute on broken state.
Fix: Always check exit codes:
if ! claude -p "Run tests" --max-turns 10; then
echo "Claude execution failed"
exit 1
fi
Problem: Running autonomous pipelines without saving Claude's output.
Consequence: When something goes wrong, you have no way to diagnose what happened.
Fix: Always redirect output to log files:
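RUN_ID="$(date +%Y%m%d-%H%M%S)"
mkdir -p ./logs
# Keep stderr in its own file so the JSON log stays parseable
claude -p "Implement feature" \
--output-format json \
--verbose \
> "./logs/run-${RUN_ID}.json" 2> "./logs/run-${RUN_ID}.err"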
Problem: Putting an entire project specification into a single prompt and hoping Claude executes it all in one go.
Consequence: Claude loses track of requirements as the context window fills with code it has written. Quality degrades sharply.
Fix: Break the work into stages (multi-stage pipeline) or stories (RALF loop). Each invocation gets a focused task with fresh context.
The sandbox runtime is available as an open-source npm package:
npx @anthropic-ai/sandbox-runtime <command-to-sandbox>
Source: github.com/anthropic-experimental/sandbox-runtime
Last Updated: 2026-02-20 Compiled from official Anthropic documentation, Claude Agent SDK docs, and community best practices