CodeWall Documentation

CodeWall's guardrail system ensures that AI agents operate within safe boundaries during penetration testing.

Scope enforcement

The agent strictly respects the scope you define:

Only interacts with in-scope hosts and paths
Refuses to follow links or exploit chains that lead out of scope
Validates every action against the scope before executing
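
A per-action check of this kind can be pictured as a function that compares each target URL against an allowlist before anything is sent. The sketch below is illustrative only: the `Scope` class, its fields, and the matching rules are assumptions made for the example, not CodeWall's actual implementation or scope format.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope definition: allowed hosts and path prefixes."""
    hosts: frozenset
    path_prefixes: tuple = ("/",)


def in_scope(url: str, scope: Scope) -> bool:
    """Return True only if both the host and the path fall inside the scope."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in scope.hosts:
        return False
    path = parts.path or "/"
    return any(path.startswith(prefix) for prefix in scope.path_prefixes)


# Example scope: one host, restricted to its /api/ subtree (made-up values).
scope = Scope(hosts=frozenset({"app.example.com"}), path_prefixes=("/api/",))

print(in_scope("https://app.example.com/api/users", scope))  # host and path match
print(in_scope("https://cdn.example.net/api/users", scope))  # different host
print(in_scope("https://app.example.com/admin", scope))      # path outside scope
```

A production check would also need to normalise paths (for example `..` segments) and handle redirects, which this sketch leaves out.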

Depth controls

Control how aggressively the agent tests:

Standard — balanced approach, avoids actions that could disrupt service availability
Thorough — deeper testing including more complex exploit chains, still with safety checks
Light — quick surface-level assessment with minimal interaction

Action restrictions

Certain actions are restricted by default:

No denial-of-service — the agent avoids actions that could cause outages
No data destruction — write/delete operations are limited to proof-of-concept scope
No lateral movement beyond scope — even if credentials are discovered, the agent stays within defined boundaries
Rate limiting — requests are paced to avoid overwhelming target systems
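
Request pacing can be sketched as a fixed minimum interval between requests. The `Pacer` class below is a hypothetical illustration of the idea; the page does not say how CodeWall paces requests or what limits it applies.

```python
import time


class Pacer:
    """Hypothetical pacer: allows at most one request per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the pacer can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this request goes out.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling `pacer.wait()` before each request keeps the request rate at or below `1 / min_interval` per second.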

Phase-level approval gates

When enabled, the run pauses before each configured phase and does not proceed until someone approves it manually. This lets you decide, phase by phase, whether the agent continues.

Configuration

Setting	Description	Default
Enabled	Whether approval gates are active	Off
Phases	Which phases require approval	All (recon, analysis, validate, exploit, report)
Timeout	Hours to wait before the gate expires	24 hours

You can require approval for all phases or only specific ones. For example, you might allow recon and analysis to run automatically but require approval before exploitation begins.
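
To make the defaults in the table concrete, here is a minimal sketch of how the three settings might combine. The dictionary keys and the `requires_approval` helper are hypothetical names chosen for illustration; only the phase names and default values come from the table above.

```python
ALL_PHASES = ("recon", "analysis", "validate", "exploit", "report")

# Hypothetical representation of the settings; the key names are illustrative.
gate_config = {
    "enabled": True,
    "phases": ["exploit"],   # recon and analysis run automatically
    "timeout_hours": 24,
}


def requires_approval(phase: str, config: dict) -> bool:
    """Return True if the run should pause for approval before `phase`."""
    if phase not in ALL_PHASES:
        raise ValueError(f"unknown phase: {phase}")
    if not config.get("enabled", False):              # gates default to Off
        return False
    return phase in config.get("phases", ALL_PHASES)  # default: all phases


for phase in ALL_PHASES:
    print(phase, requires_approval(phase, gate_config))
```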

Approving or rejecting a phase

When a gate is reached, the run pauses and you are notified by email and through any configured webhooks, which receive the approval.required event. You can then:

Approve — the run continues into the next phase
Reject — the run either cancels entirely or skips ahead to generate a report with findings discovered so far

Rejection actions:

Action	Behaviour
Cancel	Stops the run immediately
Skip to report	Generates a report from findings discovered in earlier phases

If no decision is made within the timeout window, the gate expires and the run is cancelled.
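
The approve, reject, and timeout rules above can be summarised as a small decision function. This is a sketch of the described behaviour, not CodeWall code: the `Outcome` enum and `resolve_gate` are hypothetical names, and treating a rejection without an explicit action as a cancel is an assumption.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONTINUE = "continue"
    CANCEL = "cancel"
    SKIP_TO_REPORT = "skip_to_report"


def resolve_gate(decision: Optional[str], rejection_action: Optional[str],
                 hours_waited: float, timeout_hours: float = 24) -> Optional[Outcome]:
    """Map a gate's state to an outcome; None means the gate is still pending."""
    if decision is None:
        # No decision yet: the gate expires once the timeout window has passed.
        return Outcome.CANCEL if hours_waited >= timeout_hours else None
    if decision == "approve":
        return Outcome.CONTINUE
    if decision == "reject":
        if rejection_action == "skip_to_report":
            return Outcome.SKIP_TO_REPORT
        return Outcome.CANCEL  # assumption: reject without an action cancels
    raise ValueError(f"unknown decision: {decision!r}")
```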

API

Approve or reject a pending phase gate via the API:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "approve"
  }'

To reject and skip to the report:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "reject",
    "rejection_action": "skip_to_report"
  }'
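
The same calls can be made from a script. The sketch below builds the request shown in the curl examples using only Python's standard library. The helper name `build_gate_request` is hypothetical, and the response format is not documented on this page, so the example stops short of parsing a reply.

```python
import json
import urllib.request
from typing import Optional


def build_gate_request(run_id: str, api_key: str, decision: str,
                       rejection_action: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for a phase gate decision."""
    body = {"decision": decision}
    if rejection_action is not None:
        body["rejection_action"] = rejection_action
    return urllib.request.Request(
        url=f"https://api.codewall.ai/v1/runs/{run_id}/approve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_gate_request("RUN_ID", "YOUR_API_KEY", "reject", "skip_to_report")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it for real: urllib.request.urlopen(req)
```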

Command-level approval

For even more granular control, enable command-level approval. When active, the agent pauses before executing potentially risky commands during the validate and exploit phases and asks for your approval.

How it works

Commands are classified into two tiers:

Tier	Behaviour	Example
Always blocked	Blocked automatically, no approval possible	Fork bombs, disk wiping, destructive operations
Approval required	Pauses the run and waits for approval	sqlmap exploitation flags, brute-force tools, aggressive nmap scripts

When command approval is off (the default), only the always-blocked tier applies. Turning command approval on activates the approval-required tier as well.
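
The interaction between the two tiers and the on/off switch can be sketched as follows. The regular expressions are made-up examples loosely based on the table above. The real rule sets are not documented here, and `classify` is a hypothetical helper.

```python
import re

# Illustrative patterns only; they are assumptions, not CodeWall's rules.
ALWAYS_BLOCKED = [
    r":\(\)\s*\{.*\};\s*:",        # classic shell fork bomb
    r"\brm\s+-rf\s+/(\s|$)",       # wiping the filesystem root
    r"\bmkfs\b",                   # reformatting a disk
]
APPROVAL_REQUIRED = [
    r"\bsqlmap\b.*--os-shell",             # sqlmap exploitation flag
    r"\bhydra\b",                          # brute-force tool
    r"\bnmap\b.*--script[= ]\S*brute",     # aggressive nmap scripts
]


def classify(command: str, command_approval: bool = False) -> str:
    """Return 'blocked', 'needs_approval', or 'allowed' for a command."""
    if any(re.search(p, command) for p in ALWAYS_BLOCKED):
        return "blocked"            # applies whether or not approval is on
    if command_approval and any(re.search(p, command) for p in APPROVAL_REQUIRED):
        return "needs_approval"     # tier is only active when approval is on
    return "allowed"
```

With command approval off, a command such as `hydra -l admin -P words.txt ssh://10.0.0.5` is classified as allowed in this sketch. With it on, the same command waits for approval. The fork bomb is blocked in both cases.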

Custom patterns

You can extend either tier with custom patterns:

Extra blocked patterns — additional command patterns that should always be blocked
Extra approval patterns — additional command patterns that require approval before execution
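
Extending a tier amounts to appending your own patterns to the built-in list. The sketch below assumes patterns are regular expressions, which this page does not specify. `build_rules` and `matches_any` are hypothetical helpers.

```python
import re


def build_rules(default_patterns, extra_patterns=()):
    """Compile defaults plus user-supplied extras, rejecting invalid regexes early."""
    compiled = []
    for pattern in (*default_patterns, *extra_patterns):
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc
    return compiled


def matches_any(command, rules):
    """Return True if any compiled rule matches somewhere in the command."""
    return any(rule.search(command) for rule in rules)


# Made-up example: one built-in blocked pattern plus two custom ones.
blocked = build_rules([r"\bmkfs\b"], extra_patterns=[r"\bshutdown\b", r"\breboot\b"])
print(matches_any("sudo reboot", blocked))
print(matches_any("ls -la", blocked))
```

Validating patterns when they are configured, rather than when a command is checked, means a typo in a custom pattern surfaces before the run starts.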

API

Approve or reject a pending command:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve-command \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "approve"
  }'

Command review

For all operations — whether approval gates are enabled or not — the agent logs its reasoning and the exact commands it intends to execute. You can review the full activity log in real time during any test.

ACAP certification

CodeWall's safety controls are independently validated through the Adversarial Cyber Agent Proof (ACAP) certification. ACAP is a standardised evaluation framework for offensive AI agents that tests both capability and safety. To achieve certification, an agent must pass five mandatory safety gates — scope adherence, prompt injection resistance, destructive action prevention, operational transparency, and resource discipline. Failing any single gate results in automatic disqualification, regardless of how many vulnerabilities the agent finds.

This ensures that the guardrails described on this page aren't just documented — they're independently verified against a standardised benchmark.
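
The all-or-nothing rule for the five safety gates can be expressed in a few lines. This is only an illustration of the rule as described above. The identifiers are hypothetical, and counting a missing result as a failure is an assumption.

```python
SAFETY_GATES = (
    "scope_adherence",
    "prompt_injection_resistance",
    "destructive_action_prevention",
    "operational_transparency",
    "resource_discipline",
)


def is_certifiable(gate_results: dict) -> bool:
    """Every mandatory gate must pass; a missing result counts as a failure (assumption)."""
    return all(gate_results.get(gate) is True for gate in SAFETY_GATES)


results = {gate: True for gate in SAFETY_GATES}
print(is_certifiable(results))

results["resource_discipline"] = False   # a single failed gate disqualifies
print(is_certifiable(results))
```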
