Guardrails

Built-in safety controls that govern how CodeWall's AI agents operate.

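CodeWall's guardrail system keeps AI agents within safe boundaries during penetration testing. It combines scope enforcement, depth controls, default action restrictions, optional approval gates, and a full activity log.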

Scope enforcement

The agent strictly respects the scope you define:

  • Only interacts with in-scope hosts and paths
  • Refuses to follow links or exploit chains that lead out of scope
  • Validates every action against the scope before executing
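
To make the "validate before executing" idea concrete, here is a minimal sketch of a scope check. It is an illustration only, not CodeWall's implementation: the `SCOPE` structure, the example hosts, and the `in_scope` function are all hypothetical, and the sketch assumes scope is expressed as hosts with allowed path prefixes.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical scope: each in-scope host maps to its allowed path prefixes.
SCOPE = {
    "app.example.com": ["/"],
    "api.example.com": ["/v1/"],
}


def in_scope(url, scope=SCOPE):
    """Return True only if both the host and the path of `url` are in scope."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    prefixes = scope.get(host)
    if prefixes is None:
        return False  # unknown host: out of scope
    # Collapse "." and ".." segments so "/v1/../admin" cannot pass as "/v1/".
    path = posixpath.normpath(parts.path or "/")
    for prefix in prefixes:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return True
    return False
```

In this sketch, a link discovered during testing would be passed through `in_scope` before any request is made, and anything that returns `False` would be dropped rather than followed.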

Depth controls

Control how aggressively the agent tests:

  • Standard — balanced approach, avoids actions that could disrupt service availability
  • Thorough — deeper testing including more complex exploit chains, still with safety checks
  • Light — quick surface-level assessment with minimal interaction

Action restrictions

Certain actions are restricted by default:

  • No denial-of-service — the agent avoids actions that could cause outages
  • No data destruction — write/delete operations are limited to proof-of-concept scope
  • No lateral movement beyond scope — even if credentials are discovered, the agent stays within defined boundaries
  • Rate limiting — requests are paced to avoid overwhelming target systems
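
As an illustration of request pacing, the sketch below enforces a minimum interval between requests. It is a generic technique shown under assumptions, not a description of how CodeWall paces traffic: the `RequestPacer` class and its `max_per_second` parameter are hypothetical.

```python
import time


class RequestPacer:
    """Enforce a minimum interval between requests (hypothetical helper)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock          # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `wait()` before each request keeps the request rate at or below `max_per_second`, regardless of how quickly the caller produces work.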

Phase-level approval gates

When enabled, the run pauses before each configured phase and waits for manual approval before proceeding. This lets you decide whether and when the agent moves into each phase.

Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| Enabled | Whether approval gates are active | Off |
| Phases | Which phases require approval | All (recon, analysis, validate, exploit, report) |
| Timeout | Hours to wait before the gate expires | 24 hours |

You can require approval for all phases or only specific ones. For example, you might allow recon and analysis to run automatically but require approval before exploitation begins.

Approving or rejecting a phase

When a gate is reached, the run pauses and you are notified (via email and any configured webhooks with the approval.required event). You can then:

  • Approve — the run continues into the next phase
  • Reject — the run either cancels entirely or skips ahead to generate a report with findings discovered so far

Rejection actions:

| Action | Behaviour |
|--------|-----------|
| Cancel | Stops the run immediately |
| Skip to report | Generates a report from findings discovered in earlier phases |

If no decision is made within the timeout window, the gate expires and the run is cancelled.
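
The outcomes described above can be summarised as a small decision function. This is a sketch of the documented behaviour, not CodeWall's code: `resolve_gate` and its return values are hypothetical names, and it assumes that a rejection without `skip_to_report` is treated as a cancel.

```python
from datetime import datetime, timedelta, timezone


def resolve_gate(opened_at, now, decision=None, rejection_action=None,
                 timeout_hours=24):
    """Return the run's next state: 'waiting', 'continue', 'cancelled' or 'report'."""
    if decision == "approve":
        return "continue"
    if decision == "reject":
        # Assumption: any rejection other than skip_to_report cancels the run.
        return "report" if rejection_action == "skip_to_report" else "cancelled"
    # No decision yet: the gate expires once the timeout window has passed.
    if now - opened_at >= timedelta(hours=timeout_hours):
        return "cancelled"
    return "waiting"
```

For example, a gate opened at midnight UTC with the default 24-hour timeout stays in `waiting` until a decision arrives, and resolves to `cancelled` if none has arrived by midnight the following day.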

API

Approve or reject a pending phase gate via the API:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "approve"
  }'

To reject and skip to the report:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "reject",
    "rejection_action": "skip_to_report"
  }'
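
If you script approvals, the same requests can be built with Python's standard library. The sketch below mirrors the curl examples above. The helper name `build_gate_request` is hypothetical, and the only field values used are the ones shown on this page (`approve`, `reject`, `skip_to_report`). The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

API_BASE = "https://api.codewall.ai/v1"


def build_gate_request(run_id, api_key, decision, rejection_action=None):
    """Build (but do not send) a POST request for a pending phase gate."""
    if decision not in ("approve", "reject"):
        raise ValueError("decision must be 'approve' or 'reject'")
    body = {"decision": decision}
    if rejection_action is not None:
        if decision != "reject":
            raise ValueError("rejection_action only applies to a rejection")
        body["rejection_action"] = rejection_action
    return urllib.request.Request(
        f"{API_BASE}/runs/{run_id}/approve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_gate_request(run_id, api_key, "approve"))`.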

Command-level approval

For even more granular control, enable command-level approval. When active, the agent pauses before executing potentially risky commands during the validate and exploit phases and asks for your approval.

How it works

Commands are classified into two tiers:

| Tier | Behaviour | Example |
|------|-----------|---------|
| Always blocked | Blocked automatically, no approval possible | Fork bombs, disk wiping, destructive operations |
| Approval required | Pauses the run and waits for approval | sqlmap exploitation flags, brute-force tools, aggressive nmap scripts |

When command approval is off (the default), only the always-blocked tier applies. Turning command approval on activates the approval-required tier as well.

Custom patterns

You can extend either tier with custom patterns:

  • Extra blocked patterns — additional command patterns that should always be blocked
  • Extra approval patterns — additional command patterns that require approval before execution
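
The two tiers and their custom extensions can be pictured as a classifier like the one below. This is a sketch under stated assumptions: it treats patterns as regular expressions, which this page does not specify, and the built-in patterns, the `classify` function, and its return values are illustrative rather than CodeWall's actual rule set.

```python
import re

# Illustrative patterns only; not CodeWall's actual rules.
ALWAYS_BLOCKED = [
    r":\(\)\s*\{.*\}\s*;\s*:",        # classic shell fork bomb
    r"\brm\s+-rf\s+/(\s|$)",          # wiping the filesystem root
    r"\bmkfs\b",                      # reformatting a disk
]
APPROVAL_REQUIRED = [
    r"\bsqlmap\b.*--os-shell",                       # sqlmap exploitation flag
    r"\bhydra\b",                                    # brute-force tool
    r"\bnmap\b.*--script[= ]\S*(brute|exploit)",     # aggressive nmap scripts
]


def classify(command, approval_enabled=False,
             extra_blocked=(), extra_approval=()):
    """Return 'blocked', 'needs_approval' or 'allowed' for a command string."""
    # The always-blocked tier applies whether or not command approval is on.
    if any(re.search(p, command) for p in [*ALWAYS_BLOCKED, *extra_blocked]):
        return "blocked"
    # The approval-required tier only applies when command approval is on.
    if approval_enabled and any(
        re.search(p, command) for p in [*APPROVAL_REQUIRED, *extra_approval]
    ):
        return "needs_approval"
    return "allowed"
```

In this sketch, blocked patterns are checked first, so a command that matches both tiers is always blocked and never offered for approval.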

API

Approve or reject a pending command:

curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve-command \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision": "approve"
  }'

Command review

For all operations — whether approval gates are enabled or not — the agent logs its reasoning and the exact commands it intends to execute. You can review the full activity log in real time during any test.

ACAP certification

CodeWall's safety controls are independently validated through the Adversarial Cyber Agent Proof (ACAP) certification. ACAP is a standardised evaluation framework for offensive AI agents that tests both capability and safety. To achieve certification, an agent must pass five mandatory safety gates — scope adherence, prompt injection resistance, destructive action prevention, operational transparency, and resource discipline. Failing any single gate results in automatic disqualification, regardless of how many vulnerabilities the agent finds.

This ensures that the guardrails described on this page aren't just documented — they're independently verified against a standardised benchmark.