Guardrails
Built-in safety controls that govern how CodeWall's AI agents operate.
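During penetration testing, CodeWall's guardrail system keeps AI agents within the boundaries you define: it enforces scope, limits testing depth, restricts risky actions, and can pause a run for manual approval.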
Scope enforcement
The agent strictly respects the scope you define:
- Only interacts with in-scope hosts and paths
- Refuses to follow links or exploit chains that lead out of scope
- Validates every action against the scope before executing
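The "validate before executing" rule above can be pictured as a guard that runs in front of every action. The following is a minimal sketch, not CodeWall's implementation: the scope format (hostnames mapped to allowed path prefixes) and the names `IN_SCOPE`, `is_in_scope`, and `execute_if_in_scope` are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical scope definition: hostnames mapped to allowed path prefixes.
IN_SCOPE = {
    "app.example.com": ["/"],
    "api.example.com": ["/v1/"],
}


def is_in_scope(url: str, scope: dict = IN_SCOPE) -> bool:
    """Return True only if both the host and the path fall inside the scope."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Collapse "." and ".." segments so "/v1/../admin" is judged as "/admin".
    path = posixpath.normpath(parts.path or "/")
    prefixes = scope.get(host)
    if prefixes is None:
        return False  # unknown host: out of scope
    return any(path.startswith(prefix) for prefix in prefixes)


def execute_if_in_scope(url: str, action) -> bool:
    """Check the target first; run the action only when it is in scope."""
    if not is_in_scope(url):
        return False
    action(url)
    return True
```

Checking at the point of execution, rather than only when targets are discovered, means a redirect or an exploit chain that leads to a new host is refused at the moment the agent tries to act on it.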
Depth controls
Control how aggressively the agent tests:
- Standard — balanced approach, avoids actions that could disrupt service availability
- Thorough — deeper testing including more complex exploit chains, still with safety checks
- Light — quick surface-level assessment with minimal interaction
Action restrictions
Certain actions are restricted by default:
- No denial-of-service — the agent avoids actions that could cause outages
- No data destruction — write/delete operations are limited to proof-of-concept scope
- No lateral movement beyond scope — even if credentials are discovered, the agent stays within defined boundaries
- Rate limiting — requests are paced to avoid overwhelming target systems
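As an illustration of the request pacing mentioned in the last item, one simple approach is to enforce a minimum interval between consecutive requests. This is a sketch under assumptions: the class name `RequestPacer`, its parameters, and the example rate are hypothetical, and the actual pacing strategy CodeWall uses is not documented here.

```python
import time


class RequestPacer:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock   # injectable so the pacer can be tested without waiting
        self._sleep = sleep
        self._last = None     # time at which the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A caller would invoke `pacer.wait()` immediately before sending each request, for example with `RequestPacer(max_per_second=2)` to cap traffic at two requests per second.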
Phase-level approval gates
When enabled, the run pauses before each configured phase and waits for manual approval before proceeding, so you decide phase by phase whether the run continues.
Configuration
| Setting | Description | Default |
|---|---|---|
| Enabled | Whether approval gates are active | Off |
| Phases | Which phases require approval | All (recon, analysis, validate, exploit, report) |
| Timeout | Hours to wait before the gate expires | 24 hours |
You can require approval for all phases or only specific ones. For example, you might allow recon and analysis to run automatically but require approval before exploitation begins.
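The settings in the table can be modelled roughly as follows. The defaults come from the table above; the class name `ApprovalGateConfig` and its field and method names are hypothetical and are not CodeWall's configuration schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PHASES = ("recon", "analysis", "validate", "exploit", "report")


@dataclass(frozen=True)
class ApprovalGateConfig:
    enabled: bool = False      # documented default: Off
    phases: tuple = PHASES     # documented default: all phases
    timeout_hours: int = 24    # documented default: 24 hours

    def requires_approval(self, phase: str) -> bool:
        """True if the run should pause before this phase."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        return self.enabled and phase in self.phases

    def expires_at(self, gate_opened: datetime) -> datetime:
        """Moment at which an undecided gate expires."""
        return gate_opened + timedelta(hours=self.timeout_hours)


# Recon and analysis run automatically; exploitation waits for approval.
exploit_only = ApprovalGateConfig(enabled=True, phases=("exploit",))
```

With `exploit_only`, `requires_approval("recon")` is false and `requires_approval("exploit")` is true, which matches the example in the paragraph above.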
Approving or rejecting a phase
When a gate is reached, the run pauses and you are notified (via email and any configured webhooks with the approval.required event). You can then:
- Approve — the run continues into the next phase
- Reject — the run either cancels entirely or skips ahead to generate a report with findings discovered so far
Rejection actions:
| Action | Behaviour |
|---|---|
| Cancel | Stops the run immediately |
| Skip to report | Generates a report from findings discovered in earlier phases |
If no decision is made within the timeout window, the gate expires and the run is cancelled.
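The possible outcomes of a gate can be summarised as a small decision function. This is a sketch of the behaviour described above, not CodeWall's code: the names `Outcome` and `resolve_gate` are hypothetical, and the `"cancel"` value for the rejection action is an assumption (only `"skip_to_report"` appears in the API examples on this page).

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    WAITING = "waiting"            # gate still open, no decision yet
    CONTINUE = "continue"          # approved: run proceeds
    CANCELLED = "cancelled"        # rejected with cancel, or gate expired
    REPORT = "skip_to_report"      # rejected: report on findings so far


def resolve_gate(decision: Optional[str],
                 rejection_action: Optional[str] = None,
                 hours_waited: float = 0.0,
                 timeout_hours: float = 24) -> Outcome:
    """Map a reviewer decision (or the lack of one) to what the run does next."""
    if decision is None:
        # No decision: the gate expires once the timeout window has passed.
        return Outcome.CANCELLED if hours_waited >= timeout_hours else Outcome.WAITING
    if decision == "approve":
        return Outcome.CONTINUE
    if decision == "reject":
        if rejection_action == "skip_to_report":
            return Outcome.REPORT
        if rejection_action == "cancel":  # assumed value, see lead-in
            return Outcome.CANCELLED
        raise ValueError(f"unknown rejection action: {rejection_action!r}")
    raise ValueError(f"unknown decision: {decision!r}")
```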
API
Approve or reject a pending phase gate via the API:
curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"decision": "approve"
}'
To reject and skip to the report:
curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"decision": "reject",
"rejection_action": "skip_to_report"
}'
Command-level approval
For even more granular control, enable command-level approval. When active, the agent pauses before executing potentially risky commands during the validate and exploit phases and asks for your approval.
How it works
Commands are classified into two tiers:
| Tier | Behaviour | Example |
|---|---|---|
| Always blocked | Blocked automatically, no approval possible | Fork bombs, disk wiping, and other destructive operations |
| Approval required | Pauses the run and waits for approval | sqlmap exploitation flags, brute-force tools, aggressive nmap scripts |
When command approval is off (the default), only the always-blocked tier applies. Turning command approval on activates the approval-required tier as well.
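The two-tier logic can be sketched as a classifier that checks the always-blocked tier first and consults the approval-required tier only when command approval is turned on. The regular expressions below are illustrative stand-ins chosen to match the examples in the table; CodeWall's real pattern lists and matching method are not documented here, and the function name `classify` is hypothetical.

```python
import re

# Illustrative patterns only, not CodeWall's actual lists.
ALWAYS_BLOCKED = [
    re.compile(r":\(\)\s*\{.*\};\s*:"),       # classic shell fork bomb
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),      # recursive delete of the root directory
    re.compile(r"\bdd\b.*\bof=/dev/sd"),      # raw write to a disk device
]
APPROVAL_REQUIRED = [
    re.compile(r"\bsqlmap\b.*--os-shell"),                      # sqlmap exploitation flag
    re.compile(r"\bhydra\b"),                                   # brute-force tool
    re.compile(r"\bnmap\b.*--script[= ]\S*(brute|exploit)"),    # aggressive nmap scripts
]


def classify(command: str, approval_enabled: bool = False) -> str:
    """Return 'blocked', 'needs_approval', or 'allowed' for a command."""
    if any(p.search(command) for p in ALWAYS_BLOCKED):
        return "blocked"  # applies whether or not command approval is on
    if approval_enabled and any(p.search(command) for p in APPROVAL_REQUIRED):
        return "needs_approval"
    return "allowed"
```

Checking the blocked tier first means that enabling command approval can never make a destructive command approvable. The sketch omits the restriction to the validate and exploit phases for brevity.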
Custom patterns
You can extend either tier with custom patterns:
- Extra blocked patterns — additional command patterns that should always be blocked
- Extra approval patterns — additional command patterns that require approval before execution
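One way to picture custom patterns is as additions merged into the built-in lists for each tier. The sketch below assumes patterns are regular expressions, which this page does not confirm; the built-in entries and the names `build_tiers` and `tier_for` are hypothetical.

```python
import re


def build_tiers(extra_blocked=(), extra_approval=()):
    """Merge custom pattern strings into (illustrative) built-in tiers."""
    builtin_blocked = [r"\bmkfs(\.\w+)?\b"]   # illustrative built-in: formatting a disk
    builtin_approval = [r"\bhydra\b"]         # illustrative built-in: brute-force tool
    return {
        "blocked": [re.compile(p) for p in [*builtin_blocked, *extra_blocked]],
        "approval": [re.compile(p) for p in [*builtin_approval, *extra_approval]],
    }


def tier_for(command, tiers):
    """Return the tier a command falls into, or None if no pattern matches."""
    if any(p.search(command) for p in tiers["blocked"]):
        return "blocked"
    if any(p.search(command) for p in tiers["approval"]):
        return "approval"
    return None


tiers = build_tiers(
    extra_blocked=[r"\bshutdown\b"],   # never allow host shutdown
    extra_approval=[r"\bnikto\b"],     # ask before running this scanner
)
```

Custom entries extend the built-in tiers rather than replacing them, so in this sketch a command matched by a built-in blocked pattern stays blocked regardless of what is added.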
API
Approve or reject a pending command:
curl -X POST https://api.codewall.ai/v1/runs/:run_id/approve-command \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"decision": "approve"
}'
Command review
For all operations — whether approval gates are enabled or not — the agent logs its reasoning and the exact commands it intends to execute. You can review the full activity log in real time during any test.
ACAP certification
CodeWall's safety controls are independently validated through the Adversarial Cyber Agent Proof (ACAP) certification. ACAP is a standardised evaluation framework for offensive AI agents that tests both capability and safety. To achieve certification, an agent must pass five mandatory safety gates — scope adherence, prompt injection resistance, destructive action prevention, operational transparency, and resource discipline. Failing any single gate results in automatic disqualification, regardless of how many vulnerabilities the agent finds.
This ensures that the guardrails described on this page aren't just documented — they're independently verified against a standardised benchmark.

