CodeWall Documentation

Safety & Guardrails

Safety is a core design principle of CodeWall. Every test runs within strict guardrails that prevent unintended damage while still enabling thorough security assessment.

Key safety principles

Scope enforcement — the agent only interacts with explicitly authorized targets
Non-destructive by default — exploits demonstrate impact without causing lasting damage
Reversible actions — test artifacts are cleaned up after each engagement
Real-time monitoring — all agent actions are logged and can be reviewed live
Kill switch — any test can be stopped immediately from the dashboard
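
To make the scope enforcement principle concrete, here is a minimal sketch of a deny-by-default target check. It is not CodeWall's actual implementation or configuration format: the names `AUTHORIZED_HOSTS`, `AUTHORIZED_NETWORKS`, and `in_scope`, and the example hosts and network range, are hypothetical and chosen only for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; not CodeWall's real config format.
AUTHORIZED_HOSTS = {"staging.example.com"}
AUTHORIZED_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]


def in_scope(url: str) -> bool:
    """Return True only if the URL's host is explicitly authorized."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False  # Malformed URL: deny.
    if host is None:
        return False  # No host component (e.g. missing scheme): deny.
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: require an exact hostname match.
        return host.lower() in AUTHORIZED_HOSTS
    # IP literal: must fall inside an authorized network.
    return any(addr in net for net in AUTHORIZED_NETWORKS)
```

The check denies by default: anything that cannot be parsed, or that is not an exact match for an authorized host or network, is treated as out of scope. Exact hostname matching (rather than suffix matching) is a deliberate choice in this sketch so that a lookalike such as `staging.example.com.evil.test` is rejected.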

ACAP certification

CodeWall's agent is certified under the Adversarial Cyber Agent Proof (ACAP) standard, an independent certification framework for offensive AI security agents. ACAP evaluates agents across six dimensions: vulnerability discovery, time efficiency, false positive rate, PoC quality, attack chain discovery, and report quality.

Critically, ACAP enforces five mandatory safety gates. Failing any one of them results in automatic failure, regardless of offensive capability:

Scope adherence — no unauthorized system access
Prompt injection resistance — the agent cannot be manipulated by attacker-controlled content
Destructive action prevention — no data deletion, reverse shells, or mass exfiltration
Operational transparency — all findings backed by reproducible evidence
Resource discipline — strict token and time budget enforcement

This means CodeWall has been independently validated as both capable and responsible: passing the safety gates is a prerequisite for certification, not an afterthought.
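
The "gates before scores" logic described above can be sketched as follows. This is an illustration only, not ACAP's published scoring method: the gate names mirror the list above, but the `Evaluation` type, the `certify` function, the 0-100 score scale, the simple mean, and the `pass_mark` threshold are all assumptions.

```python
from dataclasses import dataclass

# Gate names follow the five gates listed in the text above.
SAFETY_GATES = (
    "scope_adherence",
    "prompt_injection_resistance",
    "destructive_action_prevention",
    "operational_transparency",
    "resource_discipline",
)


@dataclass
class Evaluation:
    dimension_scores: dict[str, float]  # assumed 0-100 per capability dimension
    gates_passed: dict[str, bool]       # one entry per safety gate


def certify(ev: Evaluation, pass_mark: float = 70.0) -> tuple[bool, str]:
    """Check safety gates first; capability scores only matter afterwards."""
    # A missing gate result counts as a failure (deny by default).
    failed = [g for g in SAFETY_GATES if not ev.gates_passed.get(g, False)]
    if failed:
        return False, "failed safety gates: " + ", ".join(failed)
    if not ev.dimension_scores:
        return False, "no capability scores recorded"
    mean = sum(ev.dimension_scores.values()) / len(ev.dimension_scores)
    if mean < pass_mark:
        return False, "capability score below pass mark"
    return True, "certified"
```

In this sketch, an agent with perfect capability scores still fails if a single gate is not passed, which is the property the text describes: safety is checked before, and independently of, offensive capability.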
