Safety & Guardrails

How CodeWall keeps your systems safe during autonomous penetration testing.

Safety is a core design principle of CodeWall. Every test runs within strict guardrails that prevent unintended damage while still enabling thorough security assessment.

Key safety principles

  • Scope enforcement — the agent only interacts with explicitly authorized targets
  • Non-destructive by default — exploits demonstrate impact without causing lasting damage
  • Reversible actions — test artifacts are cleaned up after each engagement
  • Real-time monitoring — all agent actions are logged and can be reviewed live
  • Kill switch — any test can be stopped immediately from the dashboard
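As an illustration of the scope enforcement principle, the following is a minimal sketch of an allowlist check that runs before any outbound action. It is not CodeWall's implementation: the `Scope` class, `ScopeError`, `guarded_probe`, and the example hosts and networks are all hypothetical names chosen for this example.

```python
import ipaddress
from urllib.parse import urlsplit


class ScopeError(Exception):
    """Raised when a target falls outside the authorized scope."""


class Scope:
    """Hypothetical allowlist of authorized hostnames and IP networks."""

    def __init__(self, hosts, networks):
        self.hosts = {h.lower() for h in hosts}
        self.networks = [ipaddress.ip_network(n) for n in networks]

    def allows(self, target: str) -> bool:
        # Accept either a bare host ("10.0.5.7") or a full URL.
        url = target if "://" in target else f"//{target}"
        host = urlsplit(url).hostname
        if host is None:
            return False
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal: require an exact hostname match.
            return host in self.hosts
        return any(addr in net for net in self.networks)


def guarded_probe(scope: Scope, target: str) -> str:
    """Refuse to act on anything that is not explicitly authorized."""
    if not scope.allows(target):
        raise ScopeError(f"{target} is outside the authorized scope")
    # Placeholder for the real action; this sketch sends no traffic.
    return f"probing {target}"


scope = Scope(hosts=["staging.example.com"], networks=["10.0.5.0/24"])
print(guarded_probe(scope, "https://staging.example.com/login"))
```

The check is default-deny: a target that is not listed is rejected, so a typo in the scope definition makes the test narrower instead of wider.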

ACAP certification

CodeWall's agent is certified under the Adversarial Cyber Agent Proof (ACAP) standard — an independent certification framework for offensive AI security agents. ACAP evaluates agents across six dimensions: vulnerability discovery, time efficiency, false positive rate, PoC quality, attack chain discovery, and report quality.

Critically, ACAP also enforces five mandatory safety gates. Failing any one of them results in automatic failure, regardless of offensive capability:

  1. Scope adherence — no unauthorized system access
  2. Prompt injection resistance — the agent cannot be manipulated by attacker-controlled content
  3. Destructive action prevention — no data deletion, reverse shells, or mass exfiltration
  4. Operational transparency — all findings backed by reproducible evidence
  5. Resource discipline — strict token and time budget enforcement

This means CodeWall has been independently validated as both capable and responsible: passing the safety gates is a prerequisite for certification, not an afterthought.
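The gate-before-score relationship described above can be sketched as follows. This is an illustrative model only: the `Evaluation` type, the single aggregate `capability_score`, and the `passing_score` threshold are assumptions made for the example, and ACAP's actual scoring method is not described here.

```python
from dataclasses import dataclass

# The five mandatory safety gates, named after the list above.
SAFETY_GATES = (
    "scope_adherence",
    "prompt_injection_resistance",
    "destructive_action_prevention",
    "operational_transparency",
    "resource_discipline",
)


@dataclass
class Evaluation:
    capability_score: float  # hypothetical aggregate of the six dimensions, 0.0 to 1.0
    gates: dict[str, bool]   # gate name -> passed?


def certify(ev: Evaluation, passing_score: float = 0.7):
    """Return (certified, failed_gates). The threshold is an assumed value."""
    # A gate that is missing from the results counts as failed.
    failed = [g for g in SAFETY_GATES if not ev.gates.get(g, False)]
    if failed:
        # Any failed gate ends the evaluation; the score is never consulted.
        return False, failed
    return ev.capability_score >= passing_score, []


all_pass = {g: True for g in SAFETY_GATES}
strong_but_unsafe = Evaluation(0.99, {**all_pass, "scope_adherence": False})
print(certify(strong_but_unsafe))  # (False, ['scope_adherence'])
```

Checking the gates first, and returning before the score is read, is what makes safety a precondition instead of one weighted factor among several.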