CodeWall Documentation

How CodeWall uses hypotheses to drive targeted, methodical vulnerability discovery.

Hypotheses are the core reasoning mechanism behind CodeWall's penetration testing. Rather than running a static checklist of scans, CodeWall's agent formulates hypotheses about potential vulnerabilities in your target — then designs specific tests to confirm or reject each one.

What is a hypothesis?

A hypothesis is a structured statement about a suspected vulnerability. Each hypothesis includes:

Field	Description
Statement	A clear, testable claim — e.g., "The `/api/users` endpoint is vulnerable to IDOR via predictable user IDs"
Severity	The expected impact if confirmed (Critical, High, Medium, Low)
Family	The vulnerability class — auth, injection, XSS, misconfiguration, memory-safety, etc.
Confidence	How likely the agent believes this hypothesis is to be true (0–100%)
Rationale	Why the agent suspects this vulnerability exists
Preconditions	What must be true for the vulnerability to be exploitable
Proposed checks	The specific tests the agent plans to run
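
As a rough illustration, the fields above can be pictured as a small record. The Python sketch below is not CodeWall's schema or API: the class name `Hypothesis`, the field names, the numeric severity ranks, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    # Levels come from the table above; the numeric ranks are an assumption.
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1


@dataclass
class Hypothesis:
    """Illustrative record only, not CodeWall's real data model."""
    statement: str
    severity: Severity
    family: str
    confidence: int  # percentage, 0 to 100
    rationale: str = ""
    preconditions: List[str] = field(default_factory=list)
    proposed_checks: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject confidence values outside the documented 0-100% range.
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


# Sample values are invented for illustration.
example = Hypothesis(
    statement="The /api/users endpoint is vulnerable to IDOR via predictable user IDs",
    severity=Severity.HIGH,
    family="auth",
    confidence=70,
    rationale="User IDs in responses appear sequential",
    preconditions=["An authenticated low-privilege session"],
    proposed_checks=["Request another user's record by changing the ID"],
)
```

Keeping confidence as a bounded integer makes it easy to sort and compare hypotheses later.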

Why hypotheses matter

Traditional scanners work by firing hundreds of generic payloads and checking for known signatures. CodeWall works differently — it reasons about your application's specific architecture, technology stack, and behaviour to form targeted hypotheses.

This approach has several advantages:

Fewer false positives: the agent tests only what it has reason to suspect, rather than spraying payloads
Deeper coverage: hypothesis-driven testing can catch business logic flaws and chained attack paths that signature-based scanners miss
Transparency: you can see what the agent suspects and why, not just a list of CVEs
Efficiency: the agent spends its budget on the most promising attack vectors rather than on exhaustive enumeration

The hypothesis lifecycle

1. Formulation: during the analysis phase, the agent reviews reconnaissance data and formulates hypotheses about potential vulnerabilities
2. Prioritisation: hypotheses are ranked by severity and confidence, so the most impactful and most likely vulnerabilities are tested first
3. Validation: during the validate phase, the agent runs the proposed checks for each hypothesis
4. Outcome: each hypothesis is marked as verified (vulnerability confirmed), rejected (not exploitable), not tested (skipped because of budget limits or unmet prerequisites), or error (test failed to execute)

Verified hypotheses become findings with full proof-of-concept evidence and remediation guidance.
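
The prioritisation step can be pictured as a sort. The sketch below orders hypotheses by severity first and confidence second. The documentation says only that both values are used, so the tie-breaking order, the numeric severity ranks, the dictionary layout, and the function name `prioritise` are assumptions.

```python
# Assumed numeric ranks for the four documented severity levels.
SEVERITY_RANK = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}


def prioritise(hypotheses):
    """Return hypotheses ordered by severity, then confidence, highest first."""
    return sorted(
        hypotheses,
        key=lambda h: (SEVERITY_RANK[h["severity"]], h["confidence"]),
        reverse=True,
    )


# Invented sample data for illustration.
queue = prioritise([
    {"statement": "A", "severity": "Medium", "confidence": 90},
    {"statement": "B", "severity": "Critical", "confidence": 40},
    {"statement": "C", "severity": "Critical", "confidence": 75},
])
order = [h["statement"] for h in queue]
print(order)  # ['C', 'B', 'A']
```

A real ranking may weight the two values differently, for example by combining them into a single score.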

Adding your own hypotheses

You can inject your own hypotheses into a running test before the analysis phase completes; with approval gates enabled, you can also do so while the test is awaiting approval. This is useful for:

Domain knowledge — you know your application better than any scanner. If you suspect a specific endpoint is vulnerable, tell the agent to test it
Regression testing — add hypotheses for previously fixed vulnerabilities to verify they haven't regressed
Compliance checks — inject hypotheses for specific compliance requirements your organisation must meet
Red team scenarios — guide the agent toward specific attack paths you want validated

How to add a hypothesis

1. Navigate to your running test
2. Switch to the Hypotheses tab
3. Fill in the statement, severity, and family, and optionally a rationale
4. Click Add Hypothesis

Your hypothesis is queued alongside the agent's own hypotheses and will be tested during the validate and exploit phases. The agent treats operator-submitted hypotheses with the same rigour as its own — designing specific test cases, executing them, and reporting the outcome.

When you can add hypotheses

You can add hypotheses while the test is in the preflight, recon, or analysis phases. If approval gates are enabled, you can also add them while the test is awaiting approval for the recon or analysis phase. Once the agent moves into the validate phase, the hypothesis list is locked.
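
The rule above can be written as a small predicate. This is a minimal sketch of the logic as described, not CodeWall's implementation; the function name, its parameters, and the lowercase phase strings are assumptions.

```python
# Phases during which hypotheses can be added, per the paragraph above.
OPEN_PHASES = {"preflight", "recon", "analysis"}
# Phases whose approval pause also accepts new hypotheses.
APPROVAL_PHASES = {"recon", "analysis"}


def can_add_hypothesis(phase, awaiting_approval=False, approval_gates=False):
    """Return True if a hypothesis may still be added.

    phase: the current phase, or the phase awaiting approval.
    awaiting_approval: True while the test waits at an approval gate.
    approval_gates: True if approval gates are enabled for the test.
    """
    if awaiting_approval:
        return approval_gates and phase in APPROVAL_PHASES
    return phase in OPEN_PHASES


print(can_add_hypothesis("analysis"))  # True
print(can_add_hypothesis("validate"))  # False
print(can_add_hypothesis("recon", awaiting_approval=True, approval_gates=True))  # True
```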

Retesting hypotheses

If a hypothesis was marked as not tested or error, you can trigger a targeted retest directly from the Hypotheses tab. This launches a new, focused test that validates only that hypothesis, without re-running the full engagement.
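
To make the eligibility rule concrete, the sketch below filters a result list down to the hypotheses that qualify for a retest. The outcome labels come from the lifecycle section; the dictionary layout, the function name `retest_candidates`, and the sample data are assumptions for illustration.

```python
# Outcomes that qualify for a targeted retest, per the paragraph above.
RETESTABLE_OUTCOMES = {"not tested", "error"}


def retest_candidates(hypotheses):
    """Return the hypotheses eligible for a targeted retest."""
    return [h for h in hypotheses if h["outcome"] in RETESTABLE_OUTCOMES]


# Invented sample results for illustration.
results = [
    {"statement": "IDOR on /api/users", "outcome": "verified"},
    {"statement": "SQL injection in search", "outcome": "error"},
    {"statement": "Open redirect on login", "outcome": "not tested"},
    {"statement": "Stored XSS in comments", "outcome": "rejected"},
]
candidates = [h["statement"] for h in retest_candidates(results)]
print(candidates)  # ['SQL injection in search', 'Open redirect on login']
```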
