550 adversarial prompts across 16 AoC categories. Four defense systems. Real LLM evaluation against GPT-4o-mini and Mistral 7B. Attack success rate below 1% with C4 active.
Token bucket rate limiter. PoW challenge for suspicious bursts. Sliding window anomaly detection. Configurable per-severity thresholds.
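The rate-limiting layer above can be sketched as three small pieces. This is a minimal sketch, not the project's implementation. The token bucket admits requests. A sliding-window counter flags bursts. A hashcash-style proof-of-work (PoW) check gates flagged clients. All names and the per-severity difficulty table (`POW_DIFFICULTY`) are hypothetical, and the clock is injectable so the logic can be tested deterministically.

```python
import hashlib
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical per-severity PoW difficulty (leading zero bits); values are illustrative only.
POW_DIFFICULTY = {"low": 8, "medium": 12, "high": 16}


@dataclass
class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `rate` tokens per second."""
    capacity: float
    rate: float
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket
        self.last = self.clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowCounter:
    """Flags a burst when more than `threshold` events fall inside the last `window` seconds."""

    def __init__(self, window: float, threshold: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window, self.threshold, self.clock = window, threshold, clock
        self.events: deque = deque()

    def record(self) -> bool:
        now = self.clock()
        self.events.append(now)
        # Evict timestamps that have slid out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold


def pow_valid(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Hashcash-style check: SHA-256(challenge || nonce) must begin with `difficulty_bits` zero bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def solve_pow(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force search a client would run to answer the challenge."""
    nonce = 0
    while not pow_valid(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

One plausible wiring: when `record()` returns `True` or the bucket rejects a request, issue a challenge at the difficulty for the request's severity and admit the client only once `pow_valid` passes. Each extra difficulty bit doubles the client's expected work, which is what makes the thresholds tunable per severity.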
CUSUM for gradual mean shifts. Windowed KL-divergence on state distributions. Detects drift within 50 steps with <5% false positive rate.
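The two drift detectors named above can be sketched as follows. This is a minimal, hedged version. It uses a one-sided upward CUSUM on a scalar signal and a smoothed KL divergence between non-overlapping windows of discrete states and a reference window. The parameters `mu0` (in-control mean), `k` (slack, often about half the shift to detect), `h` (alarm threshold) and `eps` (smoothing) are assumptions. They are not tuned to reproduce the 50-step / <5% false-positive figures.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def cusum_first_alarm(xs: Sequence[float], mu0: float, k: float, h: float) -> Optional[int]:
    """One-sided (upward) CUSUM: S_t = max(0, S_{t-1} + x_t - mu0 - k); alarm when S_t > h."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t  # index of the first alarm
    return None


def kl_divergence(p_samples: Sequence[Hashable], q_samples: Sequence[Hashable],
                  support: Sequence[Hashable], eps: float = 1e-6) -> float:
    """D_KL(P || Q) between empirical state distributions, smoothed so no probability is zero."""
    pc, qc = Counter(p_samples), Counter(q_samples)
    zp = len(p_samples) + eps * len(support)
    zq = len(q_samples) + eps * len(support)
    total = 0.0
    for s in support:
        p = (pc[s] + eps) / zp
        q = (qc[s] + eps) / zq
        total += p * math.log(p / q)
    return total


def windowed_kl(states: Sequence[Hashable], window: int,
                support: Sequence[Hashable]) -> List[float]:
    """KL divergence of each later non-overlapping window against the first (reference) window."""
    ref = states[:window]
    return [kl_divergence(states[i:i + window], ref, support)
            for i in range(window, len(states) - window + 1, window)]
```

CUSUM accumulates small, persistent deviations that a per-step threshold would miss. The KL check catches shifts in the *shape* of the state distribution even when a scalar mean stays put.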
Multi-modal verification. Text classifier + trajectory prediction must agree. HMAC-signed state claims. Temporal consistency check.
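A minimal sketch of the accept/reject gate described above. It assumes a shared secret key and JSON state claims with hypothetical `state` and `step` fields. It also assumes the text-classifier and trajectory-predictor outputs are already computed and passed in as strings. The function names are hypothetical, and the project's verifier may combine these checks differently.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_claim(key: bytes, claim: Dict[str, Any]) -> str:
    """HMAC-SHA256 over canonical JSON, so identical claims always sign identically."""
    payload = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_claim(key: bytes, claim: Dict[str, Any], signature: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign_claim(key, claim), signature)


def accept_state(key: bytes, claim: Dict[str, Any], signature: str,
                 text_state: str, trajectory_state: str, last_step: int) -> bool:
    """Accept a state claim only if all three checks pass:
    1. the HMAC signature is valid,
    2. text classifier and trajectory predictor agree with each other and with the claim,
    3. the claim's step is strictly later than the last accepted one (temporal consistency)."""
    return (
        verify_claim(key, claim, signature)
        and text_state == trajectory_state == claim["state"]
        and claim["step"] > last_step
    )
```

Requiring agreement means an attacker must fool both modalities at once *and* forge a valid signature. The monotonic-step check rejects replayed claims.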
Graph-based collusion detection via Bron-Kerbosch clique finding. Agent interaction graph monitoring. Temporal clustering of coordinated actions.
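The clique-based detection above can be sketched as follows. This is a minimal version assuming events arrive as `(timestamp, agent)` pairs. It links two agents when they act within a fixed time window, a crude stand-in for temporal clustering. It then enumerates maximal cliques with Bron-Kerbosch plus pivoting. The helper names and the `min_size` cutoff are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def interaction_graph(events: List[Tuple[float, Hashable]], window: float) -> Graph:
    """Link two distinct agents whenever they act within `window` time units of each other."""
    adj: Graph = {agent: set() for _, agent in events}  # every agent is a vertex
    for (t1, a1), (t2, a2) in combinations(events, 2):
        if a1 != a2 and abs(t1 - t2) <= window:
            adj[a1].add(a2)
            adj[a2].add(a1)
    return adj


def bron_kerbosch(r: Set, p: Set, x: Set, adj: Graph, out: List[Set]) -> None:
    """Bron-Kerbosch with pivoting: appends every maximal clique to `out`."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with the most neighbours in P to prune redundant branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v's cliques are done; move it from candidates to excluded
        x = x | {v}


def coordinated_groups(events: List[Tuple[float, Hashable]], window: float,
                       min_size: int = 3) -> List[Set]:
    """Maximal cliques of at least `min_size` agents in the interaction graph."""
    adj = interaction_graph(events, window)
    cliques: List[Set] = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return [c for c in cliques if len(c) >= min_size]
```

The pairwise edge construction is O(n²) in the number of events. A production monitor would likely sort events and sweep the window instead. Pivoting keeps clique enumeration tractable on sparse interaction graphs.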
550 hand-crafted adversarial prompts spanning 16 AoC failure modes (11 original + 5 extended). Expected C4 state + ground truth per prompt.
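One way a per-prompt record could be represented and loaded is sketched below. It assumes a JSON Lines file and uses hypothetical field names (`prompt_id`, `category`, `expected_c4_state`, `ground_truth`). The actual dataset schema may differ. Constructing the dataclass from `**record` doubles as a schema check, since unknown or missing keys raise `TypeError`.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AdversarialPrompt:
    # Hypothetical schema; field names are illustrative, not the project's.
    prompt_id: str
    category: str           # one of the AoC failure modes
    text: str
    expected_c4_state: str  # state the C4 defenses are expected to reach
    ground_truth: str       # reference label used for scoring


def load_prompts(lines: Iterable[str]) -> List[AdversarialPrompt]:
    """Parse JSON Lines records, skipping blank lines and rejecting duplicate IDs."""
    prompts: List[AdversarialPrompt] = []
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        prompt = AdversarialPrompt(**json.loads(line))
        if prompt.prompt_id in seen:
            raise ValueError(f"duplicate prompt_id: {prompt.prompt_id}")
        seen.add(prompt.prompt_id)
        prompts.append(prompt)
    return prompts


def per_category_counts(prompts: Iterable[AdversarialPrompt]) -> Counter:
    """Prompt count per category, e.g. to check the total against the stated 550."""
    return Counter(p.category for p in prompts)
```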
9 Agda theorems from adaptive-topology (git submodule). RuntimeVerifier with real SHA256 certificates. Honest postulate documentation.
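A hedged sketch of what a SHA256 certificate for a theorem might contain. It binds the theorem name and its proof status (proven or postulate) to the digest of the Agda source bytes, so any later edit to the file invalidates the certificate. The function names and certificate fields are hypothetical and do not describe `RuntimeVerifier`'s actual format.

```python
import hashlib
from typing import Dict

VALID_STATUSES = {"proven", "postulate"}


def theorem_certificate(name: str, source: bytes, status: str) -> Dict[str, str]:
    """Bind a theorem name and its proof status to the SHA-256 of its Agda source."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return {"theorem": name, "status": status, "sha256": hashlib.sha256(source).hexdigest()}


def verify_certificate(cert: Dict[str, str], source: bytes) -> bool:
    """Recompute the digest; any change to the source invalidates the certificate."""
    return hashlib.sha256(source).hexdigest() == cert["sha256"]
```

In practice `source` would be the bytes of the `.agda` file in the submodule, for example read with `pathlib.Path.read_bytes()`. Recording `status` explicitly keeps postulates from being silently reported as proofs.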
H₀/H₁ defined before experiments. Fisher exact test with Bonferroni correction. Power analysis (α=0.05, β=0.2, d=0.5 → n≥64 per group).
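The statistical plan above can be sketched in the standard library alone. The sketch has three parts. The first is a one-sided Fisher exact test matching the directional H₁ (C4 lowers attack success). The second is Bonferroni-adjusted p-values. The third is a sample-size calculation that reproduces n≥64 per group. That calculation uses the normal approximation plus the common z²/4 small-sample correction. Note that d=0.5 is Cohen's d, which strictly belongs to a two-sample t-test rather than to a test on proportions. Function names are hypothetical.

```python
from math import ceil, comb
from statistics import NormalDist
from typing import List


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test on the 2x2 table
         [[a, b],    baseline: successful attacks, failed attacks
          [c, d]]    with C4:  successful attacks, failed attacks
    H1: success is lower with C4, so sum the hypergeometric probabilities of
    every table with the same margins and at most `c` successes in the C4 row."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo = max(0, col1 - row1)
    hits = sum(comb(row2, k) * comb(row1, col1 - k) for k in range(lo, c + 1))
    return hits / total


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for effect size d at two-sided alpha: normal approximation + z^2/4 correction."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)
```

With β=0.2 (power 0.8), `n_per_group(0.5)` evaluates 62.79 + 0.96 ≈ 63.75 and rounds up to 64. When several comparisons are run, for example one per model or per category, each adjusted p-value is compared against α=0.05.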
Docker one-command experiment reproduction. Jupyter tutorials. Full code + data supplement. External benchmark harness (HarmBench, AdvBench).
Full methodology. Statistical analysis. Honest limitations. Comparison with existing methods.
9 Agda theorems from adaptive-topology. Proof status of each: proven or postulate. Real SHA256 hashes.
H₀: C4 does not reduce attack success. H₁: C4 reduces attack success by >20%. Full statistical plan.
7 universal principles. 8 wisdom traditions. SVETILO value alignment.