Research Project · MIT License

Empirical validation of geometric AI defense

550 adversarial prompts across 16 AoC categories. Four defense systems. Real LLM evaluation against GPT-4o-mini and Mistral 7B. Attack success rate below 1% with C4 active.

<1% · Attack success rate with C4 active
550 · Adversarial prompts, 16 categories
96.7% · Block rate, 532/550 preemptively
4 · Defense systems
31 · Defense tests

LLM Validation Results · 2,200 trials

GPT-4o-mini
10.7% → 0.7%
Attack success rate · 550 prompts · 93.2% reduction
Mistral 7B (Ollama)
22.5% → 0.5%
Attack success rate · 550 prompts · 97.6% reduction

Defense Systems

Anti-Flooding

Token bucket rate limiter. PoW challenge for suspicious bursts. Sliding window anomaly detection. Configurable per-severity thresholds.
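To make the token bucket idea concrete, here is a minimal sketch in Python. It is not the project's implementation: the class name `TokenBucket`, the `rate`/`capacity` parameters, and the injectable `clock` are assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket: bursts up to `capacity`, refills at `rate` tokens/second.

    All names here are hypothetical, not taken from the repository.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill according to elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One way to approximate per-severity thresholds would be to keep one bucket per severity level, for example `{"low": TokenBucket(10, 20), "high": TokenBucket(1, 2)}`, and route each request to the bucket for its severity. The values shown are placeholders.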

Anti-Slow-Corruption

CUSUM for gradual mean shifts. Windowed KL-divergence on state distributions. Detects drift within 50 steps with <5% false positive rate.
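As a rough illustration of how CUSUM picks up a gradual mean shift, the following sketch runs a two-sided CUSUM over a sequence of scalar observations. The function name and the parameters `target_mean`, `slack`, and `threshold` are assumptions for this example; the project's actual statistics and tuning are not shown in this text.

```python
def cusum_alarm(samples, target_mean, slack, threshold):
    """Return the index of the first two-sided CUSUM alarm, or None if none fires.

    Hypothetical helper: `slack` is the allowance subtracted at each step,
    `threshold` is the decision limit on the cumulative sums.
    """
    s_hi = 0.0  # accumulates evidence of an upward shift
    s_lo = 0.0  # accumulates evidence of a downward shift
    for i, x in enumerate(samples):
        s_hi = max(0.0, s_hi + (x - target_mean - slack))
        s_lo = max(0.0, s_lo + (target_mean - x - slack))
        if s_hi > threshold or s_lo > threshold:
            return i
    return None
```

Raising `threshold` lowers the false positive rate at the cost of slower detection, so the two parameters would need to be tuned together against a detection-delay target.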

Anti-Φ-Spoofing

Multi-modal verification. Text classifier + trajectory prediction must agree. HMAC-signed state claims. Temporal consistency check.
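The HMAC-signed state claim can be sketched with the Python standard library. This is an assumption-laden example rather than the project's wire format: the claim layout, the JSON canonicalisation, the choice of SHA-256, and the function names `sign_claim` and `verify_claim` are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_claim(key: bytes, claim: dict) -> str:
    """Sign a state claim; keys are sorted so equal claims serialise identically."""
    payload = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_claim(key: bytes, claim: dict, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_claim(key, claim), signature)
```

A verifier holding the shared key would reject any claim whose fields were altered after signing, which is what makes a spoofed state claim detectable.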

Anti-Coalition

Graph-based collusion detection via Bron-Kerbosch clique finding. Agent interaction graph monitoring. Temporal clustering of coordinated actions.
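For readers unfamiliar with Bron-Kerbosch, here is a compact sketch of the pivoting variant that enumerates maximal cliques in an undirected graph. The adjacency-dict representation and the function name are assumptions for this example; how the project builds its agent interaction graph is not described here.

```python
def bron_kerbosch(graph):
    """Yield every maximal clique of an undirected graph as a frozenset.

    `graph` maps each node to the set of its neighbours (hypothetical format).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored nodes
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())
```

In a collusion setting, an edge might mean "these two agents acted in a coordinated way within some time window", and unusually large maximal cliques would then be candidates for closer inspection.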

Research Pipeline

AoC Test Suite

550 hand-crafted adversarial prompts. 50 per category across 11 original + 5 extended AoC failure modes. Expected C4 state + ground truth per prompt.

Formal Verification

9 Agda theorems from adaptive-topology (git submodule). RuntimeVerifier with real SHA256 certificates. Honest postulate documentation.

Pre-registered Hypotheses

H₀/H₁ defined before experiments. Fisher exact test with Bonferroni correction. Power analysis (α=0.05, β=0.2, d=0.5 → n≥64).
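The testing procedure named above can be sketched without external dependencies. The code below computes a two-sided Fisher exact p-value for a 2×2 table and applies a Bonferroni-corrected threshold. The function names are hypothetical, and the example table in the usage note is illustrative only, not the project's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null hypothesis only if its p-value is below alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

For instance, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates the classic 4-versus-4 table, and `bonferroni_reject([0.01, 0.04])` tests two hypotheses against a corrected threshold of 0.025 each.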

Reproducible

Docker one-command experiment reproduction. Jupyter tutorials. Full code + data supplement. External benchmark harness (HarmBench, AdvBench).

Quick Start

$ git clone https://gitlab.com/cognitive-functors/agents-of-order
$ cd agents-of-order
$ pip install -e ".[research]"
$ python experiments/quick_validation.py

Resources