Research Project · MIT License

Empirical validation of geometric AI defense

550 adversarial prompts across 16 AoC categories. Four defense systems. Real LLM evaluation against GPT-4o-mini and Mistral 7B. Attack success rate below 1% with C4 active.

<1%

Attack success rate
with C4 active

550

Adversarial prompts
16 categories

96.7%

Block rate
532/550 preemptively

4

Defense
systems

Defense
tests

LLM Validation Results · 2,200 trials

GPT-4o-mini

10.7%→0.7%

Attack success rate · 550 prompts · 93.2% reduction

Mistral 7B (Ollama)

22.5%→0.5%

Attack success rate · 550 prompts · 97.6% reduction
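The reduction figures above read as relative reductions in attack success rate. A minimal sketch of that arithmetic follows. The raw success counts (59, 4, 124, 3) are not published on this page; they are an assumption, back-computed as the integers out of 550 that round to the stated percentages.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction of baseline successes removed by the defense."""
    return (before - after) / before

TOTAL_PROMPTS = 550

# ASSUMPTION: counts inferred from the rounded percentages, not taken from the data.
assumed_counts = {
    "GPT-4o-mini": (59, 4),    # successes without C4, with C4
    "Mistral 7B": (124, 3),
}

for model, (before, after) in assumed_counts.items():
    asr_before = 100 * before / TOTAL_PROMPTS
    asr_after = 100 * after / TOTAL_PROMPTS
    reduction = 100 * relative_reduction(before, after)
    print(f"{model}: {asr_before:.1f}% -> {asr_after:.1f}% ({reduction:.1f}% reduction)")
```

Under those assumed counts the percentages and reductions agree with the figures quoted above. Working from the rounded percentages alone gives a slightly different reduction for GPT-4o-mini, so the underlying counts matter.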

Defense Systems

⊡

Anti-Flooding

Token bucket rate limiter. PoW challenge for suspicious bursts. Sliding window anomaly detection. Configurable per-severity thresholds.
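As an illustration of the token bucket idea, here is a minimal sketch. The class name, parameters, and injectable clock are hypothetical and are not taken from the project's code; the PoW challenge and per-severity thresholds are not modeled.

```python
import time

class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request fits the budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A burst drains the bucket quickly, after which requests are admitted only at the refill rate. A stricter severity level could plausibly be modeled as a smaller `capacity` or a higher `cost` per request.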

⊕

Anti-Slow-Corruption

CUSUM for gradual mean shifts. Windowed KL-divergence on state distributions. Detects drift within 50 steps with <5% false positive rate.
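The CUSUM part of this detector can be sketched as follows. This is a generic two-sided CUSUM, not the project's implementation; the function name and the `slack` and `threshold` parameters are assumptions chosen for illustration, and the KL-divergence check is not shown.

```python
def cusum_detect(samples, target_mean, slack, threshold):
    """Return the index of the first alarm, or None if no drift is flagged.

    slack:     deviation from target_mean that is tolerated without accumulating
    threshold: cumulative excess that triggers the alarm
    """
    s_hi = 0.0  # accumulated evidence of an upward shift
    s_lo = 0.0  # accumulated evidence of a downward shift
    for i, x in enumerate(samples):
        s_hi = max(0.0, s_hi + (x - target_mean - slack))
        s_lo = max(0.0, s_lo + (target_mean - x - slack))
        if s_hi > threshold or s_lo > threshold:
            return i
    return None

# 20 in-control samples, then a sustained shift of +1.0
stream = [0.0] * 20 + [1.0] * 20
print(cusum_detect(stream, target_mean=0.0, slack=0.5, threshold=4.0))
```

Because the statistic accumulates small excesses rather than testing each sample in isolation, it can flag a slow drift that never produces a single outlier. Raising `threshold` lowers the false positive rate at the cost of slower detection.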

⊛

Anti-Φ-Spoofing

Multi-modal verification. Text classifier + trajectory prediction must agree. HMAC-signed state claims. Temporal consistency check.
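The HMAC-signed state claims mentioned above could look roughly like this. The claim fields (`agent`, `state`, `step`) and the function names are hypothetical; only the standard-library `hmac`, `hashlib`, and `json` calls are real.

```python
import hashlib
import hmac
import json

def sign_claim(key: bytes, claim: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the claim."""
    # sort_keys gives a stable byte representation regardless of dict ordering
    payload = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_claim(key: bytes, claim: dict, tag: str) -> bool:
    """Constant-time comparison of the expected tag against the presented one."""
    return hmac.compare_digest(sign_claim(key, claim), tag)

key = b"shared-secret-for-illustration-only"
claim = {"agent": "a1", "state": "S3", "step": 7}   # hypothetical claim fields
tag = sign_claim(key, claim)

print(verify_claim(key, claim, tag))                       # unmodified claim
print(verify_claim(key, dict(claim, state="S9"), tag))     # tampered state
```

Including a step counter or timestamp in the signed payload is one way a temporal consistency check could reject replayed claims.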

◈

Anti-Coalition

Graph-based collusion detection via Bron-Kerbosch clique finding. Agent interaction graph monitoring. Temporal clustering of coordinated actions.
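For readers unfamiliar with Bron-Kerbosch, the sketch below enumerates maximal cliques in an undirected interaction graph using the pivoting variant. The graph representation (a dict of neighbour sets) and the agent names are assumptions for illustration; how the project builds its interaction graph is not specified here.

```python
def bron_kerbosch(graph):
    """Yield each maximal clique of an undirected graph {node: set_of_neighbours}."""

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored nodes
        if not p and not x:
            yield r
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())

# Hypothetical interaction graph: a, b, c all interact; d only interacts with c.
interactions = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
for clique in bron_kerbosch(interactions):
    print(sorted(clique))
```

A detector might treat cliques above some size as candidate coalitions and then check whether their members' actions also cluster in time. That size cutoff would be a tuning choice, not something stated in this document.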

Research Pipeline

▣

AoC Test Suite

550 hand-crafted adversarial prompts. 50 per category across 11 original + 5 extended AoC failure modes. Expected C4 state + ground truth per prompt.

Formal Verification

9 Agda theorems from adaptive-topology (git submodule). RuntimeVerifier with real SHA-256 certificates. Postulates (unproven assumptions) are documented openly.

⬡

Pre-registered Hypotheses

H₀/H₁ defined before experiments. Fisher exact test with Bonferroni correction. Power analysis (α=0.05, β=0.2, d=0.5 → n≥64).
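The n≥64 figure can be reproduced with a standard approximation, assuming it refers to a two-sided, two-group comparison with Cohen's d as the effect size and n counted per group (the page does not state this). The sketch uses the normal-approximation sample size plus a common small-sample correction term of z²/4; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha: float, beta: float, d: float) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)         # quantile for power = 1 - beta
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    # Small-sample correction that approximates using t instead of z.
    return ceil(n_normal + z_alpha ** 2 / 4)

print(n_per_group(alpha=0.05, beta=0.2, d=0.5))
```

With a Bonferroni correction over m hypotheses, the same function could be called with `alpha / m`, which increases the required sample size.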

⚡

Reproducible

Docker one-command experiment reproduction. Jupyter tutorials. Full code + data supplement. External benchmark harness (HarmBench, AdvBench).

Quick Start

$ git clone https://gitlab.com/cognitive-functors/agents-of-order
$ cd agents-of-order
$ pip install -e ".[research]"
$ python experiments/quick_validation.py