🐝 Open Source · Copilot CLI

Launch up to 250 AI agents
across 15 models. Find consensus no single model can.

Multi-model consensus · Cross-validated · Shadow-scored
⭐ View on GitHub →
swarm-command
$ swarm command --scale 250 "audit auth system"

🐝 Hive activated · 250 agents · 15 models · 3 families
  ████████████████████████████████ consensus: 94%

  ✓ Cross-family validation passed
  ✓ Shadow score: 96/100
  ✓ 3 critical findings synthesized

  → Final report delivered in 4m 12s

One model, one perspective.
That's fragile.

For small tasks, a single AI is fine. But for security audits, architecture reviews, and migration strategies, one model means one blind spot, one context window, one confident-sounding answer with no independent check. You need consensus from independent minds that verify each other's work.

How the hive works

1

Describe your task

One command. Tell the swarm what you need β€” security audit, code review, architecture analysis. Plain English.

2

The swarm fans out

Agents from Claude and GPT families compete and collaborate. Different models cross-pollinate and review each other's work.

3

Consensus delivers

Only findings validated across model families survive. Shadow scores gate quality. One synthesized answer emerges from the colony.
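The three steps above can be sketched as one minimal loop: fan the task out, pool findings, and keep only what multiple agents independently report. Everything below (the `run_hive` function, the toy agents, the findings) is illustrative, not swarm-command's internals:

```python
from collections import Counter

def run_hive(task, agents):
    # Fan out: each agent analyzes the same task independently.
    findings = [f for agent in agents for f in agent(task)]
    # Consensus: keep only findings reported by at least two agents.
    counts = Counter(findings)
    return sorted(f for f, n in counts.items() if n >= 2)

# Toy agents standing in for different model families.
claude = lambda task: ["sql-injection", "weak-jwt-secret"]
gpt = lambda task: ["sql-injection", "open-redirect"]

print(run_hive("audit auth system", [claude, gpt]))  # ['sql-injection']
```

Findings unique to one agent ("weak-jwt-secret", "open-redirect") are filtered out; only the cross-validated one survives.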

What makes the hive different

🐝

Collective Intelligence

15 models, not one. Claude Opus & Sonnet. GPT-5.x series. Claude Haiku. Each brings different strengths; together they catch what any single model misses.

🔍

Cross-Validated

Different model families review each other's work. Claude checks GPT. GPT checks Claude. No echo chambers: only findings that survive independent scrutiny make the cut.

🔒

Shadow Scored

Hidden quality gates you can't game. Every agent is scored (failures ÷ total × 100) and they don't know they're being watched. Bad work gets caught automatically.
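A gate of this shape can be sketched as follows; the 10% failure threshold is a hypothetical value, not one stated here:

```python
def shadow_score(failures: int, total: int) -> float:
    """Failure rate per the formula above: failures / total * 100."""
    return failures / total * 100 if total else 0.0

def passes_gate(failures: int, total: int, max_fail_pct: float = 10.0) -> bool:
    # Agents never see this check run, so it cannot be gamed.
    return shadow_score(failures, total) <= max_fail_pct

print(passes_gate(1, 25))   # True  (4% failure rate)
print(passes_gate(8, 25))   # False (32% failure rate)
```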

The Spawn Hierarchy

Every commander runs in its own context window.
Different model families ensure diverse perspectives.

🐝 YOU · "audit my codebase"
  🎯 CMD-1 · Opus 4.6 · Own Context · ~50 workers
  🎯 CMD-2 · GPT-5.2 · Own Context · ~50 workers
  🎯 CMD-3 · Sonnet 4 · Own Context · ~50 workers
  🎯 CMD-4 · GPT-5.4 · Own Context · ~50 workers
  🎯 CMD-5 · Sonnet 4.5 · Own Context · ~50 workers

5 Commanders × ~50 workers each = 250 agents, each with its own context window

🧠 Workers are leaf agents: explore for research, task for execution
🔀 Cross-family reviewers validate outputs across model boundaries

Consensus Across Models

Multiple independent minds converge on one synthesized truth.

Opus 4.6 · Analysis #1
GPT-5.2 · Analysis #2
Sonnet 4 · Analysis #3
GPT-5.4 · Analysis #4
Sonnet 4.5 · Analysis #5
Convergence: ⬡ Synthesized Result

✅ 3+ models agree: CONSENSUS
🟡 2 models agree: MAJORITY
⚠️ 1 unique finding: FLAGGED
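The three tiers above amount to a simple classification by how many independent models reported a finding; a sketch (model names illustrative):

```python
def classify(models_reporting: set) -> str:
    """Tier a finding by the number of models that independently reported it."""
    n = len(models_reporting)
    if n >= 3:
        return "CONSENSUS"
    if n == 2:
        return "MAJORITY"
    return "FLAGGED"

print(classify({"opus-4.6", "gpt-5.2", "sonnet-4"}))  # CONSENSUS
print(classify({"gpt-5.4", "sonnet-4.5"}))            # MAJORITY
print(classify({"opus-4.6"}))                         # FLAGGED
```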

Scale to your mission

~89 agents · Deep Audit

Thorough analysis with full cross-family validation. Architecture reviews, security audits, migration planning.

$ swarm command --scale 100 "audit security posture"

250 agents. Under $20.

Every layer of the swarm is engineered to maximize signal while minimizing spend. Here's how.

📦

1024:1 Token Compression

Context shrinks at every layer: 128K tokens at the Nexus compresses to just 128 tokens at each worker. Parents strip rationale, narrow file scope, and tighten constraints so children only receive the bytes they need.
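The 1024:1 ratio follows directly from the two figures above (128K tokens at the Nexus, 128 per worker). A sketch of a parent stripping a brief before delegating; the brief's fields are hypothetical:

```python
NEXUS_BUDGET = 128 * 1024              # 131,072 tokens held at the top
WORKER_BUDGET = NEXUS_BUDGET // 1024   # 128 tokens at each leaf

def compress_for_child(brief: dict) -> dict:
    """Strip rationale and narrow scope so a child gets only what it needs."""
    return {
        "files": brief["files"][:3],          # narrow file scope
        "constraints": brief["constraints"],  # keep hard constraints
        # "rationale" is deliberately dropped: children don't need it
    }

child = compress_for_child({
    "files": ["auth.py", "jwt.py", "db.py", "ui.py"],
    "constraints": ["read-only"],
    "rationale": "long chain of reasoning...",
})
print(NEXUS_BUDGET // WORKER_BUDGET)  # 1024
```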

⚡

Circuit Breakers

A three-state FSM (Closed → Open → Half-Open) monitors every layer. If 50–60% of agents fail, the breaker trips: no new agents spawn, costs stop climbing, and a recovery probe runs before the swarm resumes.
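A breaker of this shape can be sketched in a few lines, using the 50% end of the stated trip range (the real implementation is not shown here):

```python
class CircuitBreaker:
    """Three-state FSM sketch: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, trip_ratio: float = 0.50):
        self.state, self.trip_ratio = "CLOSED", trip_ratio
        self.failures = self.total = 0

    def record(self, ok: bool) -> None:
        self.total += 1
        self.failures += 0 if ok else 1
        if self.state == "CLOSED" and self.failures / self.total >= self.trip_ratio:
            self.state = "OPEN"  # no new agents spawn; costs stop climbing

    def probe(self, ok: bool) -> None:
        # A single recovery probe decides whether the swarm resumes.
        if self.state == "OPEN":
            self.state = "HALF_OPEN"
        if self.state == "HALF_OPEN":
            self.state = "CLOSED" if ok else "OPEN"

cb = CircuitBreaker()
for ok in (True, False):   # 1 failure out of 2 hits the 50% threshold
    cb.record(ok)
print(cb.state)            # OPEN
cb.probe(ok=True)          # recovery probe succeeds
print(cb.state)            # CLOSED
```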

🛡️

Six Resource Guards

Timeout cascade (90→60→40→30s), token ceilings per layer, output size caps, retry budgets, a concurrent-agent cap of 50, and a hard cost ceiling ($5–$20 depending on scale) that kills all agents if breached.
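The six guards can be pictured as one config; values marked "hypothetical" below are not stated in the text, and the kill-switch check is a sketch of the described behavior:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceGuards:
    timeout_cascade_s: tuple = (90, 60, 40, 30)  # per layer, top to bottom
    worker_token_ceiling: int = 128              # per-worker ceiling
    output_cap_bytes: int = 32_768               # hypothetical
    retry_budget: int = 2                        # hypothetical
    max_concurrent_agents: int = 50
    cost_ceiling_usd: float = 20.0               # hard cap ($5-$20 by scale)

def should_kill_all(guards: ResourceGuards, spent_usd: float) -> bool:
    """The cost ceiling is absolute: breach it and every agent dies."""
    return spent_usd >= guards.cost_ceiling_usd

guards = ResourceGuards()
print(should_kill_all(guards, 21.0))  # True
print(should_kill_all(guards, 3.0))   # False
```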

🌊

Wave Deployment

Agents launch in three waves, Canary (1), Probe (3), Remainder, with health gates between each. If the canary fails, the full pod never deploys. One cheap test prevents many expensive failures.
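The wave-gating logic above can be sketched as follows (function and agent names illustrative):

```python
def deploy_in_waves(pod, healthy):
    """Canary (1), then Probe (3), then Remainder, health-gated between waves."""
    waves = [pod[:1], pod[1:4], pod[4:]]
    launched = []
    for wave in waves:
        launched += wave
        if not all(healthy(agent) for agent in wave):
            return launched, False  # gate failed: later waves never deploy
    return launched, True

pod = [f"agent-{i}" for i in range(10)]

# A failing canary means only 1 of 10 agents ever launched.
launched, ok = deploy_in_waves(pod, healthy=lambda a: a != "agent-0")
print(len(launched), ok)  # 1 False
```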

🐝

Cheap Workers, Smart Leaders

Workers use Haiku and GPT-Mini, the lightest, cheapest models. Expensive Opus and Sonnet reasoning is reserved for Commanders and the Nexus, where it matters most. 60% of agents cost 10× less.

📊

Predictable Pricing

SS-50 runs $1.50–$3.50. SS-100 runs $3.50–$8. SS-250 runs $8–$16. Hard ceilings at $5, $10, and $20 guarantee you never get a surprise bill, even if every agent retries at maximum.

Scale     Agents    Typical Cost    Hard Cap    Wall-Clock
SS-50     ~36–52    $2.50           $5          ~30s
SS-100    ~89       $5.50           $10         ~45s
SS-250    ~316      $10             $20         ~65–90s

Proof from the hive

Agents deployed across real production sessions
15 models available · Claude · GPT
Critical vulnerabilities found that single models missed

Progressive refinement: discover β†’ validate β†’ confirm

consensus = confidence × 0.40 + evidence × 0.30 + scope × 0.15 + coverage × 0.15 − conflict_penalty
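The formula above translates directly to code; note the four weights sum to 1.0, so a conflict-free finding with all components at 1.0 scores exactly 1.0 (the inputs below are made up for illustration):

```python
def consensus_score(confidence, evidence, scope, coverage, conflict_penalty=0.0):
    """Weighted consensus per the formula above."""
    return (confidence * 0.40 + evidence * 0.30
            + scope * 0.15 + coverage * 0.15 - conflict_penalty)

# Strong confidence and evidence, full scope/coverage, no conflicts.
print(round(consensus_score(0.9, 0.8, 1.0, 1.0), 2))  # 0.9
```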

Join the hive 🐝

One command. Then type swarm command.

curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/swarm-command/main/quickstart.sh | bash

Requires an active Copilot subscription