AGENTIC · LIGHTS OUT · SEALED-ENVELOPE TESTING
The agentic dark factory for building software with AI. Turn a short free-text goal into a production-grade pull request: six specialist agents orchestrated through a checkpoint-gated pipeline with sealed-envelope testing. Builders never see the hidden tests, and the shadow score reveals how much they missed.
One command to install: paste it into the Copilot CLI and start building.
Created by Gregg Cochran @DUBSOpenHub with the GitHub Copilot CLI
Dark Factory orchestrates a team of specialist agents through a sealed-envelope pipeline. Every build is measured. Every gap is quantified by the shadow score.
Specialist AI roles
Checkpoint-gated pipeline
Automatic fix loops
Perfect sealed coverage
Shadow Score = sealed test failures ÷ total sealed tests. It measures how much the builder missed when they couldn't see the hidden acceptance suite.
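To make the ratio concrete, here is a minimal Python sketch, assuming the validator already has pass/fail counts for the sealed suite. The name shadow_score and the decision to reject an empty suite are illustrative assumptions, not Dark Factory's actual implementation.

```python
def shadow_score(sealed_failures: int, sealed_total: int) -> float:
    """Fraction of sealed (hidden) tests that failed against the build.

    Implements: sealed test failures / total sealed tests.
    Returns a value in [0.0, 1.0]; format with :.0% for a percentage.
    """
    if sealed_total <= 0:
        # Assumption: an empty sealed suite is an error, because reporting 0%
        # would falsely signal that the builder covered everything.
        raise ValueError("sealed suite must contain at least one test")
    if not 0 <= sealed_failures <= sealed_total:
        raise ValueError("failures must be between 0 and the total test count")
    return sealed_failures / sealed_total


# Example: 3 of 12 sealed tests fail.
print(f"{shadow_score(3, 12):.0%}")  # 25%
```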
Dark Factory is an agentic build system: it isolates every build in a disposable git worktree, orchestrates six specialist AI agents, and measures quality with sealed-envelope testing. The shadow score quantifies how much the builder missed. "Lights out" means the builder works blind, and the hidden tests prove whether the spec was truly covered.
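Worktree isolation rests on standard git. The following is a minimal sketch that builds the git worktree add and git worktree remove commands for one run. It assumes a hypothetical factory/<run-id> branch name and a hypothetical .factory/worktrees/<run-id> location, and Dark Factory's real naming may differ.

```python
from pathlib import Path


def worktree_commands(
    run_id: str, root: Path = Path(".factory/worktrees")
) -> tuple[list[str], list[str]]:
    """Return (create, cleanup) git argv lists for one isolated build.

    `git worktree add -b <branch> <path>` checks out a new branch in a separate
    directory, so the build never touches your main working tree.
    `git worktree remove --force <path>` discards that directory afterwards.
    """
    branch = f"factory/{run_id}"   # hypothetical branch naming scheme
    path = str(root / run_id)      # hypothetical worktree location
    create = ["git", "worktree", "add", "-b", branch, path]
    cleanup = ["git", "worktree", "remove", "--force", path]
    return create, cleanup


create, cleanup = worktree_commands("run-001")
print(" ".join(create))
print(" ".join(cleanup))
```

Building the argument lists separately from running them keeps the sketch side-effect free. Inside a git repository you would pass each list to subprocess.run(..., check=True).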
# what powers the factory
factory:
  default_mode: full
  max_hardening_cycles: 3
  agent_timeout_sec: 300
models:
  product_mgr: claude-sonnet-4.6
  architect: claude-sonnet-4.6
  qa_sealed: claude-sonnet-4.6
  lead_eng: claude-sonnet-4.6
  qa_validator: claude-haiku-4.5
  premium_model: claude-opus-4.6
checkpoints:
  allow_skip_all: true
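The model assignments in the config above amount to a role-to-model lookup with a premium override. This is a minimal sketch under that reading. It assumes premium mode replaces every assignment with premium_model, which is what the description of dark factory premium suggests. The function name model_for is hypothetical, and the Outcome Evaluator is left out because the config excerpt does not list it.

```python
# Model assignments copied from the config; the routing logic is an assumption.
MODELS = {
    "product_mgr": "claude-sonnet-4.6",
    "architect": "claude-sonnet-4.6",
    "qa_sealed": "claude-sonnet-4.6",
    "lead_eng": "claude-sonnet-4.6",
    "qa_validator": "claude-haiku-4.5",
}
PREMIUM_MODEL = "claude-opus-4.6"


def model_for(role: str, premium: bool = False) -> str:
    """Pick the model for an agent role.

    Unknown roles are rejected even in premium mode, so a typo is never
    silently routed to the most expensive model.
    """
    if role not in MODELS:
        raise ValueError(f"unknown agent role: {role!r}")
    return PREMIUM_MODEL if premium else MODELS[role]


print(model_for("qa_validator"))                # claude-haiku-4.5
print(model_for("qa_validator", premium=True))  # claude-opus-4.6
```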
Write a sentence. Get production code. Dark Factory takes a free-text goal and produces a complete pull request: spec, architecture, implementation, tests, and delivery report.
QA writes hidden acceptance tests before any code exists. The builder never sees them. Shadow scores quantify blind spots the builder didn't know about.
Each phase has its own expert. Product Manager, Architect, QA Sealed, Lead Engineer, QA Validator, and Outcome Evaluator: stateless, focused, governed.
You stay in control. Human approval gates sit at every phase boundary. Review the PRD, approve the architecture, inspect the build, or go fully dark with skip-all.
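A checkpoint gate can be sketched as a function that waits for a human answer unless skip-all is requested and permitted by checkpoints.allow_skip_all. The Decision values mirror the approve, modify, and reject options offered at delivery. The function name, its parameters, and the prompt text are all hypothetical.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"


def checkpoint(
    phase: str,
    ask: Callable[[str], str] = input,
    skip_all: bool = False,
    allow_skip_all: bool = True,
) -> Decision:
    """Block at a phase boundary until a human decides.

    If skip-all is requested and the config allows it, the gate
    auto-approves: the "fully dark" mode.
    """
    if skip_all and allow_skip_all:
        return Decision.APPROVE
    answer = ask(f"[{phase}] approve / modify / reject? ").strip().lower()
    return Decision(answer)  # raises ValueError for any other answer


# Scripted answers stand in for a human at the terminal.
print(checkpoint("architecture", ask=lambda prompt: "Approve"))  # Decision.APPROVE
print(checkpoint("build", ask=lambda prompt: "modify"))         # Decision.MODIFY
print(checkpoint("deliver", skip_all=True))                     # Decision.APPROVE
```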
Every phase checkpoints to state.json. Network drops, timeouts, interrupted sessions: just run dark factory resume to continue from the last saved phase.
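Resume works because each completed phase is recorded on disk. Here is a minimal sketch of the idea. It assumes a hypothetical state.json layout ({"phase": ..., "data": ...}) and hypothetical phase ids, because the real file's schema is not documented here.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(state_file: Path, phase: str, data: dict) -> None:
    """Record the last completed phase (plus any context) in state.json.

    Writing to a temporary file and renaming it means an interrupted write
    cannot leave a half-written state.json behind.
    """
    state_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps({"phase": phase, "data": data}, indent=2))
    os.replace(tmp, state_file)  # atomic on POSIX when both paths share a filesystem


def remaining_phases(state_file: Path, phases: list[str]) -> list[str]:
    """Phases still to run: everything after the last saved checkpoint."""
    if not state_file.exists():
        return list(phases)  # no checkpoint yet: start from the first phase
    last = json.loads(state_file.read_text())["phase"]
    return phases[phases.index(last) + 1:]


phases = ["prd", "architecture", "build", "validate", "harden", "deliver"]  # hypothetical ids
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp, ".factory", "state.json")
    save_checkpoint(state, "build", {"goal": "example goal"})
    print(remaining_phases(state, phases))  # ['validate', 'harden', 'deliver']
```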
Short goals get fast builds. Express mode skips PRD and architecture, but still runs sealed QA from the raw goal. Quick fixes get the same quality envelope.
Seven phases, each with its own specialist agent. Work flows forward through checkpoints. Express mode condenses to three phases.
Factory Manager: Creates an isolated git worktree and branch. No interference with your working tree.
Product Manager: Writes PRD.md with acceptance criteria, user stories, and scope.
QA Sealed: Hidden acceptance tests written from the PRD. SHA-256 hashed. Builder never sees them.
Architect: Designs the system (diagrams, contracts, tech decisions). Runs in parallel with QA Sealed.
Lead Engineer: Implements the spec and writes their own tests. Can't see the sealed suite.
QA Validator: Sealed tests injected temporarily. Shadow score reveals the gap. Auto-hardening if needed.
Lead Engineer: Automatic fix cycles when shadow score > 0%. Builder sees failures but never sealed tests.
Factory Manager: Final human checkpoint. Delivery report with shadow score. Approve, modify, or reject.
Outcome Evaluator: Optional post-ship assessment. PRD criteria fulfillment and KPI scoring out of 100.
Six specialist agents, each with its own prompt, model assignment, and governance rules. Stateless: they only see what the Factory Manager passes them.
Product Manager (model: claude-sonnet-4.6): Turns your free-text goal into a structured PRD with acceptance criteria, user stories, and technical constraints. Capped at 180 lines.
Architect (model: claude-sonnet-4.6): Designs the system from the PRD, covering diagrams, contracts, file structure, and tech decisions. Runs in parallel with QA Sealed.
QA Sealed (model: claude-sonnet-4.6): Writes acceptance tests from the PRD that the builder will never see. Tests are SHA-256 hashed and stored in a sealed vault.
Lead Engineer (model: claude-sonnet-4.6): Implements the spec, writes tests, and handles hardening cycles. Only sees failure messages from sealed tests, never the tests themselves.
QA Validator (model: claude-haiku-4.5): Temporarily injects sealed tests into the worktree, runs them, reports the shadow score, then removes all traces.
Outcome Evaluator (model: claude-sonnet-4.6): Revisits archived artifacts after delivery. Scores PRD criteria fulfillment and KPI alignment out of 100.
QA writes tests before code exists and hides them from the builder. The gap between what the builder's own tests cover and what the sealed suite catches is your shadow score: a blind-spot metric you can't game.
QA Sealed writes tests using only the PRD, before any code exists. Tests are SHA-256 hashed and stored in .factory/sealed/.
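Hashing lets the factory detect whether a sealed test changed between sealing and execution. The text says the tests are hashed but not how the hashes are checked. This minimal hashlib sketch assumes tamper detection is the purpose, and seal and verify are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def seal(test_files: list[Path]) -> dict[str, str]:
    """Record a SHA-256 hex digest for each sealed test file."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in test_files}


def verify(manifest: dict[str, str]) -> list[str]:
    """Return sealed files that are missing or whose bytes no longer match."""
    changed = []
    for name, digest in manifest.items():
        path = Path(name)
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            changed.append(name)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp, "test_acceptance.py")  # hypothetical sealed test
    test_file.write_text("def test_goal():\n    assert True\n")
    manifest = seal([test_file])
    print(verify(manifest))                       # []
    test_file.write_text("def test_goal():\n    pass\n")  # simulate tampering
    print(verify(manifest) == [str(test_file)])   # True
```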
During Phase 4, sealed tests are temporarily injected, executed, and immediately removed. The builder only ever sees failure messages, never the test source.
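The inject, run, remove sequence maps naturally onto a context manager, because its cleanup runs even if the test run fails. This minimal sketch assumes sealed tests are copied into a hypothetical _sealed subdirectory of the worktree. The actual test command and the filtering of its output down to failure messages are left out.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def sealed_tests_injected(
    sealed_dir: Path, worktree: Path, subdir: str = "_sealed"
) -> Iterator[Path]:
    """Copy sealed tests into the worktree for one run, then remove every trace.

    The finally block runs even if the test run raises, so no sealed test
    file is left behind for the builder to find.
    """
    target = worktree / subdir  # hypothetical injection location
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    try:
        shutil.copytree(sealed_dir, target)
        yield target
    finally:
        shutil.rmtree(target, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    sealed = Path(tmp, "sealed")
    sealed.mkdir()
    (sealed / "test_goal.py").write_text("def test_goal():\n    assert True\n")
    worktree = Path(tmp, "worktree")
    worktree.mkdir()
    with sealed_tests_injected(sealed, worktree) as target:
        print(sorted(p.name for p in target.iterdir()))  # ['test_goal.py']
        # Run the test command here; forward only failure messages to the builder.
    print((worktree / "_sealed").exists())  # False
```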
Shadow score = sealed failures ÷ sealed total. 0% means the builder nailed it blind; above 25% signals spec/test misalignment.
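A small helper can turn the two documented thresholds into a label. Only the 0% and above-25% cut-offs come from the text. The label wording is an assumption, and so is treating the band in between as work for the hardening cycles. The function name interpret is hypothetical.

```python
def interpret(score: float) -> str:
    """Map a shadow score in [0.0, 1.0] to a guidance label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("shadow score must be between 0 and 1")
    if score == 0.0:
        return "clean: builder covered the spec blind"
    if score > 0.25:
        return "misaligned: review the spec and the sealed tests"
    return "gaps: hardening cycles should close them"


for failures in (0, 1, 3, 4):
    print(f"{failures}/12 ->", interpret(failures / 12))
```

Note that exactly 25% (3 of 12) falls in the middle band, because the text's threshold is strictly greater than 25%.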
Prevents overfitting. Builders can't "teach to the test" because they never see the sealed suite.
Quantifies quality. Shadow scores expose blind spots numerically โ not subjectively.
Automates escalation. Hardening cycles fire automatically when sealed tests fail, with up to 3 cycles before the run escalates to a human decision.
Retains speed. Express mode derives sealed tests from the raw goal text, so even quick fixes get quality coverage.
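The escalation rule above reads like a bounded retry loop. This is a minimal sketch that assumes validation returns the sealed suite's failure messages and that the builder's fix step receives nothing else. The names harden, validate, and fix are hypothetical, and the cap of 3 comes from max_hardening_cycles in the config.

```python
from typing import Callable

MAX_HARDENING_CYCLES = 3  # mirrors max_hardening_cycles in the factory config


def harden(
    validate: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_cycles: int = MAX_HARDENING_CYCLES,
) -> tuple[bool, int]:
    """Run sealed validation, then up to max_cycles builder fix rounds.

    `validate` returns the sealed suite's failure messages (an empty list
    means a 0% shadow score). `fix` is the builder's turn and receives only
    those messages, never test source. Returns (passed, cycles_used);
    passed=False means the run stops for a human decision.
    """
    failures = validate()
    cycles = 0
    while failures and cycles < max_cycles:
        fix(failures)
        cycles += 1
        failures = validate()
    return not failures, cycles


# Simulated run: each fix round resolves exactly one outstanding failure.
outstanding = ["failure A", "failure B"]


def simulated_fix(messages: list[str]) -> None:
    outstanding.pop()


print(harden(lambda: list(outstanding), simulated_fix))  # (True, 2)
```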
Six commands. Everything else is handled by the pipeline automatically.
dark factory — <goal>: Complete pipeline with all 7 phases and checkpoints at every gate. The "lights out" experience.
dark factory express — <goal>: Skips PRD/Architecture. Sealed QA still runs from the raw goal. One checkpoint at delivery.
dark factory resume: Reloads state.json and continues from the saved phase. Crash recovery built in.
dark factory status: Prints the current state without mutating anything. Shows pending evaluations.
dark factory evaluate <run-id>: Launches the Phase 7 Outcome Evaluator for a delivered run. KPI scoring out of 100.
dark factory premium — <goal>: Routes all agents through claude-opus-4.6 for one run.
Add Skill: /skills add DUBSOpenHub/dark-factory
Describe: dark factory — <your goal>
Ship: approve at the delivery checkpoint