Agentic Build System · Codename: Lights Out

dark
factory.

AGENTIC · LIGHTS OUT · SEALED-ENVELOPE TESTING

The agentic dark factory for AI building. Turn a short free-text goal into a production-grade pull request: six specialist agents orchestrated through a checkpoint-gated pipeline with sealed-envelope testing. Builders never see the hidden tests. The shadow score reveals the truth.

Goal → 📋 PRD → 🔒 Sealed QA → 🏗️ Arch → ⚙️ Build → 🧪 Validate → ✅ Ship
โ˜๏ธ One command to install ยท Paste that into the Copilot CLI and start building

๐Ÿ™ Created with ๐Ÿ’œ by Gregg Cochran @DUBSOpenHub with the GitHub Copilot CLI

dark factory — copilot cli
you: dark factory — build a CLI that audits deps for GPL
factory: 🏭 Run: run-20260401 | Mode: FULL
pm: PRD.md written — 12 acceptance criteria
qa 🔒: Sealed tests hashed. Builders won't see these.
eng: Build complete — src/ tests/ README
factory: 🏭 SHADOW_SCORE: 0% — all sealed tests pass ✓
factory: Delivery checkpoint — approve?
›

6 agents, 7 phases, zero blind spots

Dark Factory orchestrates a team of specialist agents through a sealed-envelope pipeline. Every build is measured. Every gap is quantified by the shadow score.

6
Agents

Specialist AI roles

7
Phases

Checkpoint-gated pipeline

3
Hardening Cycles

Automatic fix loops

0%
Target Shadow Score

Perfect sealed coverage

🔒 What is Shadow Score?

Shadow Score = sealed test failures ÷ total sealed tests. It measures how much the builder missed when they couldn't see the hidden acceptance suite.

0%: perfect blind coverage
≤10%: team target
>25%: spec/test misalignment
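As a minimal sketch of the formula and bands above (the function names are hypothetical, not part of Dark Factory), the score is a ratio expressed as a percentage. The source gives no label for scores between 10% and 25%, so the "above target" label below is an assumption.

```python
def shadow_score(sealed_failures: int, sealed_total: int) -> float:
    """Return sealed failures / sealed total as a percentage."""
    if sealed_total <= 0:
        raise ValueError("sealed_total must be positive")
    if not 0 <= sealed_failures <= sealed_total:
        raise ValueError("sealed_failures must be between 0 and sealed_total")
    return 100.0 * sealed_failures / sealed_total


def classify(score: float) -> str:
    """Map a percentage onto the three bands quoted on this page."""
    if score == 0:
        return "perfect blind coverage"
    if score <= 10:
        return "team target"
    if score > 25:
        return "spec/test misalignment"
    return "above target"  # assumption: the page does not name this band


# Illustrative numbers only: 2 failures out of 18 sealed tests.
print(round(shadow_score(2, 18), 1), classify(shadow_score(2, 18)))
```

With 2 failures out of 18 sealed tests the score rounds to 11.1%, which sits just above the 10% team target.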

Agentic
lights out builds

Dark Factory is an agentic build system: it isolates every build in a disposable git worktree, orchestrates six specialist AI agents, and measures quality with sealed-envelope testing. The shadow score tells you exactly how much the builder missed. Lights out means the builder works blind, and the hidden tests prove whether the spec was truly covered.

🔒 Sealed Tests
🌿 Git Worktree
📋 Checkpoints
⚡ Express Mode
config.yml
# what powers the factory
factory:
  default_mode: full
  max_hardening_cycles: 3
  agent_timeout_sec: 300

models:
  product_mgr: claude-sonnet-4.6
  architect: claude-sonnet-4.6
  qa_sealed: claude-sonnet-4.6
  lead_eng: claude-sonnet-4.6
  qa_validator: claude-haiku-4.5
  premium_model: claude-opus-4.6

checkpoints:
  allow_skip_all: true
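
One way to read the models block above is as a per-role lookup with a single override for premium runs. The sketch below mirrors the keys from config.yml; the `model_for` helper and its `premium` flag are hypothetical and only illustrate the routing idea, not how Dark Factory resolves models internally.

```python
# Role-to-model mapping copied from the config.yml excerpt above.
MODELS = {
    "product_mgr": "claude-sonnet-4.6",
    "architect": "claude-sonnet-4.6",
    "qa_sealed": "claude-sonnet-4.6",
    "lead_eng": "claude-sonnet-4.6",
    "qa_validator": "claude-haiku-4.5",
}
PREMIUM_MODEL = "claude-opus-4.6"


def model_for(role: str, premium: bool = False) -> str:
    """Pick the model for a role; a premium run overrides every role."""
    if role not in MODELS:
        raise KeyError(f"unknown role: {role}")
    return PREMIUM_MODEL if premium else MODELS[role]


print(model_for("qa_validator"))
print(model_for("qa_validator", premium=True))
```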

Agentic lights
out builds.

01

Goal in, PR out

Write a sentence. Get production code. Dark Factory takes a free-text goal and produces a complete pull request: spec, architecture, implementation, tests, and delivery report.

02

Sealed-envelope testing

QA writes hidden acceptance tests before any code exists. The builder never sees them. Shadow scores quantify blind spots the builder didn't know about.

03

Six specialist agents

Each phase has its own expert: Product Manager, Architect, QA Sealed, Lead Engineer, QA Validator, and Outcome Evaluator. All are stateless, focused, and governed.

04

Checkpoint-gated

You stay in control. Human approval gates at every phase boundary. Review the PRD, approve the architecture, inspect the build, or go fully dark with skip-all.

05

Crash-recoverable

Every phase checkpoints to state.json. After a network drop, timeout, or interrupted session, just run dark factory resume and continue.
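
The checkpoint-and-resume idea can be sketched as follows. The layout of state.json is not shown on this page, so the `next_phase` field, the helper names, and the atomic-write approach are all assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def load_state(path: Path) -> dict:
    """Load saved state, or start from phase index 0 if none exists."""
    if not path.exists():
        return {"next_phase": 0}
    return json.loads(path.read_text(encoding="utf-8"))


def resume(path: Path, phases: list, run_phase) -> dict:
    """Run remaining phases, saving a checkpoint after each one completes."""
    state = load_state(path)
    for index in range(state["next_phase"], len(phases)):
        run_phase(phases[index])  # may raise; the last checkpoint survives
        state["next_phase"] = index + 1
        save_state(path, state)
    return state
```

If `run_phase` raises partway through, calling `resume` again picks up at the first phase that did not finish.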

06

Express mode

Short goals get fast builds. Express mode skips PRD and architecture, but still runs sealed QA from the raw goal. Quick fixes get the same quality envelope.

The assembly line.

Seven phases, each with its own specialist agent. Work flows forward through checkpoints. Express mode condenses to three phases.

Phase 0: Setup · Phase 1: PRD · Phase 2a: QA Sealed 🔒 · Phase 2b: Architecture · Phase 3: Build · Phase 4: Validation · Phase 5: Hardening · Phase 6: Delivery · Phase 7: Outcome
Phase 0

Factory Setup

Creates an isolated git worktree and branch. No interference with your working tree.

Factory Manager
Phase 1

Product Spec

Product Manager writes PRD.md with acceptance criteria, user stories, and scope.

Product Manager
Phase 2a

QA Sealed 🔒

Hidden acceptance tests written from the PRD. SHA-256 hashed. Builder never sees them.

QA Sealed
Phase 2b

Architecture

System design: diagrams, contracts, tech decisions. Runs in parallel with QA Sealed.

Architect
Phase 3

Build + Tests

Lead Engineer implements the spec and writes their own tests. Can't see the sealed suite.

Lead Engineer
Phase 4

Sealed Validation

Sealed tests injected temporarily. Shadow score reveals the gap. Auto-hardening if needed.

QA Validator
Phase 5

Hardening

Automatic fix cycles when shadow score > 0%. Builder sees failures but never sealed tests.

Lead Engineer
Phase 6

Delivery

Final human checkpoint. Delivery report with shadow score. Approve, modify, or reject.

Factory Manager
Phase 7

Outcome Eval

Optional post-ship assessment. PRD criteria fulfillment and KPI scoring out of 100.

Outcome Evaluator
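
The phase ordering above can be sketched as a small orchestrator in which QA Sealed and Architecture run in parallel from the same PRD and the builder never receives the sealed tests. The agent callables and the `run_pipeline` function are hypothetical stand-ins; checkpoints, hardening, and delivery are omitted to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(goal: str, agents: dict) -> dict:
    """Run PRD, then QA Sealed and Architecture in parallel, then build and validate."""
    prd = agents["product_manager"](goal)

    # Phases 2a and 2b both depend only on the PRD, so they can overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        sealed_future = pool.submit(agents["qa_sealed"], prd)
        arch_future = pool.submit(agents["architect"], prd)
        sealed_tests = sealed_future.result()
        architecture = arch_future.result()

    # The builder gets the PRD and architecture only, never sealed_tests.
    build = agents["lead_engineer"](prd, architecture)

    # Only the validator sees both the build and the sealed suite.
    return agents["qa_validator"](build, sealed_tests)
```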
sample session — full mode
you: dark factory — build a REST API rate limiter middleware
factory: 🏭 Run: run-20260325-0900 | Mode: FULL
Worktree created at .factory/runs/run-20260325-0900
pm: PRD.md ready. 8 acceptance criteria, 3 user stories, token bucket algorithm specified.
factory: 📋 Checkpoint 1 — approve spec? [approve] [modify] [abort]
qa 🔒: Sealed tests authored. SHA-256: a7f3... — stored in .factory/sealed/
arch: ARCH.md ready. Express middleware, Redis backing store, sliding window.
eng: Build complete. src/middleware.ts, src/store.ts, 14 unit tests passing.
factory: 🏭 SHADOW_SCORE: 11.1% — 2 sealed failures. Hardening cycle 1/3...
eng: Fixed: edge cases for burst reset and concurrent requests.
factory: 🏭 SHADOW_SCORE: 0% — all sealed tests pass ✓
Delivery checkpoint ready.

The team.

Six specialist agents, each with its own prompt, model assignment, and governance rules. They are stateless: they only see what the Factory Manager passes them.

📋

Product Manager

Phase 1 · PRD Author

Turns your free-text goal into a structured PRD with acceptance criteria, user stories, and technical constraints. Capped at 180 lines.

model: claude-sonnet-4.6
🏗️

Architect

Phase 2b · System Designer

Designs the system from the PRD: diagrams, contracts, file structure, tech decisions. Runs in parallel with QA Sealed.

model: claude-sonnet-4.6
🔒

QA Sealed

Phase 2a · Hidden Test Author

Writes acceptance tests from the PRD that the builder will never see. Tests are SHA-256 hashed and stored in a sealed vault.

model: claude-sonnet-4.6
⚙️

Lead Engineer

Phase 3 & 5 · Builder

Implements the spec, writes tests, and handles hardening cycles. Only sees failure messages from sealed tests, never the tests themselves.

model: claude-sonnet-4.6
🧪

QA Validator

Phase 4 · Sealed Test Runner

Temporarily injects sealed tests into the worktree, runs them, reports the shadow score, then removes all traces.

model: claude-haiku-4.5
📊

Outcome Evaluator

Phase 7 · Post-Ship Analyst

Revisits archived artifacts after delivery. Scores PRD criteria fulfillment and KPI alignment out of 100.

model: claude-sonnet-4.6

Sealed-envelope
testing.

QA writes tests before code exists and hides them from the builder. The quality gap between what the builder tests and what the sealed suite catches is your shadow score: a blind-spot metric you can't game.

How It Works

QA Sealed writes tests using only the PRD, before any code exists. Tests are SHA-256 hashed and stored in .factory/sealed/.
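
The page does not say whether hashing is per file or over the whole suite. As one plausible sketch, the hypothetical `seal_digest` helper below folds every file under a sealed directory into a single SHA-256 digest, so any later edit, rename, or added file changes the result.

```python
import hashlib
from pathlib import Path


def seal_digest(sealed_dir: Path) -> str:
    """Return one SHA-256 hex digest covering every file under sealed_dir."""
    digest = hashlib.sha256()
    files = sorted(p for p in sealed_dir.rglob("*") if p.is_file())
    for path in files:
        # Include the relative path so renames change the digest too.
        digest.update(path.relative_to(sealed_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Recording the digest when the tests are sealed and recomputing it before validation would show whether the suite was altered in between.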

During Phase 4, sealed tests are temporarily injected, executed, and immediately removed. The builder only ever sees failure messages, never the test source.
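
The inject, run, remove sequence maps naturally onto a context manager that cleans up even when the test run fails. This is a sketch under assumptions: the `injected` helper and the `_sealed_tests` folder name are hypothetical, and the real QA Validator may work differently.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def injected(sealed_dir: Path, worktree: Path, name: str = "_sealed_tests"):
    """Copy sealed tests into the worktree, then always remove them again."""
    target = worktree / name
    shutil.copytree(sealed_dir, target)
    try:
        yield target  # the caller runs the sealed suite here
    finally:
        # Runs on success and on error, so no trace is left for the builder.
        shutil.rmtree(target, ignore_errors=True)
```

A caller would wrap the test run in `with injected(sealed, worktree) as tests_dir:` and pass only the resulting failure messages onward.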

Shadow score = sealed failures ÷ sealed total. 0% means the builder nailed it blind. >25% signals spec/test misalignment.

Shadow Score Spec ↗

Why It Matters

Prevents overfitting. Builders can't "teach to the test" because they never see the sealed suite.

Quantifies quality. Shadow scores expose blind spots numerically, not subjectively.

Automates escalation. Hardening cycles fire automatically when sealed tests fail. Up to 3 cycles before human decision.

Retains speed. Express mode derives sealed tests from the raw goal text, so even quick fixes get quality coverage.

  • Classic TDD: builder sees all tests
  • Manual QA: slow, inconsistent
  • Dark Factory: blind, quantified, fast
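
The automatic escalation described above, up to 3 hardening cycles before a human decides, can be sketched as a bounded loop. The `harden` function and its two callbacks are hypothetical: `run_sealed` stands for a sealed validation run that returns failure messages, and `fix` stands for the builder patching the code from those messages alone.

```python
def harden(run_sealed, fix, max_cycles: int = 3) -> dict:
    """Repeat fix-and-revalidate until the sealed suite passes or cycles run out."""
    failures = run_sealed()
    cycles = 0
    while failures and cycles < max_cycles:
        cycles += 1
        fix(failures)            # builder sees messages only, never test source
        failures = run_sealed()  # re-run the hidden suite after the patch
    return {
        "cycles": cycles,
        "remaining_failures": len(failures),
        "needs_human": bool(failures),  # escalate if failures persist
    }
```

The default of 3 matches `max_hardening_cycles` in the config shown earlier on this page.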

Architecture.

[Architecture diagram: 💬 Your Goal (free-text prompt) → 🏭 dark factory (Factory Manager · checkpoints · state) → agent team: 📋 PM (PRD Author), 🏗️ Arch (Designer), 🔒 QA (Sealed Tests), ⚙️ Eng (Builder), 🧪 Validator (Shadow Score), 📊 Outcome (Post-Ship). Foundation: git worktree · .factory/sealed/ · state.json · Shadow Score Spec]

Commands.

Six commands. Everything else is handled by the pipeline automatically.

Full Build: dark factory — <goal>

Complete pipeline with all 7 phases and checkpoints at every gate. The "lights out" experience.

Express: dark factory express — <goal>

Skips PRD/Architecture. Sealed QA still runs from the raw goal. One checkpoint at delivery.

Resume: dark factory resume

Reloads state.json and continues from the saved phase. Crash recovery built in.

Status: dark factory status

Prints current state without mutating anything. Shows pending evaluations.

Evaluate: dark factory evaluate <run-id>

Launches Phase 7 Outcome Evaluator for a delivered run. KPI scoring out of 100.

Premium: dark factory premium — <goal>

Routes all agents through claude-opus-4.6 for one run.

Step 1: Add the skill. One command in Copilot CLI.
Step 2: State your goal. Plain English, any scope.
Step 3: Review the PR. Production-grade, tested.

Build.

# add the skill to Copilot CLI
› /skills add DUBSOpenHub/dark-factory

# start building
› dark factory — build a dependency audit CLI
01

Add Skill

/skills add DUBSOpenHub/dark-factory
02

Describe

dark factory — <your goal>
03

Ship

approve at delivery checkpoint