Agentic Build System · Codename: Lights Out

dark factory.

AGENTIC · LIGHTS OUT · SEALED-ENVELOPE TESTING

The agentic dark factory for AI building. Turn a short free-text goal into a production-grade pull request — six specialist agents orchestrated through a checkpoint-gated pipeline with sealed-envelope testing. Builders never see the hidden tests. The shadow score reveals the truth.

Goal → 📋 PRD → 🔒 Sealed QA → 🏗️ Arch → ⚙️ Build → 🧪 Validate → ✅ Ship


☝️ One command to install · Paste /skills add DUBSOpenHub/dark-factory into the Copilot CLI and start building

🐙 Created with 💜 by Gregg Cochran @DUBSOpenHub with the GitHub Copilot CLI

dark factory — copilot cli

you: dark factory — build a CLI that audits deps for GPL

factory: 🏭 Run: run-20260401 | Mode: FULL

pm: PRD.md written — 12 acceptance criteria

qa 🔒: Sealed tests hashed. Builders won't see these.

eng: Build complete — src/ tests/ README

factory: 🏭 SHADOW_SCORE: 0% — all sealed tests pass ✓

factory: Delivery checkpoint — approve?

The Machine

6 agents, 7 phases, zero blind spots

Dark Factory orchestrates a team of specialist agents through a sealed-envelope pipeline. Every build is measured. Every gap is quantified by the shadow score.

6 Agents: specialist AI roles

7 Phases: checkpoint-gated pipeline

3 Hardening Cycles: automatic fix loops

0% Target Shadow Score: perfect sealed coverage

🔒 What is Shadow Score?

Shadow Score = sealed test failures ÷ total sealed tests. It measures how much the builder missed when they couldn't see the hidden acceptance suite.

0% — perfect blind coverage
≤10% — team target
>25% — spec/test misalignment
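The formula and bands above can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: the function names are hypothetical, and the page does not label the range between 10% and 25%, so the "above target" label is an assumption.

```python
def shadow_score(sealed_failures: int, sealed_total: int) -> float:
    """Return the shadow score as a percentage (0.0 to 100.0)."""
    if sealed_total <= 0:
        raise ValueError("sealed_total must be positive")
    if not 0 <= sealed_failures <= sealed_total:
        raise ValueError("sealed_failures must be between 0 and sealed_total")
    return 100.0 * sealed_failures / sealed_total


def classify(score_pct: float) -> str:
    """Map a shadow score percentage to the bands listed on this page."""
    if score_pct == 0:
        return "perfect blind coverage"
    if score_pct <= 10:
        return "team target"
    if score_pct > 25:
        return "spec/test misalignment"
    # The page does not name the 10-25% range; this label is an assumption.
    return "above target"
```

For instance, 2 failures out of 18 sealed tests gives roughly 11.1%, which lands just outside the team target.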

About

Agentic lights out builds

Dark Factory is an agentic build system — it isolates every build in a disposable git worktree, orchestrates six specialist AI agents, and measures quality with sealed-envelope testing. The shadow score tells you exactly how much the builder missed. Lights out means the builder works blind — and the hidden tests prove whether the spec was truly covered.
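To make the disposable-worktree idea concrete, here is a minimal sketch of how a run could be isolated with the standard git worktree commands. The .factory/runs/ location matches the sample session further down this page; the branch naming scheme and the function names are assumptions made for illustration, not the tool's actual behavior.

```python
import subprocess
from pathlib import Path

RUNS_ROOT = Path(".factory/runs")  # location shown in the sample session


def worktree_add_cmd(run_id: str, root: Path = RUNS_ROOT) -> list:
    """Build the `git worktree add` command for one run."""
    branch = f"factory/{run_id}"  # hypothetical branch naming scheme
    return ["git", "worktree", "add", "-b", branch, (root / run_id).as_posix()]


def worktree_remove_cmd(run_id: str, root: Path = RUNS_ROOT) -> list:
    """Build the command that discards the worktree once the run is done."""
    return ["git", "worktree", "remove", "--force", (root / run_id).as_posix()]


def create_worktree(run_id: str) -> None:
    """Create the isolated worktree; raises CalledProcessError if git fails."""
    subprocess.run(worktree_add_cmd(run_id), check=True)
```

Building the command separately from running it keeps the sketch easy to inspect without touching a real repository.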

🔒 Sealed Tests

🌿 Git Worktree

📋 Checkpoints

⚡ Express Mode

config.yml
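# what powers the factory
factory:
  default_mode: full            # full pipeline unless express is requested
  max_hardening_cycles: 3       # automatic fix loops before a human decision
  agent_timeout_sec: 300        # timeout per agent, in seconds

models:
  product_mgr: claude-sonnet-4.6
  architect: claude-sonnet-4.6
  qa_sealed: claude-sonnet-4.6
  lead_eng: claude-sonnet-4.6
  qa_validator: claude-haiku-4.5
  premium_model: claude-opus-4.6   # used for every agent on a premium run

checkpoints:
  allow_skip_all: true          # permits going fully dark with skip-all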

Features

Agentic lights out builds.

Goal in, PR out

Write a sentence. Get production code. Dark Factory takes a free-text goal and produces a complete pull request — spec, architecture, implementation, tests, and delivery report.

Sealed-envelope testing

QA writes hidden acceptance tests before any code exists. The builder never sees them. Shadow scores quantify blind spots the builder didn't know about.

Six specialist agents

Each phase has its own expert. Product Manager, Architect, QA Sealed, Lead Engineer, QA Validator, and Outcome Evaluator — stateless, focused, governed.

Checkpoint-gated

You stay in control. Human approval gates at every phase boundary. Review the PRD, approve the architecture, inspect the build — or go fully dark with skip-all.

Crash-recoverable

Every phase checkpoints to state.json. Network drops, timeouts, interrupted sessions — just run dark factory resume and continue.

Express mode

Short goals get fast builds. Express mode skips PRD and architecture, but still runs sealed QA from the raw goal. Quick fixes get the same quality envelope.
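The crash-recovery feature above depends on each phase checkpointing to state.json. A minimal sketch of that pattern, assuming a flat JSON file: the field names (run_id, mode, phase) and function names are hypothetical, and the atomic-rename approach is one common technique, not a confirmed detail of Dark Factory.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_state(path: Path, state: dict) -> None:
    """Write state atomically so a crash cannot leave a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)  # atomic when source and target share a filesystem


def load_state(path: Path) -> Optional[dict]:
    """Return the saved state, or None when no run has been checkpointed."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

On resume, a caller would read the saved phase from the loaded dict and continue from there instead of starting at Phase 0.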

Pipeline

The assembly line.

Seven phases, each with its own specialist agent. Work flows forward through checkpoints. Express mode condenses to three phases.

Phase 0: Setup → Phase 1: PRD → Phase 2a: QA Sealed 🔒 → Phase 2b: Architecture → Phase 3: Build → Phase 4: Validation → Phase 5: Hardening → Phase 6: Delivery → Phase 7: Outcome

Phase 0

Factory Setup

Creates an isolated git worktree and branch. No interference with your working tree.

Factory Manager

Phase 1

Product Spec

Product Manager writes PRD.md with acceptance criteria, user stories, and scope.

Product Manager

Phase 2a

QA Sealed 🔒

Hidden acceptance tests written from the PRD. SHA-256 hashed. Builder never sees them.

QA Sealed

Phase 2b

Architecture

System design — diagrams, contracts, tech decisions. Runs in parallel with QA Sealed.

Architect

Phase 3

Build + Tests

Lead Engineer implements the spec and writes their own tests. Can't see the sealed suite.

Lead Engineer

Phase 4

Sealed Validation

Sealed tests injected temporarily. Shadow score reveals the gap. Auto-hardening if needed.

QA Validator

Phase 5

Hardening

Automatic fix cycles when shadow score > 0%. Builder sees failures but never sealed tests.

Lead Engineer

Phase 6

Delivery

Final human checkpoint. Delivery report with shadow score. Approve, modify, or reject.

Factory Manager

Phase 7

Outcome Eval

Optional post-ship assessment. PRD criteria fulfillment and KPI scoring out of 100.

Outcome Evaluator
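The interplay between Phase 4 and Phase 5 can be read as a bounded retry loop. The sketch below is one way to express it, assuming validation returns the failure count, the total, and the failure messages; the callables and the result keys are hypothetical names, and only the limit of 3 cycles comes from this page.

```python
from typing import Callable, List, Tuple

# validate() returns (sealed_failures, sealed_total, failure_messages)
Validate = Callable[[], Tuple[int, int, List[str]]]
Harden = Callable[[List[str]], None]


def validate_with_hardening(validate: Validate, harden: Harden,
                            max_cycles: int = 3) -> dict:
    """Run sealed validation, then fix-and-revalidate up to max_cycles times."""
    cycles = 0
    failures, total, messages = validate()
    while failures > 0 and cycles < max_cycles:
        cycles += 1
        harden(messages)  # the builder receives failure messages only
        failures, total, messages = validate()
    return {
        "shadow_score_pct": 100.0 * failures / total,
        "cycles_used": cycles,
        "needs_human": failures > 0,  # cycles exhausted with failures left
    }
```

Passing only the failure messages into the hardening step mirrors the rule that the builder never sees the sealed tests themselves.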

sample session — full mode

you: dark factory — build a REST API rate limiter middleware

factory: 🏭 Run: run-20260325-0900 | Mode: FULL
Worktree created at .factory/runs/run-20260325-0900

pm: PRD.md ready. 8 acceptance criteria, 3 user stories, token bucket algorithm specified.

factory: 📋 Checkpoint 1 — approve spec? [approve] [modify] [abort]

qa 🔒: Sealed tests authored. SHA-256: a7f3... — stored in .factory/sealed/

arch: ARCH.md ready. Express middleware, Redis backing store, sliding window.

eng: Build complete. src/middleware.ts, src/store.ts, 14 unit tests passing.

factory: 🏭 SHADOW_SCORE: 11.1% — 2 sealed failures. Hardening cycle 1/3...

eng: Fixed: edge cases for burst reset and concurrent requests.

factory: 🏭 SHADOW_SCORE: 0% — all sealed tests pass ✓
Delivery checkpoint ready.

Agent Team

The team.

Six specialist agents, each with its own prompt, model assignment, and governance rules. Stateless — they only see what the Factory Manager passes them.

📋

Product Manager

Phase 1 · PRD Author

Turns your free-text goal into a structured PRD with acceptance criteria, user stories, and technical constraints. Capped at 180 lines.

model: claude-sonnet-4.6

🏗️

Architect

Phase 2b · System Designer

Designs the system from the PRD — diagrams, contracts, file structure, tech decisions. Runs in parallel with QA Sealed.

model: claude-sonnet-4.6

🔒

QA Sealed

Phase 2a · Hidden Test Author

Writes acceptance tests from the PRD that the builder will never see. Tests are SHA-256 hashed and stored in a sealed vault.

model: claude-sonnet-4.6

⚙️

Lead Engineer

Phases 3 & 5 · Builder

Implements the spec, writes tests, and handles hardening cycles. Only sees failure messages from sealed tests — never the tests themselves.

model: claude-sonnet-4.6

🧪

QA Validator

Phase 4 · Sealed Test Runner

Temporarily injects sealed tests into the worktree, runs them, reports the shadow score, then removes all traces.

model: claude-haiku-4.5

📊

Outcome Evaluator

Phase 7 · Post-Ship Analyst

Revisits archived artifacts after delivery. Scores PRD criteria fulfillment and KPI alignment out of 100.

model: claude-sonnet-4.6
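Because the agents are stateless and only see what the Factory Manager passes them, the isolation can be pictured as a per-agent allowlist of artifacts. The map below is an assumption assembled from the role descriptions above, not the project's real governance rules; the artifact keys and the outcome_eval name are hypothetical.

```python
# Which artifacts each agent may receive (illustrative assumption).
VISIBLE = {
    "product_mgr": {"goal"},
    "qa_sealed": {"prd"},
    "architect": {"prd"},
    "lead_eng": {"prd", "arch", "sealed_failure_messages"},
    "qa_validator": {"build", "sealed_tests"},
    "outcome_eval": {"prd", "delivery_report"},
}


def context_for(agent: str, artifacts: dict) -> dict:
    """Return only the artifacts this agent is allowed to see."""
    allowed = VISIBLE[agent]  # KeyError for an unknown agent is intentional
    return {name: body for name, body in artifacts.items() if name in allowed}
```

With this shape, the sealed suite can never reach the Lead Engineer by accident, since it is filtered out before the prompt is assembled.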

Core Innovation

Sealed-envelope testing.

QA writes tests before code exists and hides them from the builder. The quality gap between what the builder tests and what the sealed suite catches is your shadow score — a blind-spot metric you can't game.

How It Works

QA Sealed writes tests using only the PRD — before any code exists. Tests are SHA-256 hashed and stored in .factory/sealed/.

During Phase 4, sealed tests are temporarily injected, executed, and immediately removed. The builder only ever sees failure messages — never the test source.

Shadow score = sealed failures ÷ sealed total. 0% means the builder nailed it blind. >25% signals spec/test misalignment.
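The hash-then-inject steps described above could look roughly like this. It is a sketch under stated assumptions: the digest covers every file in the sealed directory, the injection target _sealed_tests is a hypothetical path, and the function names are made up for the example. Only SHA-256 hashing and the temporary injection followed by removal come from this page.

```python
import hashlib
import shutil
from contextlib import contextmanager
from pathlib import Path


def seal_digest(sealed_dir: Path) -> str:
    """SHA-256 over every sealed file (relative path plus bytes), sorted."""
    h = hashlib.sha256()
    for f in sorted(p for p in sealed_dir.rglob("*") if p.is_file()):
        h.update(f.relative_to(sealed_dir).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(f.read_bytes())
    return h.hexdigest()


@contextmanager
def injected(sealed_dir: Path, worktree: Path, expected_digest: str):
    """Copy sealed tests into the worktree, then always remove them."""
    if seal_digest(sealed_dir) != expected_digest:
        raise RuntimeError("sealed tests changed since they were hashed")
    target = worktree / "_sealed_tests"  # hypothetical injection path
    shutil.copytree(sealed_dir, target)
    try:
        yield target
    finally:
        shutil.rmtree(target, ignore_errors=True)
```

Checking the digest before injection means a tampered sealed suite is rejected, and the finally clause removes the copy even if the test run raises.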

Shadow Score Spec ↗

Why It Matters

Prevents overfitting. Builders can't "teach to the test" because they never see the sealed suite.

Quantifies quality. Shadow scores expose blind spots numerically — not subjectively.

Automates escalation. Hardening cycles fire automatically when sealed tests fail. Up to 3 cycles run before a human has to decide.

Retains speed. Express mode derives sealed tests from the raw goal text, so even quick fixes get quality coverage.

→ Classic TDD: builder sees all tests
→ Manual QA: slow, inconsistent
→ Dark Factory: blind, quantified, fast

System Design

Architecture.

Reference

Commands.

Six commands. Everything else is handled by the pipeline automatically.

Full Build: dark factory — <goal>

Complete pipeline with all 7 phases and checkpoints at every gate. The "lights out" experience.

Express: dark factory express — <goal>

Skips PRD/Architecture. Sealed QA still runs from the raw goal. One checkpoint at delivery.

Resume: dark factory resume

Reloads state.json and continues from the saved phase. Crash recovery built in.

Status: dark factory status

Prints current state without mutating anything. Shows pending evaluations.

Evaluate: dark factory evaluate <run-id>

Launches Phase 7 Outcome Evaluator for a delivered run. KPI scoring out of 100.

Premium: dark factory premium — <goal>

Routes all agents through claude-opus-4.6 for one run.
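As an illustration of how the premium override might interact with the per-agent model assignments in config.yml, here is a small lookup sketch. The model names are copied from the config shown earlier on this page; the function and its premium flag are hypothetical.

```python
# Model assignments copied from the config.yml excerpt on this page.
MODELS = {
    "product_mgr": "claude-sonnet-4.6",
    "architect": "claude-sonnet-4.6",
    "qa_sealed": "claude-sonnet-4.6",
    "lead_eng": "claude-sonnet-4.6",
    "qa_validator": "claude-haiku-4.5",
    "premium_model": "claude-opus-4.6",
}


def model_for(agent: str, premium: bool = False) -> str:
    """Pick the model for an agent; a premium run overrides every agent."""
    if agent == "premium_model" or agent not in MODELS:
        raise KeyError(f"unknown agent: {agent}")
    return MODELS["premium_model"] if premium else MODELS[agent]
```

A premium run would call model_for with premium=True for every agent, leaving the configured defaults untouched for later runs.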

Step 1: Add the skill. One command in Copilot CLI.

Step 2: State your goal. Plain English, any scope.

Step 3: Review the PR. Production-grade, tested.

Build.

# add the skill to Copilot CLI

› /skills add DUBSOpenHub/dark-factory

# start building

› dark factory — build a dependency audit CLI

Add Skill

/skills add DUBSOpenHub/dark-factory

Describe

dark factory — <your goal>

Ship

approve at delivery checkpoint
