Early Access — Pre-Launch

Stop babysitting
AI agents.

The first dark factory that works. MergeFoundry interviews requirements, plans dependency-gated task graphs, dispatches parallel agents in isolated worktrees, cross-model reviews every diff, certifies with 6 drift auditors, and merges through an AI-powered queue. You wake up to certified, production-ready PRs.

Request Early Access Watch it run ↓

From the creator of Flow-Next · First production deployment live · 7 backend adapters · Open-core · MIT licensed

MergeFoundry — Pipeline · Live

FOUNDRY FLOOR · LIVE (illustrative)

RUN mf-20260407 · TASKS 3 · QUEUE 0

SPEC (interview) → PLAN (task graph) → IMPLEMENT (worktree) → REVIEW (cross-model) → MERGE (queue)

fn-1 SHIP ✓ · fn-2 SHIP ✓ · fn-3 REWORK ↻

THROUGHPUT 14.2/hr

Legend: active · ship · rework

The Problem

AI tools made coding 10x faster.
Everything else became the bottleneck.

DORA 2025: 91% more time on review · larger PRs

AI coding tools made implementation 10x faster. Review didn't scale. The bottleneck moved. MergeFoundry moves it back.

Requirements Bottleneck

Developers ship faster than PMs can specify. Vague specs, manual decomposition, long clarification loops. The left side of the SDLC can't keep up.

→

Spec augmentation, interview/clarification, acceptance criteria generation, structured writeback into Jira.

Review Bottleneck

100x more code means 100x more to review. One model judging its own output is a monoculture failure. Humans can't scale review proportionally.

→

Cross-model review. Different model reviews than implements. Evidence-linked summaries with risk-tiered attention routing.

No Runtime Proof

"The code looks plausible" is not evidence. No browser validation. No screenshots. No DOM snapshots. Ship and pray.

→

First-class runtime validation. Browser testing via agent-browser. Screenshots, DOM, console, network traces as evidence artifacts.

Drift / Silent Decay

AI output looks done but quietly drifts from spec. Breaks contracts. Skips docs. Leaves tests stale. "Done" doesn't mean "correct."

→

Six drift auditors. Remediation loops. Blocker synthesis. Certify only when clean.

No Verifiability

Agents lie about completion. No certification artifacts. No evidence chain. No way to prove what was done, by which model, whether it actually passed.

→

Certification artifacts with model attribution, remediation count, runtime validation summary, integrity checksums.

No Harness Fits All

Claude reasons well. Codex parallelizes well. Cursor integrates well. Forcing one tool on every stage leaves quality on the table.

→

7 backend adapters. Backend selected per-task, per-stage. The right model for the right job, simultaneously.

Architecture

Five stages. Every change.

📋

PLAN

Interview workers clarify requirements. 6 parallel scouts explore the codebase. Gap analysis catches cross-cutting risks. Synthesis generates dependency-gated task graphs optimized for parallel execution.

Structured specs, not vague prompts. Tasks decomposed for maximum safe parallelism.

⚡

IMPLEMENT

Workers code in isolated git worktrees — one per task, parallel across the dependency graph. Any backend: Claude, Codex, Gemini, Cursor, or your own.

No file conflicts. Full codebase context. Backend-agnostic execution.

🔍

REVIEW

A different model reviews the diff against the original spec. SHIP or NEEDS_WORK with focus-point ranking, risk-tiered policies, and cited evidence. Feedback loops until SHIP.

Cross-model adversarial review. Not self-grading. Cited analysis, not "LGTM."

🔄

SYNC

When upstream changes affect interfaces, downstream task specs are automatically updated. Keeps dependent tasks accurate as the codebase evolves during the run.

No stale specs. No drift. No surprise integration failures.

✓

MERGE

FIFO queue with auto-rebase, AI conflict resolution, and push-before-done semantics. 6 drift auditors certify the epic. Evidence chain with model attribution.

Clean history. Conflict-free integration. Certified output with audit trail.

→

Review isn’t optional. It’s structural. Every change passes through adversarial review before it touches your codebase. A different model reviews than implements — cross-model verification catches what self-review misses.

The foundry floor.

Four stages. Tasks move in parallel. Cross-model review is structural, not optional. Watch the machine run.

FOUNDRY FLOOR · LIVE (illustrative)

RUN mf-20260407 · TASKS 5 · QUEUE 0

IMPLEMENT (worktree) → REVIEW (cross-model) → SYNC (plan-sync) → MERGE (queue)

fn-1 SHIP ✓ · fn-2 SHIP ✓ · fn-3 REWORK ↻ · fn-4 SHIP ✓ · fn-5 SHIP ✓

THROUGHPUT 14.2/hr · MEAN CYCLE 4m 18s · PARALLEL 5 workers

Legend: active · ship · rework

Specs in. Merged PRs out. Everything between is orchestrated.

mf — MergeFoundry Pipeline

PLAN

Plan & dispatch

Epic parsed into dependency-gated task graph

$ mf run --epic feat-auth
┌─ Epic: feat-auth
├─ Tasks: 6 (3 parallel, 3 dependent)
├─ Backend: composer-2-fast → implement
├─ Review: gpt-5.4-high (codex) → review
└─ Graph:
T1 ──┬── T3 ── T5
T2 ──┘ T4 ── T6
✓ Dependency graph validated
✓ Worktrees ready
→ Dispatching parallel group [T1, T2]...

Task dependency graph with parallel execution groups

Total time: 14 minutes

Tasks: 6 merged

Models: 2 backends

Human input: 0

Cost: $0.47
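Dispatching by parallel group can be pictured as a layered topological sort. The sketch below is only an illustration in Python, not MergeFoundry's scheduler: the `parallel_groups` helper is a hypothetical name, and the dependency edges are assumptions made up for the example (the page does not spell them out).

```python
from graphlib import TopologicalSorter


def parallel_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Split a task graph into groups whose members can run in parallel.

    `deps` maps each task to the tasks it depends on. A cycle raises
    graphlib.CycleError, which stands in for graph validation here.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        groups.append(ready)
        sorter.done(*ready)                 # mark the whole group as finished
    return groups


# Hypothetical edges for a six-task epic (assumed, not taken from the product).
DEPS = {
    "T1": set(),
    "T2": set(),
    "T3": {"T1", "T2"},
    "T4": {"T2"},
    "T5": {"T3"},
    "T6": {"T4"},
}
GROUPS = parallel_groups(DEPS)
```

With these assumed edges, the first group contains the two tasks with no dependencies, and each later group becomes ready only once the previous one has finished.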

Five stages. Zero human intervention. See the pipeline run.

Request Early Access

PM & PO Tooling

Not just for developers.
The left side of the SDLC, accelerated.

Developers ship faster than PMs can specify. MergeFoundry closes the gap. PMs and POs write requirements in natural language — MergeFoundry interviews, structures, and produces execution-grade specs that flow directly into the pipeline. For frontend-heavy streams, a Stitch-style DESIGN.md can ride alongside the spec as an agent-readable design contract. Full integration with the tools your team already uses.

Integrations

Jira

Import epics, sync status, write back artifacts

GitHub

Issues, PRs, branch management, handoff publishing

Linear

Issue import, status sync, bidirectional updates

Import from where you plan. Write back to where you collaborate. MergeFoundry normalizes work internally and keeps external systems in sync.

Spec Augmentation

Rough requirements in natural language go in. MergeFoundry interviews, clarifies edge cases, surfaces missing acceptance criteria, and produces execution-grade specs.

Interview Workers

Interactive questionnaires that ask the right questions: What should happen on error? What contracts can't break? What are the edge cases? Not prompt engineering — structured requirement elicitation.

Acceptance Criteria Generation

Every task gets testable acceptance criteria derived from the interview. Reviewers check against these criteria. Auditors verify compliance. The spec is the contract.

Design Contracts

For UI-heavy work, MergeFoundry can also consume a Stitch-style `DESIGN.md` so layout, component, spacing, and visual rules become part of the planning contract instead of getting lost in prose.

Structured Writeback

Results flow back to where your team already works — Jira tickets updated, GitHub PRs published with evidence packages, Linear issues synced. No copy-paste. No context switching.

Task Detail — Spec & Acceptance Criteria

Task detail with structured spec and acceptance criteria

Spec-Driven Planning

Requirements in.
Executable task graphs out.

PMs can’t specify fast enough. MergeFoundry closes the gap. Interview workers clarify requirements, 6 parallel scouts explore the codebase, gap analysis catches what individual scouts miss, and synthesis generates dependency-gated task graphs with acceptance criteria. A Stitch-style DESIGN.md can also feed planning when frontend/design rules matter. Cross-model plan review validates before a single line is written.

Parallel Scouts

📂

Repo Context

Codebase structure, patterns, conventions

⚙️

Practice

Implementation patterns, pitfalls, best practices

📄

Documentation

Existing docs, API surfaces, architecture

🔗

Epic Deps

Cross-epic dependencies, execution order

🔍

Docs Gap

Missing documentation, stale references

🧠

Memory

Prior decisions, learned patterns, team context

$ mf plan interview fn-1-auth-overhaul
$ mf plan generate fn-1-auth-overhaul
$ mf plan review fn-1-auth-overhaul

Interview

Interactive questionnaire clarifies requirements, surfaces edge cases, generates acceptance criteria. PMs specify in natural language — MergeFoundry structures it into execution-grade specs.

Not prompt engineering. Structured interview that asks the right questions: What should happen on error? What are the edge cases? What contracts can't break? Output: normalized spec with acceptance criteria, dependency mapping, and file targets.

Scout

Planning scouts explore the codebase — repo context, implementation patterns, documentation, epic dependencies, doc gaps, team memory, and optional design-system context when `DESIGN.md` is present. Each produces findings with evidence and risk assessments.

Scouts run on fast-tier models for speed. Each scout has a strict output contract: summary, key findings, evidence (file refs), risks/unknowns, recommendations, and handoff notes for synthesis. Design-system scouting turns tokens, component rules, and layout constraints into planning input instead of leaving them as vague visual intent.

Gap Analysis

Cross-scout synthesis identifies critical gaps, conflicting findings, and risk hotspots. Generates a validation checklist for the plan.

Not just concatenation. The gap analyzer cross-references scout outputs, identifies contradictions, flags risks that no individual scout caught, and produces a focused brief for the synthesis phase.

Plan Synthesis

Generates dependency-gated task graph from scout findings + gap analysis. Each task gets a structured spec with acceptance criteria, file targets, and dependency links.

Output: executable task graph with parallel groups, critical path identification, and per-task specs. Tasks reference specific files and acceptance criteria — not vague descriptions.

Plan Review

Cross-model review of the generated plan. A different model (default: Opus) reviews the plan for completeness, risk, and architectural soundness. SHIP or NEEDS_WORK with cited feedback.

Same adversarial review philosophy as code review, applied to planning. The reviewer checks: Are edge cases covered? Are dependencies correct? Is the task decomposition right-sized? Are acceptance criteria testable?

Plan — Interview · Generate · Review

Planning interface with interview, plan generation, and visual flowchart

Review Intelligence

Not “LGTM.”
Cited analysis with evidence.

DORA 2025 found teams spend 91% more time on review. MergeFoundry compresses that. Every review produces a structured artifact with focus-point ranking, risk-tiered policies, structural analysis, and decisions with specific file and line citations. The reviewer doesn’t approve — it produces evidence.

Diff + Spec → Review Worker → SHIP / NEEDS_WORK ⟳ Feedback Loop

Review Artifact Sections

What happened in this run

AI-generated narrative of changes, intent, and impact

Structural implications

How file changes affect architecture and layering

Review paths that dominate risk

Which changes carry the most risk and why

Observed layering from paths

File tree analysis showing architectural layers touched

Signal image from telemetry

Patterns detected from worker activity and tool usage

Decisions implied by context

Architectural decisions with specific file:line citations

Focus-Point Ranking

Every review produces ranked focus points — not a wall of comments. Each point has a priority (critical/high/medium/low), a numeric score, cited evidence with file references, and specific reasons. Reviewers orient on what matters, not what changed.
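As a rough illustration of what a ranked focus point might look like as data, here is a small Python sketch. The `FocusPoint` class, its field names, and the rule "priority first, then higher score first" are assumptions for the example, not the product's schema.

```python
from dataclasses import dataclass, field

# Lower number sorts first; the four priority names come from the text above.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class FocusPoint:
    title: str
    priority: str                 # critical / high / medium / low
    score: float                  # numeric score (assumed: higher = more attention)
    evidence: list[str] = field(default_factory=list)  # file references
    reasons: list[str] = field(default_factory=list)


def rank(points: list[FocusPoint]) -> list[FocusPoint]:
    """Order focus points by priority, then by descending score."""
    return sorted(points, key=lambda p: (PRIORITY_ORDER[p.priority], -p.score))


POINTS = [
    FocusPoint("Rename helper", "low", 12.0, ["src/util/strings.ts"]),
    FocusPoint("Token refresh race", "critical", 91.0, ["auth/session.ts"]),
    FocusPoint("Missing index", "high", 70.0, ["migrations/004.sql"]),
    FocusPoint("Unchecked redirect", "high", 84.0, ["auth/login.ts"]),
]
RANKED = [p.title for p in rank(POINTS)]
```

The file paths and titles are invented sample data; the point is only that a reviewer reads the list from the top instead of scanning every comment.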

Risk-Tiered Policies

File paths map to risk tiers. Changes to src/core/**, auth/**, migrations/**, and *.sql get high-risk treatment — re-run verification gates even after initial pass. Docs and tests get low-risk fast-path. Configurable per-repo.
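A path-to-tier mapping of this kind could be sketched as follows. This is a hedged Python illustration: the high-risk patterns are the ones named above, while the low-risk patterns, the `medium` default, and the function names are assumptions. Note that `fnmatch` lets `*` match across `/`, so it only approximates real glob semantics.

```python
from fnmatch import fnmatch

HIGH_RISK = ["src/core/**", "auth/**", "migrations/**", "*.sql"]  # from the text above
LOW_RISK = ["docs/**", "tests/**", "*.md"]                        # assumed patterns

TIER_RANK = {"low": 0, "medium": 1, "high": 2}


def risk_tier(path: str) -> str:
    """Return the risk tier for one changed file; high-risk patterns win."""
    if any(fnmatch(path, pattern) for pattern in HIGH_RISK):
        return "high"
    if any(fnmatch(path, pattern) for pattern in LOW_RISK):
        return "low"
    return "medium"  # assumed default for everything else


def diff_tier(paths: list[str]) -> str:
    """A diff is as risky as its riskiest file (assumed aggregation rule)."""
    return max((risk_tier(p) for p in paths), key=TIER_RANK.__getitem__, default="low")
```

For example, a diff touching `docs/setup.md` and `migrations/004_users.sql` would be treated as high-risk under this sketch, because one high-risk file is enough.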

Review Compression

AI synthesis compresses raw diffs into structured artifacts: what happened, structural implications, layering observations, risk-dominant paths, and decisions with cited file:line references. Not "LGTM" — cited analysis with evidence chain.

Cross-Model Adversarial Review

The review model is always different from the implementation model: for example, Claude implements and Gemini reviews. Different training data, different blind spots. Anti-monoculture at the architectural level.

Review: what happened + structural implications

Review: layering observations + risk-dominant paths

Epic Certification

“Done” doesn’t mean correct.
Certification does.

After all tasks merge, six drift auditors verify nothing was silently broken. Each auditor produces a structured verdict with drift score, cited evidence, and remediation instructions. Blocking findings trigger automatic remediation loops. Certification artifacts provide a durable evidence chain: what was done, by which model, and whether it actually passed.

Contract Drift · blocking

Detects API/public contract breakage. Additive exports are advisory; breaking changes block.

CHECKS: Public API signatures, export changes, breaking interface modifications

Spec Drift · blocking

Compares implementation against epic and task intent. Catches silent divergence from requirements.

CHECKS: Acceptance criteria compliance, feature completeness, intent alignment

Doc Drift · advisory

Verifies documentation matches shipped behavior. Catches stale READMEs, outdated setup guides.

CHECKS: README accuracy, API docs, setup instructions, architecture docs

Changelog Drift · advisory

Ensures release notes exist for shipped changes. Non-blocking when repo has no changelog convention.

CHECKS: CHANGELOG entries, release notes, version bumps

ADR Drift · advisory

Checks that architectural decisions are documented. Undocumented design choices get flagged.

CHECKS: Architecture Decision Records, design rationale documentation

Test Drift · blocking

Verifies test coverage for changed behavior. Missing tests for new code paths block by default.

CHECKS: Test existence for changes, coverage gaps, regression test adequacy

Remediation Loop

Not just flag-and-forget. Blocking findings trigger automatic fix → re-gate → re-audit cycles.

Detect

Auditors fan out in parallel, each producing structured verdicts with drift scores (0-100) and cited evidence

Synthesize

Blocker synthesis deduplicates findings, classifies advisory vs blocking, and produces a prioritized remediation list

Remediate

Auto-spawns remediation workers using the impl backend. Workers fix blocking findings and commit changes

Gate

Repo-native gates re-run (lint, typecheck, test). Auditors re-run on the remediated code. Loop until clean or max attempts

Certify

Certification artifact emitted with aggregate drift score, model attribution per stage, remediation count, and integrity checksum
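The five steps above can be read as a bounded loop. The following Python sketch shows one possible shape under stated assumptions: `certify`, `Verdict`, the `max_attempts=3` default, and taking the maximum as the aggregate drift score are all hypothetical choices for illustration, and the auditors, remediation worker, and gates are passed in as plain callables.

```python
from dataclasses import dataclass

# Auditors listed as blocking on this page; the rest are advisory.
BLOCKING_AUDITORS = {"contract", "spec", "test"}


@dataclass
class Verdict:
    auditor: str
    drift_score: int     # 0-100
    has_findings: bool


def certify(run_auditors, remediate, run_gates, max_attempts=3):
    """Detect -> synthesize -> remediate -> gate, until clean or out of attempts."""
    remediation_count = 0
    gates_ok = True
    while True:
        verdicts = run_auditors()                                   # Detect
        blockers = [v for v in verdicts                             # Synthesize (simplified)
                    if v.has_findings and v.auditor in BLOCKING_AUDITORS]
        clean = not blockers and gates_ok
        if clean or remediation_count >= max_attempts:
            return {                                                # Certify (or give up)
                "certified": clean,
                "remediation_count": remediation_count,
                "aggregate_drift_score": max((v.drift_score for v in verdicts), default=0),
            }
        remediate(blockers)                                         # Remediate
        remediation_count += 1
        gates_ok = run_gates()                                      # Gate


# Toy run: the test auditor blocks once, then passes after remediation.
state = {"tests_missing": True}


def fake_auditors():
    missing = state["tests_missing"]
    return [
        Verdict("spec", 0, False),
        Verdict("test", 40 if missing else 0, missing),
        Verdict("doc", 4, True),  # advisory finding: reported, never blocks
    ]


def fake_remediate(blockers):
    state["tests_missing"] = False


RESULT = certify(fake_auditors, fake_remediate, run_gates=lambda: True)
```

In the toy run, the advisory doc finding stays in the result without blocking, and the run certifies after a single remediation pass.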

Certification Artifact

{
  "epic_id": "fn-1-auth-overhaul",
  "certified_at": "2026-04-04T14:32:00Z",
  "aggregate_drift_score": 4,
  "auditor_count": 6,
  "remediation_count": 1,
  "model_attribution": [
    { "stage": "impl", "backend": "composer-2-fast" },
    { "stage": "review", "backend": "gpt-5.4-high" },
    { "stage": "auditor", "backend": "opus-4.6" }
  ],
  "checksum": "sha256:a3f8..."
}
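A consumer of this artifact might run a few sanity checks before trusting it. The Python sketch below is a hypothetical example, not part of MergeFoundry: `attribution_problems` is an invented helper that checks the cross-model rule (review backend differs from impl backend) and the `sha256:` prefix on the sample above. It does not attempt to recompute the checksum, since the page does not say how it is derived.

```python
import json

ARTIFACT_JSON = """
{
  "epic_id": "fn-1-auth-overhaul",
  "certified_at": "2026-04-04T14:32:00Z",
  "aggregate_drift_score": 4,
  "auditor_count": 6,
  "remediation_count": 1,
  "model_attribution": [
    { "stage": "impl", "backend": "composer-2-fast" },
    { "stage": "review", "backend": "gpt-5.4-high" },
    { "stage": "auditor", "backend": "opus-4.6" }
  ],
  "checksum": "sha256:a3f8..."
}
"""


def attribution_problems(artifact: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    backends = {entry["stage"]: entry["backend"] for entry in artifact["model_attribution"]}
    problems = []
    for stage in ("impl", "review"):
        if stage not in backends:
            problems.append(f"missing attribution for stage '{stage}'")
    if "impl" in backends and backends["impl"] == backends.get("review"):
        problems.append("review backend equals impl backend")
    if not str(artifact.get("checksum", "")).startswith("sha256:"):
        problems.append("checksum is not sha256-prefixed")
    return problems


PROBLEMS = attribution_problems(json.loads(ARTIFACT_JSON))
```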

Runtime Validation

“Tests pass” is not proof.
Runtime evidence is.

MergeFoundry launches your app, navigates it with agent-browser, captures screenshots, DOM snapshots, console logs, and network traces as first-class evidence artifacts. Not simulated — actual browser execution with governed templates. When a repo carries a DESIGN.md contract, runtime validation can also use it for design-system conformance checks.

Stage Policies

Implementation · auto · Runs when browser/CLI target configured

Review · optional · Available for manual/operator use

Certification · auto · Evidence feeds into auditor verdicts

Handoff · required · Must run; missing targets produce failed receipt

Evidence Artifacts

Screenshots

Full-page and viewport captures at specific routes

DOM Snapshots

Complete DOM state for structural verification

Console Logs

Errors, warnings, and info messages during execution

Network Traces

Failed requests, slow responses, unexpected calls

Accessibility Trees

ARIA compliance and semantic structure

Video Recordings

Full browser session recordings for review

Performance Traces

Chrome DevTools profiling data

CLI Transcripts

Full command-line session output

Browser Templates (agent-browser)

smoke · Deterministic smoke check: does the page load, hydrate, and render without errors?

capture (template) · Evidence capture: screenshots, DOM snapshots, console logs at specific routes

dogfood (agentic) · Bounded exploratory review: AI navigates the app, finds bugs, reports findings

dogfood-video (agentic) · Exploratory review with video recording: visual regression evidence

snapshot-diff (template) · Before/after comparison: diff screenshots and accessibility trees

trace-profile (template) · Performance profiling: Chrome DevTools traces as evidence artifacts

CLI/TUI Templates

cli-smoke (smoke) · CLI command execution and exit code verification

cli-transcript (template) · Full session transcript capture with structured output

cli-review (agentic) · AI-driven CLI exploration with finding classification

Runtime validation with browser verification results

Merge Queue

Isolated worktrees.
Conflict-free integration.

During planning, task decomposition optimizes for maximum safe parallelism — structuring work so independent tasks execute simultaneously. Each worker gets its own isolated git worktree. The merge queue handles integration: FIFO ordering, auto-rebase, AI conflict resolution with deterministic guardrails, and push-before-done semantics. Every merge is verified — conflict markers cleared, remote HEAD changed, Flow state finalized.

Conflict Policies

autonomous (DEFAULT)

AI resolver attempts resolution. Unresolved conflicts block with status conflict. No human escalation.

human_assist

AI resolver attempts first. Unresolved conflicts park as awaiting_human for operator intervention.

fail_fast

Unresolved conflicts immediately stop the run. Maximum safety, minimum autonomy.
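One way to picture the three policies is as a mapping from the resolver's verdict to the next merge status. The Python sketch below is illustrative only: `status_after_resolver` and `RunStopped` are hypothetical names, and returning `merging` after a successful resolution is an assumption about the state flow.

```python
class RunStopped(Exception):
    """Raised when fail_fast halts the run on an unresolved conflict."""


def status_after_resolver(policy: str, verdict: str) -> str:
    """Map a resolver verdict (RESOLVED / CANNOT_RESOLVE) to the next merge status."""
    if verdict == "RESOLVED":
        return "merging"            # assumed next state once the conflict is resolved
    if verdict != "CANNOT_RESOLVE":
        raise ValueError(f"unknown verdict: {verdict}")
    if policy == "autonomous":
        return "conflict"           # blocks, no human escalation
    if policy == "human_assist":
        return "awaiting_human"     # parked for operator intervention
    if policy == "fail_fast":
        raise RunStopped("unresolved conflict under fail_fast")
    raise ValueError(f"unknown policy: {policy}")
```

Raising an exception for `fail_fast` is a design choice of the sketch: it makes "stop the run" impossible to ignore, whereas the other two policies leave the run alive with a parked or blocked entry.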

Deterministic Guardrails

Binary conflicts block directly — no AI resolver on binaries

Resolver must emit structured verdict receipt with RESOLVED or CANNOT_RESOLVE

Queue verifies all conflict markers cleared before accepting resolution

Queue verifies push evidence — remote HEAD must change

Merge and rebase operations run in isolated integration worktree, not primary checkout

Exponential backoff with jitter on transient failures: min(5s × 2^n, 300s) × [0.5, 1.5]

Max retry limit per entry — removed from queue after consecutive errors

Merge lock serializes rebase operations to prevent concurrent interference
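The backoff formula in the guardrails, min(5s × 2^n, 300s) × [0.5, 1.5], can be written out directly. In this Python sketch, `retry_delay` is a hypothetical helper, and treating n as a zero-based count of consecutive failures is an assumption.

```python
import random


def retry_delay(attempt: int, rng=random) -> float:
    """Seconds to wait before retry number `attempt` (0-based, assumed).

    The base doubles from 5s and is capped at 300s; the result is then
    scaled by a uniform jitter factor between 0.5 and 1.5.
    """
    base = min(5 * 2 ** attempt, 300)
    return base * rng.uniform(0.5, 1.5)


# Deterministic demo using a seeded generator instead of the global one.
demo_rng = random.Random(0)
DELAYS = [retry_delay(n, demo_rng) for n in range(8)]
```

The cap is applied before the jitter, so a capped delay still varies between 150s and 450s. That spread keeps several failing queue entries from retrying in lockstep.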

Merge Status States

pending · Not yet in queue

queued · Waiting for merge slot

rebasing · Auto-rebase in progress

resolving · AI conflict resolver running

merging · Merge commit being created

finalizing · Flow state update + push

merged · Successfully merged and finalized

conflict · True conflict, needs resolution

awaiting_human · Parked for human intervention

Push-Before-Done Ordering

1. Merge — task branch → run branch
2. Push — merge commit is source of truth
3. Flow completion — mark task done in spec
4. Finalization commit — commit spec update
5. Push finalization — push final state
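The ordering above matters because a task should never be marked done before its merge commit is on the remote. A minimal Python sketch of that idea follows. Everything here is illustrative: `push_before_done_plan` and `run_plan` are hypothetical helpers, the `--no-ff` flag and commit message are assumptions, and the Flow completion step is left as a placeholder (`None`) because the page does not show the actual command.

```python
def push_before_done_plan(task_branch: str, run_branch: str, remote: str = "origin"):
    """Build the five steps in order. argv is None where the command is unknown."""
    return [
        ("merge", ["git", "merge", "--no-ff", task_branch]),   # task branch -> run branch
        ("push", ["git", "push", remote, run_branch]),         # merge commit becomes source of truth
        ("flow_completion", None),                             # mark task done in spec (placeholder)
        ("finalization_commit", ["git", "commit", "-am", "Finalize task state"]),
        ("push_finalization", ["git", "push", remote, run_branch]),
    ]


def run_plan(plan, execute):
    """Run steps in order and stop at the first failure.

    `execute(name, argv)` returns True on success. Because execution stops
    early, a failed push means the completion step never runs.
    """
    completed = []
    for name, argv in plan:
        if not execute(name, argv):
            break
        completed.append(name)
    return completed


PLAN = push_before_done_plan("task/fn-1", "run/mf-demo")
# Dry run where the first push fails: the task must not be marked done.
AFTER_FAILED_PUSH = run_plan(PLAN, lambda name, argv: name != "push")
```

The branch names are sample values. In a real queue the `execute` callback would run the command in the integration worktree and report its exit status.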

ISOLATED INTEGRATION WORKTREE

All merge, rebase, and conflict resolution operations run in a dedicated integration worktree (.worktrees/mf-integration-*), not your primary checkout. Your working tree stays clean for edits. No dirty-state interference. Deterministic conflict context.

Every change planned, reviewed, certified, and merged. Autonomously.

Request Early Access

Model & Harness Agnostic

Use every model where it’s strongest.

Every backend serves the same purpose — headless control of a coding agent CLI. Different organizations have different subscriptions, different model access, different cost profiles. MergeFoundry uses them all. Assign any backend to any pipeline stage. Optimize for cost, capability, or model availability.

mf.config.yaml

backends:
  implement: composer-2-fast
  review: gpt-5.4-high (via codex)
  sync: opus-4.6 (via claude)
  resolve: opus-4.6 (via claude)

One harness implements. A different one reviews. Mix and match based on what your team has access to.

Claude Code

Anthropic models via Claude Code CLI harness

impl · review · sync · merge

OpenAI Codex

OpenAI models via Codex CLI harness

impl · review · sync · merge

Cursor

Any Cursor-available model via Cursor CLI harness

impl · review · sync · merge

Gemini

Google models via Gemini CLI harness

impl · review · sync · merge

GitHub Copilot

GitHub-hosted models via Copilot CLI harness

impl · review · sync · merge

Pi

Pico models via Pi CLI harness

impl · review · sync · merge

Custom

Any agent CLI via the adapter interface

impl · review · sync · merge

Settings — Backend & Harness Configuration

Backend configuration showing different models assigned to different pipeline stages via harness adapters

Capabilities

Coding is the easy part.
MergeFoundry handles the rest.

Operator Cockpit

Mission Control

What's active, what's blocked, what needs attention. System state cards, attention queue, worker rail, concurrent work surface. Attention routing, not babysitting.

Analytics

Code Factory Observatory

Usage, autonomy, delivery, efficiency metrics. Runs, costs, tokens, model distribution. DORA-adjacent — no theatre.

6 Drift Auditors

Epic Certification

Contract-drift, spec-drift, doc-drift, changelog-drift, ADR-drift, test-drift. Remediation loops until clean. Certification artifacts with evidence chain.

Browser-Based Evidence

Runtime Validation

Screenshots, DOM snapshots, console logs, network traces. Evidence flows into review, certification, and handoff. Not just "tests pass" — runtime proof.

AI review with cited decisions and file references

Cited Analysis

AI Review Intelligence

The reviewer doesn't just approve — it produces structural analysis, layering observations, risk-dominant paths, and decisions with specific file and line citations.

Planning with interview, generation, and review

Interviews + Scouts + Plan Review

Spec-Driven Planning

Interview workers clarify requirements. 6 parallel scouts explore the codebase. Gap analysis synthesizes cross-cutting risks. Plan review validates before a single line is written.

Agent Readiness

8 pillars. 48 criteria.
Know exactly where you stand.

Before your first run, MergeFoundry assesses your codebase across 8 pillars with 48 individual criteria. 8 parallel scouts — one per pillar — analyze your repo and produce a maturity score from L1 (Minimal) to L5 (Autonomous). Agent readiness scouts can auto-fix issues. Production readiness scouts report only.

Agent Readiness — 5 pillars, 30 criteria (auto-fixable)

Style & Validation · 6 criteria

Linter, formatter, type checking, pre-commit hooks

tooling-scout

Build System · 6 criteria

Build tool, dev command, lock file, monorepo tooling

build-scout

Testing · 6 criteria

Test framework, runnable tests, coverage config, E2E

testing-scout

Documentation · 6 criteria

README, CLAUDE.md/AGENTS.md, setup docs, architecture

docs-scout

Dev Environment · 6 criteria

.env.example, runtime pinning, Docker/devcontainer, IDE config

env-scout

Production Readiness — 3 pillars, 18 criteria (report only)

Observability · 6 criteria

Logging, tracing, metrics, health checks

Security · 6 criteria

Branch protection, CODEOWNERS, secrets scanning

Workflow & Process · 6 criteria

CI/CD, PR templates, issue templates, automation

Maturity Levels

L5 Autonomous (TARGET) · ≥85% agent, all pillars ≥80%

L4 Optimized · ≥70% agent, all pillars ≥60%

L3 Standardized · ≥50% agent, all pillars ≥40%

L2 Functional · ≥30% agent score

L1 Minimal · Below Functional

$ mf readiness

 Agent Readiness: 78% (L4 Optimized)
 Prod Readiness: 62%
 Overall: 73%

 Pillar Scores:
 ✓ Style & Validation ····· 100%
 ✓ Build System ··········· 83%
 ! Testing ················ 50%
 ✓ Documentation ·········· 83%
 ✓ Dev Environment ········ 67%

Readiness radar chart with pillar scores and history

SCORING: overall = agent_avg × 0.7 + prod_avg × 0.3
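The scoring formula and the maturity thresholds can be worked through in a few lines. This Python sketch reads the published thresholds literally. The function names are hypothetical, and treating "all pillars" as whatever list of pillar percentages the caller passes in is an assumption.

```python
def overall_score(agent_avg: float, prod_avg: float) -> float:
    """overall = agent_avg x 0.7 + prod_avg x 0.3 (percent values)."""
    return agent_avg * 0.7 + prod_avg * 0.3


# (level, name, minimum agent score, minimum score required of every pillar)
LEVELS = [
    (5, "Autonomous", 85, 80),
    (4, "Optimized", 70, 60),
    (3, "Standardized", 50, 40),
    (2, "Functional", 30, 0),
]


def maturity(agent_avg: float, pillar_scores: list[float]) -> tuple[int, str]:
    """Return the highest level whose thresholds are all met, else L1 Minimal."""
    weakest = min(pillar_scores)
    for level, name, min_agent, min_pillar in LEVELS:
        if agent_avg >= min_agent and weakest >= min_pillar:
            return level, name
    return 1, "Minimal"


# Agent readiness 78% and production readiness 62%:
# 78 x 0.7 + 62 x 0.3 = 54.6 + 18.6 = 73.2, shown as 73%.
OVERALL = round(overall_score(78, 62))
```

Because every level above Functional also constrains the weakest pillar, a single low pillar can hold a repo back even when the average is high.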

Code Factory Observatory

Measure the factory.
No DORA theatre.

Four dimensions of operational truth: usage, autonomy, delivery, efficiency. Every metric is classified as stable (backed by persisted state, safe for export) or advisory (heuristic, must stay labeled). No deployment frequency. No change failure rate. No individual productivity ranking. Factory metrics for factory operators.

Analytics — Code Factory Observatory

Usage

Total runs

Input tokens

Output tokens

Total cost (USD)

Model distribution

Autonomy

Human intervention rate

Conflict rate

Review retry rate (NEEDS_WORK %)

Auditor remediation rate

Delivery

Tasks merged

Epics certified

Handoffs published

Runtime validation pass rate

Efficiency

Cost per merged task

Time to merge

Time to certification

Anti-misuse by design

MergeFoundry explicitly excludes DORA metrics (deployment frequency, change failure rate, MTTR), business-value attribution, and individual engineer productivity ranking. Advisory metrics are labeled as such and cannot be mistaken for ground truth. The observatory measures the factory, not the people.

Governed Learning

The factory gets smarter.
Under your control.

After each epic, MergeFoundry extracts skill candidates from review focus points, file patterns, and worker telemetry. But skills don’t auto-apply. Every candidate goes through explicit operator review and promotion. No uncontrolled memory drift. No silent behavioral changes. Governed learning with provenance.

Review Checklist · review_checklist

Distilled from the top focus points of implementation reviews. Captures what the reviewer flagged most — security patterns, edge cases, API contract checks — as reusable checklists for future workers.

SOURCE: Top 4 focus points from review artifacts

File Cluster · file_cluster

Identifies commonly co-modified file groups from review evidence. When one file in the cluster changes, workers get guidance about the others. Prevents "forgot to update the tests" drift.

SOURCE: Most common 2-part file prefix from review refs

Role/Backend Pattern · role_backend_pattern

Captures which model/role combinations produced the best outcomes for specific task types. Encodes institutional knowledge about backend routing.

SOURCE: Worker performance telemetry + review scores

Skill Lifecycle

Extracted

Auto-extracted from review artifacts, certifications, and worker logs after each epic

In Review

Operator reviews the skill draft — title, summary, trigger hint, and markdown content

Promoted

Written as SKILL.md to the repo. Injected into matching worker prompts on future runs

Rejected

Operator decides the pattern isn't worth encoding. Archived with reason

CLI

$ mf skills list
$ mf skills show sk-a3f8
$ mf skills promote sk-a3f8 --path skills/auth-review.md
$ mf skills reject sk-b2c4 --reason "too specific"

PROVENANCE TRACKING

Every skill candidate carries full provenance: source epic, source run, review revision key, certification checksum, event count, and evidence anchors. You know exactly where each skill came from and can trace it back to the original review or certification that produced it.

Autonomy Ladder

From assistant to code factory.

Meet your team where they are. Scale autonomy as trust grows.

L0 Legacy · No AI

L1 Assistant · Copilot, inline suggestions

L2 Enabled · Multi-file edits, agent-assisted

L3 Native (MergeFoundry core) · Agentic SDLC, review-gated automation

L4 Code Factory (MergeFoundry + missions) · Mission Control, attention routing, measurable delivery

L5 Dark Factory (MergeFoundry endgame) · Fully autonomous, minimal human in loop

Most teams are stuck at L1–L2 — autocomplete and agent-assisted edits.

MergeFoundry puts you at L3–L4 on day one.

Review-gated automation, Mission Control, measurable delivery — immediately. And MergeFoundry is the first system designed to reach L5: lights-out dark factory.

Early Access

Request early access

Be first in line when MergeFoundry opens.

Created by @gmickel · Creator of Flow-Next

GitHub Discord X
