$ cat ./field-notes/isolation.mdx

Isolation — the unsolved plumbing

field-notes · chapter 05 · ports, cookies, auth when five agents run in parallel

The unsolved infrastructure problem that sits underneath multi-agent orchestration. Companion to problem-space.md Phase 5 and strategic-insights.md Phase 3 (Capsules).

Last updated: 2026-04-13


## The Core Problem

Git worktrees solve code isolation but fail at runtime isolation. When multiple agents run tasks across branches simultaneously on one machine, they collide across every dimension except the source code itself.

This is the problem Atelier's "Capsules" concept targets — and the research shows it's deeper than the original PRD assumed.


## What Specifically Breaks

### 1. Port Conflicts

Every dev server defaults to the same ports: 3000 (React/Next), 5432 (Postgres), 6379 (Redis), 8080 (APIs). Launch two React apps from different worktrees → the second one fails.

Docker Compose can create separate networks, but both stacks declaring 3000:3000 cause immediate failure. No current tool auto-resolves this at the worktree level.
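Nothing in the tools above does this, but the collision is cheap to sidestep at launch time: ask the OS for a free ephemeral port instead of hardcoding the default. A minimal sketch (the `find_free_port` helper is illustrative, not from any tool mentioned here):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a free ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# Each worktree's dev server gets its own port instead of the shared default.
port = find_free_port()
print(f"PORT={port}")  # export to the dev server's environment
```

The tradeoff: ports chosen this way change on every launch, which is why the more complete tools below pair dynamic ports with a stable "canonical" address.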

### 2. Shared Persistent Data

Database migrations from one branch silently affect another. A test in branch A updates the shared Postgres instance that branch B reads, producing false signals about branch B's correctness. This is invisible until something breaks in production.

### 3. Bind-Mount Collisions

Docker volumes mounted from the host mean "two file trees pointing at one machine state." Both branches write to .env files, config directories, or host paths they shouldn't touch.

### 4. Secret Leakage

Environment variables inherited from the host shell or copied .env files mean the wrong branch sees the wrong credentials — with no code diff revealing it. This is a security problem that compounds with parallel agents.

### 5. Browser and Cache State

Shared Docker daemon, package caches, and browser profiles mean:

  • Branch A's warm cache hides a missing dependency that branch B fails on
  • Browser sessions from task 1 silently affect task 2's authentication state (OAuth tokens, cookies, localStorage)
  • All localhost ports share one origin — cookies, localStorage, and service workers cross-contaminate between projects running on different ports

### 6. Log and Observability Ambiguity

Shared stdout and daemon logs make root-cause analysis unreliable. You can't tell if a test failed because of the code change or because another agent mutated a bind-mounted path seconds earlier.
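One way to restore attribution is to stamp every log line with the instance that produced it before streams merge. A sketch using Python's stdlib logging (the `instance` field and format are assumptions, not any tool's actual convention):

```python
import logging

def instance_logger(instance_id: str) -> logging.LoggerAdapter:
    """Return a logger that tags every line with its runtime instance."""
    base = logging.getLogger("runtime")
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("[%(instance)s] %(levelname)s %(message)s"))
    base.addHandler(handler)
    base.setLevel(logging.INFO)
    return logging.LoggerAdapter(base, {"instance": instance_id})

log_a = instance_logger("branch-a")
log_a.info("migration applied")  # → [branch-a] INFO migration applied
```

With tagged lines, "which agent mutated that path" becomes a grep instead of a guess.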


## Why This Is Worse for Agents Than Humans

Human developers notice anomalies ("this port is taken") and pause. Agents iterate faster and amplify small misunderstandings into compounded failures. An agent keeps executing until it hits an error or guardrail, turning a small runtime collision into cascading test failures, wrong debugging paths, and wasted tokens.

This breaks the feedback loop agents depend on: if the runtime environment is polluted, even correct code changes produce incorrect signals.


## The Current Solution Landscape

### Git Worktrees (Built-in)

  • Isolates per-worktree files (HEAD, index, working directory)
  • Does NOT isolate: ports, databases, caches, secrets, test state
  • Perfect for code-only work; insufficient for live service execution
  • Claude Code ships native worktree support with --worktree / -w flag (Feb 2026)
  • Cline Kanban uses worktrees for task isolation

### Docker Compose (Per-Project)

  • Creates separate networks by project name
  • Handles service-to-service networking
  • Still fails at: host port publishing, bind mounts, logical topology awareness
  • No concept of "this project vs that project" — only "this compose file vs that compose file"

### Dev Containers

  • Standardize one containerized workspace
  • NOT designed for many headless, branch-parallel runtimes running simultaneously
  • Good for onboarding; bad for multi-agent orchestration

### Container Use (Dagger, open-source)

  • Gives each agent its own containerized sandbox + Git worktree
  • Full filesystem isolation inside containers
  • Still requires explicit port mapping and volume management

### Coasts (CLI + UI, emerging)

The most complete solution this research surfaced. Key model:

Configuration (Coastfile in TOML):

  • Points to existing docker-compose.yml or defines services directly
  • Declares ports, assign policies, shared services, and secret injection
  • Explicit rather than magical

Port Model:

  • Each instance gets dynamic ports in a high range (e.g., localhost:35000)
  • The checked-out instance also receives canonical ports (e.g., localhost:3000)
  • All instances stay reachable; one acts as "the normal environment"
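The notes above don't specify how Coasts picks its high-range ports. One way a capsule tool could do it is to hash the branch name into a fixed high range, so the same branch always resolves to the same address across restarts; the base, span, and hashing here are illustrative assumptions, not Coasts' actual policy:

```python
import hashlib

BASE, SPAN = 35000, 10000  # illustrative high range, not Coasts' real one

def dynamic_port(branch: str, service_offset: int = 0) -> int:
    """Deterministically map a branch name into the high port range."""
    digest = hashlib.sha256(branch.encode()).digest()
    slot = int.from_bytes(digest[:4], "big") % SPAN
    return BASE + slot + service_offset

print(dynamic_port("feature/login"))   # stable across runs
print(dynamic_port("feature/search"))  # different branch, almost surely different port
```

Determinism matters for agents: a stable branch→port mapping means test URLs survive restarts without re-discovery.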

Assign Strategies (per-service, on branch switch):

| Strategy | Behavior | Use for |
|----------|----------|---------|
| none | Leave untouched | Databases, caches |
| hot | File-watcher hot-reload | Frontend dev servers |
| restart | Stop and restart | Backend services with live-mounted code |
| rebuild | Full Docker rebuild | Code baked into image |

Shared Services: Explicitly declare which services remain shared across branches (Redis, package proxies) and which are isolated per branch.

Secret Handling: Extract from host → encrypt locally → inject as environment variables or files per-instance.

Observability (Coastguard UI): Running projects, instances, checkout state, port mappings, live logs, runtime stats, volume/secret metadata.

### Google Scion (April 2026)

  • Manager-Worker architecture
  • Agents run as isolated containers using Claude Code, Gemini CLI, or Codex
  • Manages agent lifecycles and project workspace
  • Still early — Google's infrastructure play

### Worktree CLI (AirOps)

  • Spins up isolated development environments on demand
  • Each environment gets its own ports, databases, and URL
  • Higher-level abstraction over Docker + worktrees

## What Runtime Isolation Must Control (5 Dimensions)

### 1. Parallel vs. Canonical Access

Every instance needs a unique, always-reachable address. The checked-out instance also gets canonical ports (familiar localhost addresses). This lets agents test multiple branches without relearning port numbers.

### 2. Service Behavior on Branch Changes

Different services react differently to code changes:

  • Frontend watchers: hot-reload
  • Backends: need restart
  • Baked-image services: require rebuild
  • Databases: should stay untouched

Configuration must specify this behavior explicitly, per service.
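The four reactions above amount to a per-service policy table that a capsule runner consults on every branch switch. A minimal sketch (the service names and the dispatch shape are hypothetical, not any tool's actual API):

```python
from typing import Callable

# Strategy name -> what happens to the service on branch switch.
STRATEGIES: dict[str, Callable[[str], str]] = {
    "none":    lambda svc: f"{svc}: left untouched",
    "hot":     lambda svc: f"{svc}: file watcher reloads in place",
    "restart": lambda svc: f"{svc}: stopped and restarted",
    "rebuild": lambda svc: f"{svc}: image rebuilt from scratch",
}

# Per-service policy, declared explicitly rather than guessed at runtime.
SERVICES = {"frontend": "hot", "api": "restart", "worker": "rebuild", "postgres": "none"}

def on_branch_switch() -> list[str]:
    """Apply each service's declared strategy when the branch changes."""
    return [STRATEGIES[policy](svc) for svc, policy in SERVICES.items()]

for line in on_branch_switch():
    print(line)
```

The point of the explicit table: an agent switching branches never has to infer whether the database should survive the switch.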

### 3. Data Topology Policy

Separate the persistent state that must isolate per instance (databases) from the state that can safely be shared (package caches). This is the policy layer that sits above Docker primitives.
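At its simplest, the policy can be expressed as a naming rule over Docker volume names: shared services keep one volume, everything else gets a per-instance suffix. The `SHARED` set and naming scheme below are assumptions for illustration:

```python
SHARED = {"redis", "npm-cache"}  # one volume across all branches (illustrative)

def volume_name(service: str, instance: str) -> str:
    """Shared services keep one volume; everything else isolates per instance."""
    return service if service in SHARED else f"{service}-{instance}"

print(volume_name("postgres", "feature-login"))  # → postgres-feature-login
print(volume_name("redis", "feature-login"))     # → redis
```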

### 4. Secrets as Explicit Environment

Secrets must be extracted, encrypted at rest, and injected per-instance — not inherited from host shell or copied carelessly. The question: "which credentials exist in which runtime, in which form, with which lifetime."
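A minimal illustration of "explicit rather than inherited": build the child environment from an allowlist instead of passing the parent shell through. The helper name and variable names below are hypothetical:

```python
import os
import subprocess

def run_isolated(cmd: list[str], secrets: dict[str, str]) -> subprocess.CompletedProcess:
    """Run a command with ONLY explicitly injected secrets, not the host shell's env."""
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}  # minimal baseline
    env.update(secrets)  # per-instance credentials, decrypted just before launch
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# The child sees DATABASE_URL but nothing else from the parent shell.
result = run_isolated(["env"], {"DATABASE_URL": "postgres://localhost:35432/app"})
print(result.stdout)
```

Because the environment is constructed rather than inherited, "which credentials exist in which runtime" has an auditable answer: exactly what was passed in.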

### 5. Observability Surface

Once you parallelize runtime, visibility becomes part of correctness. A single interface showing projects, instances, statuses, port mappings, logs, volumes, and secrets metadata prevents operators from losing track of parallel state.


## The Multi-Agent Coordination Ceiling

From Augment's Multi-Agent Coding Workspace guide:

"For most teams, 3-4 parallel agents is a practical ceiling when a single reviewer integrates results, because conflict resolution and semantic review become the limiting step."

"Gains are bounded by review and verification throughput, not raw generation speed."

Four failure modes of uncoordinated parallel agents:

  1. Merge conflicts at shared files
  2. Duplicated implementations across branches
  3. Semantic contradictions that pass linting (e.g., timezone handling works in UTC but fails at DST boundaries)
  4. Context exhaustion on large codebases

Key data point: Frontier models score 70%+ on single-issue tasks but drop below 25% on multi-file patches averaging 107+ lines across 4+ files. Task decomposition into bounded, testable units is not optional — it's the difference between 70% and 25% success rates.


## The Orchestrator Comparison

Four tools now compete at the orchestration layer. Each operates at a different architectural level:

| Tool | Layer | Philosophy | Setup Time | CI Failure Handling | Review Comment Handling |
|------|-------|-----------|-----------|-------------------|----------------------|
| Agent Orchestrator (AO) | Orchestration | Full lifecycle automation. Give it a GitHub/Linear issue → spawns agent → opens PR | ~10 min | Auto-fetches CI logs, routes to agent, retries, then escalates | Forwards review to agent for incremental fixes |
| T3 Code | Interaction | Human-in-the-loop at every edit. Visual diffs. Desktop GUI. | ~2 min | None | Manual |
| OpenAI Symphony | Orchestration | Erlang/OTP supervision trees. Process-level fault tolerance. | ~30-60 min | Agent provides proof-of-work; checks must pass before PR completion | Destructive: closes PR, creates new branch, re-implements from scratch |
| Cmux | Terminal | Native GPU rendering, Unix socket automation, notification rings. | ~1 min | None | None |

The gap T3 Code reveals: When a user reviews code in T3 Code, that feedback is stuck in the UI — no way for another coding agent to pick up review comments and act on them. This is an MCP server problem waiting to be solved.

AO's recovery model: Polling-based detection (~30s intervals). Slower than OTP supervision but works across any CI provider. 8 plugin slots (runtime, agent, workspace, tracker, SCM, notifier, terminal, lifecycle).

Symphony's recovery model: Erlang/OTP supervision trees provide transparent restart recovery on crashes — designed for production fault tolerance. But Linear-only, no GitHub/Jira support.

The missing layer: None of these tools solve runtime isolation. They all assume the environment is clean. They coordinate agents and code but not the running services those agents interact with.


## What This Means for Atelier Capsules

The original PRD described Capsules as: "Project = git worktree + dev server + browser profile + terminal + port. One-click switch."

The research confirms this is the right target but reveals the implementation is deeper than a UI concept:

### What Capsules Must Actually Solve

  1. Port allocation — automatic dynamic port assignment per project instance, with canonical ports for the active project
  2. Service lifecycle — per-service assign strategies (hot-reload, restart, rebuild, none)
  3. Data isolation — per-project database instances or at minimum per-project schemas
  4. Secret injection — explicit, encrypted, per-project credential management
  5. Browser profile isolation — separate cookie/localStorage/ServiceWorker contexts per project
  6. Observability — single dashboard showing all running projects, their agents, port mappings, and health
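The six requirements above imply a data model roughly like the following. Every field name here is a design sketch, not Atelier's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Capsule:
    """One project instance: worktree + runtime + browser context + ports (sketch)."""
    branch: str
    worktree_path: str
    ports: dict[str, int]            # service name -> dynamically assigned port
    canonical: bool = False          # the active capsule also gets familiar ports
    assign: dict[str, str] = field(default_factory=dict)  # service -> hot/restart/rebuild/none
    browser_profile: str = ""        # isolated cookie/localStorage context
    secrets_ref: str = ""            # pointer to encrypted per-capsule credentials

cap = Capsule(
    branch="feature/login",
    worktree_path="~/worktrees/feature-login",
    ports={"web": 35001, "api": 35002},
    assign={"web": "hot", "api": "restart", "postgres": "none"},
    browser_profile="profile-feature-login",
)
print(cap.canonical)  # → False until this capsule is checked out
```

Note that the model deliberately carries references (a profile name, a secrets pointer) rather than the isolated resources themselves, matching the "delegate, don't own" list below.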

### What Capsules Should NOT Solve (delegate)

  • Container orchestration (Docker handles this)
  • CI/CD (Vercel/Railway/GitHub Actions handle this)
  • Code isolation (Git worktrees handle this)
  • Agent orchestration (the Board handles this)

### The Implementation Spectrum

| Approach | Complexity | Isolation Level | Speed |
|----------|-----------|----------------|-------|
| Port-only | Low | Minimal. Solves the most visible problem. | Fast to ship |
| Port + browser profile | Medium | Covers the two biggest developer complaints | Medium |
| Full container per project | High | Complete isolation including databases | Slow, heavy |
| Coasts-style hybrid | Medium-High | Configurable per-service, explicit policies | The right architecture |

Recommendation: Start with port-only in v1 prototype (prove the project-switching UX), design the data model for full Coasts-style hybrid (so you don't have to rewrite), ship full isolation in v3 as originally planned.


## Security Boundaries That Still Matter

Runtime isolation reduces operational interference; it does not remove security risk:

  • Docker-in-Docker with --privileged mode is not hardened against escapes
  • Container runtimes have had escape-class CVEs (runc CVE-2024-21626, BuildKit CVE-2024-23651)
  • Bind mounts expose host directories with write access by default
  • Agents reading untrusted content (code comments, docs, issues) can be hijacked via prompt injection
  • Secret oversharing: wrong process receives credentials if injection isn't explicit

Practical mitigations:

  • Keep runtime stack updated; don't overstate containment guarantees
  • Prefer read-only mounts where possible
  • Gate untrusted content; require review for sensitive agent actions
  • Make secret injection explicit and auditable per-runtime
