The complete landscape of how software gets built in the age of AI agents. Written to make the domain commitment conscious, not assumed.
Last updated: 2026-04-13
## Why this document exists
The existing Atelier (a product I'm building) docs (overview.md, prd.md) jump from "agents fragment the dev workflow" to a specific UI solution — morphing cards, a board, a memory rail. That's building from assumptions. Before writing code, we need to map the full terrain:
- → The full software development cycle — every phase, how agents participate, where it breaks
- → Alternative paradigms — apps aren't the only way to solve problems
- → User segments — who builds software, how, and what they actually need
- → The deep sub-problems — zooming into each phase with real data
- → Where Atelier fits — a conscious choice, not a default
The goal: become the most informed person in the room about agentic developer workflows.
# Part 1: The Full Software Development Cycle
Every phase of building software, how agents currently participate (April 2026), and where the workflow breaks.
## Phase 1: Ideation / Problem Discovery
What happens: "I have an idea" → "I understand the problem well enough to start."
How agents participate today:
- → Chat with Claude/GPT/Gemini to brainstorm, validate ideas, explore markets
- → AI-assisted market research (search, summarize competitors, find data)
- → "Rubber duck" conversations that clarify thinking
Tools:

| Tool | What it does | Limitations |
|------|-------------|-------------|
| ChatGPT / Claude / Gemini | Conversational brainstorming, market research | Conversations die. No structured output. No memory across sessions. |
| Perplexity | Research with citations | Good for facts, bad for synthesis. No project continuity. |
| Notion AI | In-doc brainstorming | Locked inside Notion. No agent orchestration. |
Where it breaks:
- → Conversations are disposable. You have a 45-minute brainstorm with Claude about a sports drink landing page. The next day, that context is gone. You start over.
- → No structured capture. The insight from the brainstorm doesn't flow into the next phase. There's no "here's what we know so far" state that carries forward.
- → No research memory. You research competitors on Monday, pricing models on Tuesday, user pain on Wednesday. By Thursday, the Monday research is in a different chat thread and effectively lost.
- → AI can brainstorm but can't validate. The model will enthusiastically agree that your idea is great. It won't push back with "but here's why this failed when 3 other people tried it" unless you explicitly ask.
What "solved" looks like:
- → Research from ideation carries forward as structured context into planning and building
- → The agent remembers what was explored, what was rejected, and why
- → Validation is built into the process — not an afterthought
## Phase 2: Research
What happens: Market research, competitive analysis, domain knowledge gathering, technical feasibility assessment.
How agents participate today:
- → Web search and summarization (Perplexity, Claude web search)
- → Document analysis (upload PDFs, analyze spreadsheets)
- → Competitive landscape mapping
Tools:

| Tool | What it does | Limitations |
|------|-------------|-------------|
| Perplexity | Search + cite | No project context. Each query is isolated. |
| Claude (web search) | Research with reasoning | Session-bound. Context dies. |
| Deep Research (Gemini/OpenAI) | Long-form research reports | One-shot. Can't iterate on findings. |
| Elicit / Consensus | Academic research | Narrow domain. Not for market/product research. |
Where it breaks:
- → The sports drink problem. If someone asks an agent to build a landing page for a sports drink, the agent starts coding immediately. It doesn't know that sports drink marketing uses specific copy patterns, specific imagery, specific social proof structures. It needs domain context first — and no tool structures this gathering phase.
- → Research is scattered. Findings live in 12 different chat sessions, 4 browser tabs, 2 Google Docs, and your head. No single "here's everything we know" artifact.
- → No synthesis across sources. An agent can summarize one article. It can't synthesize 15 sources into a coherent understanding with conflicting viewpoints resolved.
- → No "research state." There's no concept of "research in progress" vs "research complete" vs "research needs updating." It's either a conversation or nothing.
What "solved" looks like:
- → A persistent, evolving research artifact that agents and humans both read and write
- → Source tracking with provenance (where did this fact come from?)
- → Synthesis that resolves conflicts between sources
- → Research that flows into design and build decisions as structured context
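No current tool provides this kind of persistent research artifact, so as a thought experiment, here is a minimal Python sketch of what one could look like. Every class, field, and status value here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One research fact plus where it came from (provenance)."""
    claim: str
    source_url: str
    retrieved: str                   # ISO date, so staleness is checkable
    confidence: str = "unverified"   # unverified | corroborated | disputed

@dataclass
class ResearchArtifact:
    """Persistent, evolving research state that humans and agents both edit."""
    topic: str
    status: str = "in_progress"      # in_progress | complete | needs_update
    findings: list[Finding] = field(default_factory=list)

    def add(self, claim: str, source_url: str, retrieved: str) -> None:
        self.findings.append(Finding(claim, source_url, retrieved))

    def conflicts(self) -> list[Finding]:
        """Naive conflict surface: findings explicitly marked disputed."""
        return [f for f in self.findings if f.confidence == "disputed"]

artifact = ResearchArtifact(topic="sports drink landing pages")
artifact.add("Hero copy leads with electrolytes, not flavor",
             "https://example.com/teardown", "2026-04-01")
print(len(artifact.findings))  # → 1
```

The point of the sketch is the shape, not the fields: findings carry their source and retrieval date, and the artifact itself has an explicit lifecycle state instead of being "a conversation or nothing."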
## Phase 3: Planning / Architecture
What happens: Break the project into tasks, choose tech stack, design the system architecture, estimate scope.
How agents participate today:
- → Claude Code `/init` generates CLAUDE.md with project context
- → Cursor rules files define project conventions
- → Agents decompose intent into tasks (Claude Code plan mode, Cline Plan/Act modes)
- → Architecture suggestions based on requirements
Tools:

| Tool | What it does | Limitations |
|------|-------------|-------------|
| Claude Code (plan mode) | Task decomposition, architecture suggestions | Decomposition ≠ human mental model. Can't negotiate. |
| Cursor (composer) | Multi-file planning | Plans inside the IDE only. No broader project view. |
| Cline (Plan/Act) | Explicit plan-then-execute loop | Good pattern but VS Code-bound. |
| Linear / Jira + AI | Issue generation from specs | Disconnected from the code. Manual sync. |
| Kiro (AWS) | Spec-driven development with "vibes" | New (2026), unproven at scale. |
Where it breaks:
- → Agent decomposition ≠ human mental model. An agent asked to "build a journaling app" might decompose into database schema → API routes → frontend components. A human thinks: onboarding flow → core writing experience → weekly summary feature. The granularity and framing don't match.
- → Over-decomposition or wrong abstractions. Agents tend to create too many files, too many abstractions, too early. They build for hypothetical scale instead of current needs.
- → No negotiation. When you disagree with the plan, there's no structured way to say "keep tasks 1-3, replace task 4, and add a new task 5." You restart the conversation.
- → Spec-driven development is nascent. Anthropic's report notes that developer acceptance of agent changes is 89% when the agent provides a diff summary vs 62% for raw output — specs matter, but the tooling for spec-driven workflows barely exists.
- → Plans don't survive contact with reality. The plan is made, work starts, the plan changes — but the original plan doesn't update. There's no living document that reflects the actual state of the project.
What "solved" looks like:
- → Plans that are negotiable — humans and agents collaborate on decomposition
- → Plans that are living — they update as work progresses
- → Plans that carry context — each task knows why it exists and what it depends on
- → Spec-driven workflows where the spec is the source of truth
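A minimal sketch of what "negotiable" could mean in practice: plan edits as structured operations ("keep tasks 1-3, replace task 4, add task 5") rather than a restarted conversation. All names are illustrative, not any tool's API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    id: int
    title: str
    rationale: str   # why this task exists (the "carry context" property)

def apply_edits(plan: list[Task],
                replace: dict[int, Task],
                add: list[Task]) -> list[Task]:
    """Structured plan negotiation: swap out disputed tasks, append new
    ones, and keep everything else untouched."""
    revised = [replace.get(task.id, task) for task in plan]
    return revised + add

plan = [Task(1, "Onboarding flow", "first-run experience"),
        Task(2, "Core writing view", "the product's center"),
        Task(3, "Weekly summary", "retention hook"),
        Task(4, "GraphQL API", "agent's default choice")]

revised = apply_edits(
    plan,
    replace={4: Task(4, "Simple REST endpoints", "matches current scale")},
    add=[Task(5, "Export to Markdown", "user-requested")])

print([t.title for t in revised])
```

Because each `Task` carries a `rationale`, a revised plan also preserves the "why" for every item the human kept, replaced, or added.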
## Phase 4: Design
What happens: UI design, UX design, interaction design, animations, and the emerging discipline of AU (Agentic User) design — how agents interact with the product being built.
### Sub-discipline: UI Design (visual, layout, components)
How agents participate today:
- → v0, Lovable, Bolt generate UI components from natural language
- → Figma AI generates layouts and auto-layouts
- → Cursor/Claude Code write component code directly
Where it breaks:
- → The taste gap. When average output becomes easier to generate, taste becomes more valuable. AI can produce "decent" in seconds, but decent becomes invisible. The work that stands out is intentional, restrained, coherent, and emotionally tuned. (Source: Figma 2025 AI Report — 78% of designers say AI boosts efficiency, but fewer than half say it makes them better at their jobs.)
- → Polish is no longer optional. AI raises the floor but doesn't raise the ceiling. Every competitor can generate "good enough" UI. Differentiation now requires craft that AI can't produce autonomously.
### Sub-discipline: UX Design (flows, information architecture)
How agents participate today:
- → Limited. Agents can generate individual screens but struggle with flow coherence.
- → Flowstep, UX Pilot, Motiff handle some UX automation.
- → Teams using AI UI tools ship features 40-60% faster — but this is mechanical speed, not design quality.
Where it breaks:
- → Agents think in screens, not in journeys. Ask an agent to design a checkout flow and it generates a checkout page. It doesn't think about the flow from cart → shipping → payment → confirmation → error recovery → email confirmation.
- → No user model. Agents don't have a mental model of the user. They can't reason about cognitive load, information hierarchy, or emotional state through a flow.
- → Context limits kill UX consistency. By the time you're designing screen 8 of a flow, the agent has forgotten the design decisions from screen 1-3.
### Sub-discipline: Interaction Design & Animation
Where it breaks:
- → This is almost entirely unaddressed by current tools. AI can write Framer Motion code, but it can't judge whether a 200ms spring vs a 300ms ease-out feels right for a card expansion.
- → No animation reasoning. Agents have no sense of timing, weight, or spatial continuity. Every animation suggestion is generic.
- → The craft moat. This is where human design engineers have the strongest advantage. AI accelerates everything else; interaction design remains deeply human.
### Sub-discipline: AU (Agentic User) Design — EMERGING
What this is: Designing how AI agents interact with the product you're building. Not how you use AI to build — how AI uses what you built.
Why it matters (2026 data):
- → 51% of Figma users working on AI products are building agents (up from 21% in 2025).
- → MCP has 10,000+ active public servers. The question "how does an agent use this?" is now a first-class design concern.
- → The "first user" of many new products is increasingly an AI agent, not a human.
Where it breaks:
- → No design patterns exist yet. There are no established patterns for "how should an API surface be designed so agents can use it well?" This is green field.
- → Agent-first vs human-first is a real tension. Designing for agents (structured data, typed endpoints, clear contracts) and designing for humans (visual, forgiving, explorable) require different approaches. Most teams don't even recognize this as a design decision.
UPDATE (April 2026): Patterns ARE now emerging. The Smashing Magazine framework (Feb 2026) codifies six patterns: Intent Preview, Autonomy Dial, Explainable Rationale, Confidence Signal, Action Audit & Undo, and Escalation Pathway. The A2UI protocol provides a technical spec for agent-generated interfaces via declarative JSON. Microsoft Design published UX principles for agents. The field is crystallizing fast — but it's still designer-authored patterns, not agent-generated design. See deep-dive-agentic-ux.md for the full analysis.
What "solved" looks like:
- → Design tools that understand flows, not just screens
- → Animation and interaction tools that let you tune feel, not just configure properties
- → Agent-first design as an explicit practice with patterns and principles (now partially exists)
- → Taste and judgment preserved as the human layer over AI-generated foundations
- → Structured design specs that agents can read and implement (Figma MCP is the first step; see below)
## Phase 5: Building (Frontend)
What happens: Components, pages, state management, styling, responsive design.
How agents participate today:
- → Cursor, Claude Code, Copilot, Cline write code inline or in agent mode
- → v0/Lovable/Bolt generate full frontends from prompts
- → Multiple agents run in parallel across files/features
Tools (2026 landscape):

| Tool | Type | Strengths | Limitations |
|------|------|-----------|-------------|
| Cursor | IDE (VS Code fork) | Deep codebase awareness, Agent mode, multi-file edits | Silently reverted code (March 2026 bug). Performance degrades on large projects. IDE lock-in. $20-200/mo. |
| Claude Code | CLI | Terminal-native, subagents, Agent Teams, CLAUDE.md memory | Context window limits (~200k tokens). Memory loss between sessions. Sessions average 23 min before context issues. |
| GitHub Copilot | IDE extension | Largest user base, Coding Agent (async), integrated with GitHub | Lower quality than Cursor/Claude for complex tasks. |
| Cline | VS Code extension | Free, open-source, 5M+ developers, Plan/Act modes, MCP | VS Code-bound. Kanban is a separate app. |
| Windsurf | IDE | Parallel agents, Cascade flow system | Newer, smaller ecosystem. |
| Lovable | Web platform | First-prompt magic, fast to deploy | React-only. Supabase/Stripe only. Falls apart at month 3. |
| v0 | Web platform | Vercel ecosystem, good component generation | Best for components, not full apps. |
| Bolt | Web platform | Fast prototyping | Inconsistent code quality. Cost escalation on complex projects. |
| Replit Agent | Cloud IDE | Zero-to-deployed fastest path, no local setup | Agent 3 hallucinates library syntax. "Competent junior developer" output. Effort-based pricing is unpredictable. |
| Devin | Autonomous agent | 67% PR merge rate on defined tasks. $20/mo. | 15% success on complex tasks without help. Senior at understanding, junior at execution. |
Where it breaks — the multi-agent fragmentation problem (Theo's "Agentic Code Problem"):
This is the most-documented pain point in the current landscape. Real data:
- → Project-app fragmentation. Terminal, editor, and browser are one "project" in your head, but the OS has no concept of that grouping. Running 3-5 parallel agents across projects shatters the mental model.
- → "Which one dinged?" Long-running agents notify with no origin label. You spelunk tabs to find who finished.
- → Localhost port collisions. Cookies ignore the port, so every project on `localhost` shares them, and OAuth redirects hardcoded to :3000 break when project B lands on :3001.
- → Context overflow. 78% of Claude Code sessions now involve multi-file edits (up from 34% in Q1 2025). Average session length is 23 minutes. Context gets "poisoned" as sessions grow — the model starts repeating instructions, making mistakes it was getting right earlier, producing code that contradicts earlier decisions.
- → Memory loss between sessions. Claude Code starts every session with zero context. CLAUDE.md is a workaround for stable project info, but dynamic session context (specific decisions made an hour ago, current debugging state) is lost.
- → Compound mistakes at scale. With humans coding slowly, pain surfaces early. With orchestrated agents, small mistakes — code smells, duplication, unnecessary abstractions — compound faster than humans catch them.
- → Code review becomes the bottleneck. LinearB's 2026 analysis of 8.1M pull requests across 4,800+ organizations: developers feel 20% faster but are actually 19% slower. Agentic AI PRs have a pickup time 5.3x longer than unassisted ones. AI-generated PRs carry ~1.7x more issues. High-adoption multi-agent teams merge 98% more PRs but have 91% longer code review times and 154% larger PR sizes.
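One concrete mitigation for the port-collision pain point, sketched rather than taken from any tool listed here: derive a stable dev-server port from the project name, so each project always claims the same port and redirect URLs registered once keep working:

```python
import hashlib

def project_port(name: str, base: int = 4000, span: int = 1000) -> int:
    """Derive a deterministic dev-server port from the project name.
    The same project always gets the same port, so OAuth redirect URLs
    registered for it stay valid. (Sketch only: real tooling would also
    need to detect collisions between different project names.)"""
    digest = hashlib.sha256(name.encode()).digest()
    return base + int.from_bytes(digest[:2], "big") % span

print(project_port("atelier"))       # stable across runs
print(project_port("sports-drink"))  # a different project, its own port
```

This is the "assign, don't race" approach: ports become part of the project's identity instead of whatever happened to be free at launch time.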
Orchestration patterns that are emerging:
From Addy Osmani's "Code Agent Orchestra" and Anthropic's 2026 Agentic Coding Trends Report:
| Pattern | How it works | Strengths | Weaknesses |
|---------|-------------|-----------|------------|
| Subagents | Parent decomposes, spawns specialized children with file ownership | Simple, cost-neutral (~220k tokens) | Manual coordination. No shared state. |
| Agent Teams | Shared task list, dependency tracking, peer-to-peer messaging | True parallelism. Self-claiming tasks. | Sweet spot is 3-5 agents. Beyond that, coordination overhead explodes. |
| The Ralph Loop | Atomic tasks in a loop: pick → implement → validate → commit → reset context → repeat | Avoids context overflow. Maintains continuity through external memory. | Slower. Only works for decomposable tasks. |
| Planner / Worker / Judge | Planners explore + create tasks. Workers execute. Judges evaluate. | Clean separation. Avoids conflicts. | Cursor tried and failed with equal-status agents — this three-role model works better. |
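The Ralph Loop reduces to a small control loop. A hedged Python sketch, where every callable is a stand-in for a real agent invocation:

```python
def ralph_loop(backlog, implement, validate, commit):
    """Sketch of the Ralph Loop pattern: one atomic task per iteration,
    context reset between iterations, continuity kept in external state
    (the backlog and the commit history), not in the model's context
    window. All callables here are stand-ins for agent invocations."""
    while backlog:
        task = backlog.pop(0)          # pick the next atomic task
        result = implement(task)       # fresh-context agent call
        if validate(result):
            commit(task, result)       # externalize progress
        else:
            backlog.insert(0, task)    # retry with a clean context
        # nothing carries over between iterations except what was
        # committed to external memory

# Usage sketch with trivial stand-ins:
done = []
ralph_loop(
    backlog=["write schema", "add endpoint"],
    implement=lambda t: f"code for {t}",
    validate=lambda r: True,
    commit=lambda t, r: done.append(t))
print(done)  # → ['write schema', 'add endpoint']
```

The design trade-off the table names is visible in the code: each iteration is slower (fresh context every time), but the loop can run indefinitely without the context poisoning described above.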
Tools specifically for orchestration:
- → Cline Kanban — CLI-agnostic kanban for multi-agent orchestration. Dependency chains. Worktree isolation. Free, open-source. User feedback (April 2026): requests for light theme, task folders, GitHub issue sync, expense tracking per card, split view, and complexity-based task routing. The product is in research preview.
- → T3 Code — Theo's open-source desktop app as a UI layer over Claude Code/Codex. Multi-repo, multi-agent parallelism. Git worktree integration. Key gap: review feedback is stuck in the UI — no way for another agent to act on review comments (MCP server problem waiting to be solved).
- → Agent Orchestrator (AO) — Full lifecycle automation: give it a GitHub/Linear/Jira issue → spawns agent in worktree → opens PR → handles CI failures → routes review comments back to agents. 8 plugin slots. Polling-based recovery (~30s).
- → OpenAI Symphony — Erlang/OTP supervision trees for fault-tolerant agent orchestration. Linear-only. Destructive review handling (closes PR, reimplements from scratch). Engineering preview.
- → Google Scion (April 2026) — Manager-Worker architecture. Agents run as isolated containers using Claude Code, Gemini CLI, or Codex.
- → Claude Code Agent Teams — Built into Claude Code. Shared task lists, peer messaging, automatic dependency resolution.
- → Vibe Kanban — Similar to Cline Kanban. Kanban board for orchestrating coding agents.
See deep-dive-runtime-isolation.md for the full orchestrator comparison and the runtime isolation problem none of these solve.
What "solved" looks like:
- → Project as a first-class container (git worktree + dev server + browser profile + terminal grouped as one unit)
- → Ambient awareness of all running agents without attention hijack
- → Context that persists across sessions and survives compaction
- → Code review integrated into the agent workflow, not bolted on after
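The first bullet above can be made concrete: a project-as-container is just the pieces grouped as one value a launcher could act on. Field names and commands below are illustrative assumptions, not an existing tool's schema:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class ProjectContainer:
    """One 'project' as the OS should see it: worktree, dev server,
    browser profile, and terminal grouped under a single name."""
    name: str
    worktree: Path          # e.g. a `git worktree add` checkout
    dev_port: int           # dedicated port, avoiding localhost collisions
    browser_profile: Path   # isolated cookies/storage per project

    def launch_plan(self) -> list[str]:
        """Commands a launcher could run to bring the whole unit up.
        The npm/chromium invocations are placeholders."""
        return [
            f"cd {self.worktree} && npm run dev -- --port {self.dev_port}",
            f"chromium --user-data-dir={self.browser_profile} "
            f"http://localhost:{self.dev_port}",
        ]

proj = ProjectContainer("atelier", Path("~/work/atelier-feature-x"),
                        4321, Path("~/.profiles/atelier"))
for cmd in proj.launch_plan():
    print(cmd)
```

Once the container exists as a value, ambient awareness follows naturally: a notification can carry `proj.name` instead of leaving you to guess which tab dinged.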
## Phase 6: Building (Backend)
What happens: APIs, database design, authentication, business logic, infrastructure.
How agents participate today:
- → Same tools as frontend (Cursor, Claude Code, Copilot, Cline)
- → Devin handles migrations, framework upgrades, tech debt cleanup (67% PR merge rate)
- → Agents scaffold APIs, write database schemas, generate auth flows
Where it breaks (in addition to all frontend problems):
- → Agents don't understand deployment context. An agent writes perfect API code that assumes environment variables exist that don't. Or it uses a database connection string format that works locally but not in production.
- → Secrets management is dangerous. Agents sometimes hardcode API keys, expose credentials in logs, or create insecure auth flows. 40-62% of AI-generated code contains security vulnerabilities.
- → Database operations are irreversible. Devin's core risk: "does not always surface uncertainty or flag dangerous actions. Human review remains mandatory for any destructive or irreversible database operations."
- → Schema design requires foresight agents lack. Agents optimize for the current feature. They don't think about "what happens when we add feature X in 3 months and this schema decision makes it impossible?"
What "solved" looks like:
- → Agents that understand the deployment target (Vercel, Railway, AWS, etc.) as context
- → Guardrails for destructive operations (database drops, data mutations)
- → Schema design that considers future requirements, not just current prompt
## Phase 7: Testing
What happens: Unit tests, integration tests, QA, visual regression, performance testing, accessibility testing.
How agents participate today:
- → Agents generate unit tests alongside code (Cursor, Claude Code)
- → Agentic testing: AI autonomously navigates an app and reports outcomes from natural language descriptions
- → Visual regression tools use reinforcement learning to detect UI bugs (distinguish real bugs from harmless rendering variations)
- → AI-generated tests cover 60-80% of standard user paths
Tools (2026 landscape — five categories):
| Category | Tools | What they do | Limitations |
|----------|-------|-------------|-------------|
| Agentic Automated | QA Wolf, Checksum | Generate production-grade Playwright/Appium code from prompts; AI-driven maintenance | Still requires prompt engineering; coverage reflects test author's imagination |
| Agentic Manual | Mabl, Testim, Functionize | Computer-use agents navigate app like manual testers; self-healing locators | Non-deterministic; can't run in parallel; token-expensive; results can't be verified reliably |
| Visual Regression | Applitools ($969+/mo), Percy ($199+/mo) | AI-powered screenshot comparison against baselines; cross-browser/device validation | Detect changes but can't evaluate if changes are GOOD — flag "different" not "wrong" |
| IDE Copilots | Claude Code, Cursor, Copilot | Generate test code alongside implementation | Team owns execution, CI, maintenance; scaffolding only |
| AI Code Review | CodeRabbit, Qodo | Automated PR review for patterns, bugs, security | Catches patterns but not architectural issues or design intent |
Forrester renamed this category to "Autonomous Testing Platforms" in 2025. Gartner predicts 40% of enterprise apps will feature task-specific AI agents by end of 2026 (up from <5% in 2025).
Where it breaks:
- → Generated tests ≠ quality tests. AI-generated tests typically cover the happy path but miss: edge cases, error recovery, race conditions, state management bugs, accessibility issues.
- → No "does this feel right" test. No tool can tell you if a 200ms animation feels sluggish or if a loading state is anxiety-inducing. This is taste testing. Visual regression tools (Applitools, Percy) detect CHANGES but can't evaluate if changes are GOOD. They flag "different" not "wrong."
- → The visual regression gap. AI code generation is now standard in frontend development, creating urgency: the gap between "code review passed" and "page looks correct" grows. Best current approach: layer Applitools on top of QA Wolf's Playwright tests — deterministic execution + AI-powered visual validation. Still requires human review of baselines.
- → Testing agents need the same context as building agents. A test agent needs to understand the full system to write meaningful integration tests. But agents lose context. So tests are shallow.
- → Non-deterministic agentic manual testing. Computer-use agents that navigate apps like manual testers (Mabl, Functionize) can't run in parallel, are token-expensive, and produce results that can't be reliably verified. This category is promising but immature.
What "solved" looks like:
- → Tests that are written with full system understanding, not just local function knowledge
- → Visual and interaction testing that catches "feels wrong" not just "looks wrong"
- → Testing integrated into the agent workflow (agents that test their own work before submitting)
- → Quality gates: hooks that force agents to keep working until tests pass (Addy Osmani's pattern)
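The quality-gate pattern in the last bullet is essentially a retry loop around the test suite. A hedged sketch, with both callables standing in for real agent and test-runner invocations:

```python
def quality_gate(agent_step, run_tests, max_rounds: int = 5) -> bool:
    """Sketch of a quality-gate hook (paraphrasing the pattern attributed
    to Addy Osmani above): after each agent edit, run the tests; on
    failure, loop so the agent tries again; only a green run lets the
    work through. `run_tests` would typically shell out to the real
    suite and return (passed, failure_output)."""
    for _ in range(max_rounds):
        agent_step()
        ok, failures = run_tests()
        if ok:
            return True      # gate passed; the agent may submit its work
        # a real hook would feed `failures` back into the agent's prompt
    return False             # still red after N rounds: escalate to a human

# Usage sketch: an "agent" that gets the tests green on its second attempt.
attempts = []
passed = quality_gate(
    agent_step=lambda: attempts.append("edit"),
    run_tests=lambda: (len(attempts) >= 2, "1 failed: test_checkout"))
print(passed, len(attempts))  # → True 2
```

The `max_rounds` cap matters: without it, a stuck agent burns tokens forever; with it, failure has a defined exit (human escalation).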
## Phase 8: Deployment
What happens: CI/CD pipelines, hosting, domain configuration, monitoring, alerting.
How agents participate today:
- → Vercel/Railway/Netlify handle most deployment mechanics for frontend
- → Agents can write Dockerfiles, CI configs, infrastructure-as-code
- → AI reduces MTTR by up to 65% and increases deployment frequency by 40%
- → AI predicts build failures, identifies flaky tests, automates rollbacks
Tools:

| Tool | What it does | Limitations |
|------|-------------|-------------|
| Vercel / Railway / Netlify | One-click deploy from git | Simple apps only. Complex infra needs escape hatches. |
| Pulumi / Terraform + AI | Infrastructure as code with AI assist | Learning curve. Agents make IaC mistakes that are hard to catch. |
| GitHub Actions + AI | CI/CD automation | Agents can write workflows, but debugging failing pipelines is still manual. |
Where it breaks:
- → Agents build but don't verify in production. An agent writes code, tests pass locally, deploys to staging — but nobody checks if the production deployment actually works. The loop isn't closed.
- → 47% of teams cite manual approvals as a bottleneck. Security and compliance gates are still built for a ticket-driven world, not an agent-driven world.
- → CI/CD pipelines aren't designed for agent-generated code at scale. Secrets management, sandboxing, and execution environments need rethinking for a world where agents generate and deploy code autonomously.
- → Deployment is mostly solved — for simple cases. Vercel + Supabase makes deploying a Next.js app trivial. But the moment you need custom infrastructure, the agent can't help.
What "solved" looks like:
- → End-to-end: agent builds → tests → deploys → verifies in production → alerts on issues
- → Deployment as part of the agent workflow, not a separate manual step
- → Security and compliance gates that work at agent speed, not ticket speed
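Closing the build → deploy → verify loop could be as simple as a post-deploy check that the version actually serving in production matches the one shipped. The `/health` endpoint and its JSON shape below are assumptions for illustration, not any platform's API:

```python
import json
from urllib.request import urlopen

def verify_deployment(health_url: str, expected_version: str,
                      fetch=lambda url: urlopen(url, timeout=10).read()) -> bool:
    """Post-deploy verification sketch: fetch the production health
    endpoint and confirm the running version matches what was shipped.
    `fetch` is injectable so the check can be tested without a network."""
    try:
        body = json.loads(fetch(health_url))
    except Exception:
        return False                  # unreachable or malformed counts as failed
    return body.get("version") == expected_version

# Usage sketch with a stubbed fetch (no real network call):
fake = lambda url: b'{"version": "1.4.2", "status": "ok"}'
print(verify_deployment("https://example.com/health", "1.4.2", fetch=fake))  # → True
print(verify_deployment("https://example.com/health", "1.4.3", fetch=fake))  # → False
```

An agent workflow that ends with this check (and alerts on `False`) is the difference between "deployed to staging" and the closed loop described above.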
## Phase 9: Iteration / Maintenance
What happens: Bug fixes, feature additions, user feedback processing, performance optimization, dependency updates.
How agents participate today:
- → Devin handles tech debt, dependency upgrades, migrations (67% merge rate on defined tasks)
- → Claude Code / Cursor fix bugs from issue descriptions
- → AI PR review catches regressions
Where it breaks:
- → The month-3 problem. This is the most consistent failure mode across all AI-assisted building:
  - Lovable/v0/Bolt: "first prompt is magic, month 3 is a chat box pointed at chaos"
  - Context windows fill. The agent doesn't know what was built or why.
  - Decisions made in week 1 are invisible by week 8.
  - The "why" is lost — code exists but the reasoning behind it is gone.
- → Context loss is cumulative. Each session starts fresh. CLAUDE.md captures static rules but not dynamic decisions. The gap between "what the codebase does" and "why it does it that way" grows with every session.
- → Agents can't process user feedback. A user reports "the checkout flow feels clunky." An agent can't interpret this. It needs a human to translate subjective feedback into specific technical changes.
- → Dependency rot is accelerating. In an AI-fast world, dependencies update faster, breaking changes happen more often, and agents introduce dependencies without considering long-term maintenance burden.
What "solved" looks like:
- → Project memory that persists across weeks and months — decisions, reasoning, context
- → Agents that can pick up a codebase 3 months in and understand not just what but why
- → Feedback loops from users → agents → code changes that preserve intent
- → Maintenance as a first-class concern, not an afterthought
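The "project memory" bullet could start as something as blunt as an append-only decision journal. A sketch, with the file format and field names invented here:

```python
import json
import time
from pathlib import Path

def log_decision(journal: Path, what: str, why: str,
                 alternatives: list[str]) -> None:
    """Append-only decision journal: one JSON line per decision, capturing
    the 'why' that otherwise vanishes with the chat session. A future
    agent (or human) can replay it before touching month-3 code."""
    entry = {"ts": time.time(), "what": what, "why": why,
             "rejected": alternatives}
    with journal.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def recall(journal: Path) -> list[dict]:
    """Everything decided so far, oldest first."""
    if not journal.exists():
        return []
    return [json.loads(line) for line in journal.read_text().splitlines()]

j = Path("decisions.jsonl")
log_decision(j, "Use SQLite, not Postgres",
             "single-user app, zero ops budget", ["Postgres", "Supabase"])
print(recall(j)[-1]["why"])  # → single-user app, zero ops budget
```

Unlike CLAUDE.md's static rules, this captures dynamic decisions as they happen, and because it's a plain file in the repo, it survives every session reset.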
## Phase 10: The Handoff Problem (between phases)
This is the meta-problem that compounds all the others.
Every phase transition loses context:
| Transition | What's lost |
|-----------|-------------|
| Ideation → Research | The brainstorm context. Why this direction was chosen. What alternatives were considered. |
| Research → Planning | Research findings. Market context. Competitor analysis. Domain knowledge. |
| Planning → Design | The reasoning behind the plan. Constraints that shaped scope. |
| Design → Building | Design decisions. Why this layout, not that one. Interaction specifications. |
| Building → Testing | What the code is supposed to do. Edge cases the builder was thinking about. |
| Testing → Deployment | Test results. Known issues. Configuration requirements. |
| Deployment → Iteration | Everything from every prior phase. The full "why" of the project. |
No tool currently addresses this. Every tool is phase-specific. Cursor is for building. Linear is for planning. Figma is for design. Perplexity is for research. The transitions between them are manual copy-paste or, more often, re-explanation to a new agent session.
UPDATE (April 2026): The design-to-code handoff is being rebuilt. Figma's MCP server (Feb 2026) lets agents read native Figma properties — variables, design tokens, components, auto layout rules — directly. OpenAI Codex and Anthropic Claude Code both integrate via MCP. But even with MCP, the handoff still breaks: generated code ignores design systems, extracts primitive values instead of tokens, can't handle responsive breakpoints, and misses interaction states. The gap is narrowing but not closed. See deep-dive-agentic-ux.md Part 3 for the full analysis.
What "solved" looks like:
- → A persistent project context that flows across phases
- → Each phase adds to the context rather than starting from scratch
- → Agents in phase N can access decisions and reasoning from phases 1 through N-1
- → The "why layer" — not just what was done, but why
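As a sketch of that "why layer": a context object each phase appends to, so an agent starting phase N gets a briefing built from phases 1 through N-1 instead of a re-explanation. Phase names mirror Part 1; everything else is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectContext:
    """Accumulating cross-phase context: notes are added per phase and
    never reset at a handoff."""
    notes: dict[str, list[str]] = field(default_factory=dict)

    def record(self, phase: str, note: str) -> None:
        self.notes.setdefault(phase, []).append(note)

    def briefing(self, upto_phase: str, order: list[str]) -> str:
        """What an agent starting `upto_phase` should read: every earlier
        phase's decisions, in phase order."""
        cutoff = order.index(upto_phase)
        lines = [f"[{phase}] {note}"
                 for phase in order[:cutoff]
                 for note in self.notes.get(phase, [])]
        return "\n".join(lines)

ORDER = ["ideation", "research", "planning", "design", "building"]
ctx = ProjectContext()
ctx.record("ideation", "Target: amateur athletes, not gyms")
ctx.record("research", "Competitors all lead with electrolyte claims")
print(ctx.briefing("planning", ORDER))
```

The inversion is the point: the default today is context scoped to a session; here context is scoped to the project, and each phase reads the accumulated "why" before adding its own.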
# Part 2: Alternative Paradigms
Apps are not the only way to solve problems or complete tasks. This matters because Atelier needs to be conscious about what paradigm it is — and what it's not.
## Paradigm 1: Traditional Code (human-written)
What it is: Developer writes code in an editor, runs it, tests it, deploys it.
Still the best for: Complex systems requiring deep architectural reasoning. Performance-critical code. Security-sensitive applications. Anything where the developer needs to understand every line.
Dying for: Boilerplate. CRUD apps. Standard patterns. Anything where "just make it work" is the goal.
Market reality: Still dominant by lines-of-code but declining as a percentage of new code written. AI-assisted code is the new default for new projects.
## Paradigm 2: AI-Assisted Code (agent era)
What it is: Human + AI agent(s) write code together. The human directs; the agent executes.
The spectrum within this paradigm:
| Mode | Human role | Agent role | Tools |
|------|-----------|------------|-------|
| Autocomplete | Writes code, accepts suggestions | Predicts next tokens | Copilot (original), Codeium |
| Pair programming | Directs, reviews inline | Writes blocks of code on request | Cursor (Tab), Claude Code (single prompt) |
| Conductor | Directs each action, reviews immediately | Executes multi-step tasks synchronously | Cursor (Agent mode), Claude Code (interactive) |
| Orchestrator | Assigns tasks, reviews PRs | Executes autonomously in background | Claude Code Web, Codex Cloud, Devin, GitHub Coding Agent |
Key data point (Anthropic 2026 report):
- → Average session length: 4 min (autocomplete era) → 23 min (agentic era)
- → 78% of sessions involve multi-file edits (up from 34%)
- → 47 tool calls per average session
- → 27% of AI-assisted work consists of tasks that wouldn't have been done otherwise
The tension: As you move from autocomplete → orchestrator, you gain speed but lose understanding. The developer who orchestrates 5 agents across 5 features knows less about the code than the developer who pair-programmed one feature.
## Paradigm 3: No-Code / AI-First Generation
What it is: Describe what you want in natural language → get a working app.
Tools: Lovable, v0, Bolt, Replit Agent
Best for: MVPs, prototypes, landing pages, internal tools, non-technical founders getting to v1.
The hard limits (from comparative research):
- → The 70% problem. Complex business logic, third-party integrations, performance optimization, and proper testing still need a developer. These tools get you 70% of the way fast; the last 30% is where they break.
- → Framework lock-in. Lovable and v0 are React-only. Bolt's code quality is inconsistent.
- → Backend constraints. Lovable and Bolt only integrate with Supabase, GitHub, and Stripe. Custom backends require escape hatches that don't exist.
- → Context window limits. Large projects overwhelm the AI's memory, leading to inconsistent code and forgotten patterns.
- → The month-3 cliff. "First prompt is magic, month 3 is a chat box pointed at chaos." No tool in this category has solved long-running project coherence.
- → Code quality at scale. Generated code has minimal comments, inconsistent patterns, and tightly coupled logic that's hard to maintain.
- → Cost escalation. Token consumption on complex projects makes costs unpredictable.
Who's winning: Lovable (best overall for non-technical users), v0 (best for components within the Vercel ecosystem), Replit (fastest zero-to-deployed path).
## Paradigm 4: Workflow Automation
What it is: Connect existing services and APIs to automate repetitive tasks. No code, no apps — just flows.
Tools:

| Tool | Best for | Philosophy |
|------|---------|------------|
| Zapier | Non-technical users, simple integrations | If-this-then-that at scale. 7,000+ app integrations. |
| Make (Integromat) | Visual workflow design | More complex branching logic than Zapier. |
| n8n | Developers who want full control | Self-hosted, open-source, code fallback (JS/Python). |
| Gumloop | AI-powered task automation | Drag-and-drop canvas for AI workflows. Non-technical friendly. |
| Lindy | AI agent workflows | Higher-level abstraction than n8n. |
When this is the right answer (not an app):
- → The user's "task" is connecting existing services (CRM → email → spreadsheet)
- → The workflow is predictable and repeatable
- → No custom UI is needed
- → The value is in the automation, not the interface
Where it fails:
- → Not for building products. Workflow tools connect services but don't create new experiences.
- → Limited to API-glue. If what you need doesn't have an API integration, you're stuck.
- → AI is inside steps, not owning the flow. "Most Gumloop flows work best when the path is mostly planned. The AI helps inside steps but isn't running as a fully independent agent."
- → No product-level memory or judgment. These tools don't understand your product, your users, or your design system.
The Atelier implication: Some tasks in a software project are better solved by workflow automation than by writing code. A smart project studio would know when to suggest "this is a Zapier problem, not a coding problem."
## Paradigm 5: Autonomous Agents (fully autonomous)
What it is: Give an agent a task; it works independently and delivers a result.
Tools: Devin ($20/mo), Claude Code Web, OpenAI Codex Cloud, GitHub Copilot Coding Agent, Google Jules
Best for: Well-defined, bounded tasks: migrations, dependency upgrades, bug fixes from clear issue descriptions, tech debt cleanup.
Real-world data:
- → Devin: 67% PR merge rate on defined tasks. 15% success on complex tasks without help. Senior at understanding, junior at execution.
- → Claude Code Web: Runs in cloud VMs, connects to GitHub repos, works asynchronously.
- → Codex Cloud: Assigns tasks from web UI, returns PRs.
The fundamental limitation: "Ambiguous or exploratory work is where Devin struggles. The fully autonomous model that makes it compelling for defined tasks becomes a liability when requirements are loose or the work requires judgment calls mid-execution."
The Atelier implication: Fully autonomous agents are a tool within the workflow, not the workflow itself. They handle the mechanical execution. The project studio handles the direction, judgment, and coordination.
## Paradigm 6: Agent-Native Platforms (emerging — Atelier's space)
What it is: Platforms designed from the ground up for a world where AI agents are first-class participants in the software development process.
What exists today:
- → Cline Kanban — CLI-agnostic kanban for multi-agent orchestration. Dependency chains. Worktree isolation. Free, open-source. Closest to this paradigm.
- → T3 Code — UI layer over coding agents. Multi-repo, multi-agent parallelism.
- → Vibe Kanban — Kanban board for orchestrating coding agents.
What's missing (the gap Atelier could fill):
- → Full-cycle coverage. All existing tools focus on the building phase. None address research, design, or testing as first-class phases.
- → Workspace morphing. No tool changes its environment based on the type of work. A research task and a backend task get the same UI.
- → Project memory. No tool maintains a persistent "why" layer that carries context across sessions and phases.
- → The handoff problem. No tool structures the transition between phases.
- → Design as a first-class concern. Every existing tool is engineering-first. Design thinking is an afterthought.
# Part 3: User Segments — Full Analysis
## Segment 1: Solo Builder / Indie Hacker
Who they are: Individual developers building side projects, micro-SaaS, MVPs. Often working nights/weekends alongside a day job.
What they're building: Personal tools, SaaS products ($0-10K MRR), client projects, portfolio pieces, open-source libraries.
How they use agents today:
- → Claude Code + Cursor as primary tools
- → 1-3 parallel agent streams
- → Working in terminal windows and browser tabs
- → Manual project switching
Their daily workflow:
- → Open terminal, start Claude Code or Cursor
- → Work on Feature A for 30 min
- → Start Agent on Feature B in another terminal
- → Switch between tabs to check progress
- → Lose context when switching back to Feature A
- → End session. Next day, start over — agent doesn't remember yesterday.
Their biggest pains:
- → Context loss. Every session starts fresh. The agent doesn't know what was decided yesterday.
- → "Which one dinged?" Running 2-3 agents in parallel with no way to know which one finished or needs attention.
- → The finish line. Getting from 80% to 100% is where the work happens. AI gets you to 80% fast, then the remaining 20% takes 80% of the time.
- → No memory = no momentum. They can't pick up where they left off. Every session is a cold start.
- → Taste is their differentiator. As a solo builder, their product needs to stand out. AI makes everyone's product "good enough." Their edge is taste, design, and interaction quality.
What they'd pay for:
- → Speed without losing understanding
- → Memory that persists across sessions
- → A way to run parallel agents without cognitive chaos
- → Tools that help them maintain craft/taste while moving fast
Price sensitivity: High. $20-50/month max for most. Will pay more if the tool obviously saves hours/week.
How many of them exist: Millions globally. The entire Cursor user base (2M+), Claude Code user base, Cline user base (5M+). But the ones running parallel agents are a subset — maybe 100K-500K.
## Segment 2: Startup Team (2-10 people)
Who they are: Early-stage startup engineering teams. 2-5 developers, maybe a designer, a PM. Moving fast, iterating on product-market fit.
What they're building: SaaS products, consumer apps, developer tools, vertical SaaS. Products that need to ship fast and iterate faster.
How they use agents today:
- → Each developer uses their own agent setup (Cursor, Claude Code, Copilot)
- → No coordination between agents across developers
- → PRs pile up unreviewed (code review is the bottleneck)
- → Linear/Jira for project management, disconnected from agent work
Their daily workflow:
- → Standup: discuss priorities
- → Each dev opens their IDE + agent
- → Dev A's agent writes feature code → PR
- → Dev B's agent writes different feature code → PR
- → Both PRs touch the same files → merge conflict
- → Nobody reviews either PR for hours (5.3x longer pickup time for agentic PRs)
- → Agent-generated PRs are 154% larger → review takes longer
- → Ship anyway because deadline
Their biggest pains:
- → Agent outputs conflict. Two developers' agents edit the same file with different assumptions. Merge conflicts are more frequent and harder to resolve because nobody wrote the conflicting code.
- → No shared context. Developer A's agent doesn't know what Developer B's agent decided about the API schema. Each agent operates in isolation.
- → PRs pile up unreviewed. The code review bottleneck is real: 91% longer review times with multi-agent workflows. Teams feel 20% faster but are actually 19% slower (LinearB data).
- → Quality drift. AI-generated PRs carry 1.7x more issues. With a small team moving fast, nobody catches the accumulated code smells until the codebase is unmaintainable.
- → Architectural coherence. With 3-5 agents writing code independently, architectural decisions aren't enforced. Each agent makes locally optimal choices that globally conflict.
What they'd pay for:
- → Coordination between agents across the team
- → Shared project context that all agents access
- → Code review at agent speed (AI reviews + human oversight)
- → Architectural coherence enforcement
Price sensitivity: Medium. $50-200/month per seat is standard for dev tools at this stage. Value must be clear.
How many of them exist: Hundreds of thousands of early-stage startups globally. The ones using AI agents intensively are a rapidly growing subset.
## Segment 3: Agency / Freelancer
Who they are: Design/development agencies and freelancers handling multiple client projects simultaneously.
What they're building: Client websites, apps, branding, marketing sites, internal tools. 3-10 active projects at once.
How they use agents today:
- → Agents for speed — ship client work faster
- → 2-3 tools (usually Cursor + Claude Code for complex work, Lovable/v0 for quick prototypes)
- → Manual project switching between client directories
Their daily workflow:
- → Morning: work on Client A's feature using Cursor Agent mode
- → Client B emails: "urgent bug"
- → Switch to Client B's project. Agent needs full re-briefing.
- → Fix the bug. Switch back to Client A. Context is gone.
- → Afternoon: start Client C's prototype in Lovable
- → Client A's agent finished something in the background — didn't notice
- → End of day: manage 4 sets of localhost ports, env files, git branches
Their biggest pains:
- → Multi-project fragmentation at scale. The solo builder's problem, but multiplied by 5-10 active clients.
- → Port/auth collisions. localhost:3000, :3001, :3002 — cookies are shared across ports, OAuth redirects break, client A's session bleeds into client B's dev server.
- → Client context switching. Each client project has different design systems, tech stacks, conventions, stakeholders. Agents don't know any of this when you switch.
- → Time tracking is impossible. When agents do work asynchronously, how do you bill for it? The agent worked for 20 minutes on Client A while you were doing Client B. How does that show up on the invoice?
- → Quality variance. Some clients need pixel-perfect craft; others just need "good enough, fast." The agent doesn't know which mode it's in.
What they'd pay for:
- → Project isolation (one-click context switch between client projects with full env isolation)
- → Client-specific agent memory (design system, conventions, stakeholder preferences)
- → Time tracking integration that accounts for agent work
- → Different quality modes for different clients
Price sensitivity: Medium-high. Agencies will pay $100-300/month for tools that clearly increase billable output. Freelancers: $30-100/month.
How many of them exist: Millions of freelance developers globally. Tens of thousands of agencies. Those using AI agents intensively: growing rapidly.
## Segment 4: Enterprise
Who they are: Large organizations (500+ employees) with dedicated engineering teams, compliance requirements, security constraints.
What they're building: Internal tools, customer-facing platforms, data pipelines, integrations, microservices.
How they use agents today:
- → Cautious adoption. 84% of developers use AI coding assistance, but enterprise governance is lagging.
- → Approved tool lists (usually Copilot, sometimes Cursor or Claude Code)
- → Security reviews of AI-generated code
- → Compliance requirements for audit trails
Key data (2026):
- → 79% of organizations face challenges adopting AI — a double-digit increase from 2025.
- → 54% of C-suite executives say adopting AI "is tearing their company apart."
- → 67% of executives believe their company has already suffered a data leak due to unapproved AI tools.
- → 36% lack any formal plan for supervising AI agents.
- → 100% of security/IT/risk leaders say agentic AI is on their roadmap — yet most can't stop agents when something goes wrong.
- → 40-62% of AI-generated code contains security vulnerabilities.
Their biggest pains:
- → Governance gap. AI adoption is outrunning governance. Tools are adopted by individual developers before IT approves them. "Shadow AI" is the new "shadow IT."
- → Security vulnerabilities. AI-generated code introduces vulnerabilities at a rate that existing security tools can't handle. SAST/DAST tools weren't designed for the volume and pattern of AI-generated code.
- → Audit trail requirements. Regulations require knowing who wrote what code and why. When an agent writes code, who is accountable? The developer who prompted it? The model provider?
- → Agent containment. Most organizations can monitor what agents are doing but can't stop them when something goes wrong. This governance-containment gap is the defining security challenge of 2026.
- → AI Bill of Materials. Emerging requirement to document which AI models were used, for what, and what data they accessed — similar to SBOMs for dependencies.
What they'd pay for:
- → Control + auditability. Full audit trails for every agent action.
- → Security guardrails. Agents that can't access secrets, can't make destructive changes without approval.
- → Policy enforcement. Approved models, approved actions, approved data access.
- → Compliance-ready reporting. Evidence of responsible AI use for auditors.
Price sensitivity: Low. Enterprises pay $100-500/seat/month for developer tools. Budget is not the constraint; security and compliance are.
How many of them exist: Fortune 500 alone is 500 companies with thousands of developers each. The addressable market is enormous, but enterprise sales cycles are 6-12 months and require features that a startup can't build in v1.
## Segment 5: Non-Technical Founder
Who they are: Founders with an idea and no coding skills. Want to build an MVP to validate, get funding, or launch.
What they're building: SaaS apps, marketplaces, mobile apps, landing pages with functionality. Usually a single product.
How they use agents today:
- → Lovable, v0, Bolt, Replit Agent — whichever gets closest to "describe it, get it"
- → ChatGPT / Claude for everything else (copy, strategy, planning)
- → Often hire a developer when they hit the customization wall
Their daily workflow:
- → Describe what they want to Lovable/v0/Bolt
- → Get a working prototype in 30-60 minutes
- → Iterate with natural language: "make the button bigger," "add a pricing page"
- → Hit a wall: "integrate with Stripe webhooks" or "custom authentication flow"
- → Try to describe the fix. Agent produces broken code.
- → Spend 2 days trying to fix it via prompting.
- → Give up and hire a freelance developer.
Their biggest pains:
- → The customization wall. AI app builders get you to a working prototype fast. But the moment you need something beyond their built-in integrations (Supabase, Stripe), you're stuck. "A technical founder who starts with Lovable will hit customization limits within a week."
- → Can't debug. When something breaks, they can't read the error message. They're entirely dependent on the AI to fix it, and often the AI makes it worse.
- → Taste gap. The generated UI is "good enough" but not "good." They can feel it's not right but can't articulate what's wrong or fix it.
- → The month-3 cliff. Even if v1 works, iterating on it becomes increasingly painful as the codebase grows and the AI loses context.
- → Backend complexity. Frontend generation is relatively solved. Backend logic (auth flows, payment processing, data relationships) is where non-technical founders hit the hardest wall.
What they'd pay for:
- → "Make it actually good" — taste and quality beyond what AI generates by default
- → Getting past the customization wall without learning to code
- → A way to iterate on a product at month 3, not just month 0
- → Handoff to a developer when they need one (with all context preserved)
Price sensitivity: Varies. Willing to pay $50-200/month to avoid hiring a developer ($5K-50K). But extremely sensitive to value — they need to see results immediately.
How many of them exist: Enormous. Millions of aspiring founders globally. Most won't build anything. The ones who try AI builders: rapidly growing. The ones who get past month 3: very few.
# Part 4: Deep Sub-Problems
Zooming into the four highest-priority areas for Atelier.
## Deep Dive 1: The Research Phase Gap
The problem in one sentence: No tool structures the gathering and synthesis of domain knowledge before building begins.
Why this matters for Atelier: The existing PRD skips research entirely. It starts with "drop an idea, get a board." But the quality of what gets built depends entirely on the quality of the context that went into it. "Build me a landing page for a sports drink" produces generic output. "Build me a landing page for a sports drink targeting CrossFit athletes aged 25-35 who value clean ingredients, positioned against Gatorade with a DTC model" produces something worth shipping.
What would a research surface actually look like?
- → Sources (URLs, documents, data) with status tracking (fetched, reading, synthesized)
- → A synthesis document that the agent writes and the human edits
- → Memory that carries forward: "here's what we know about this market"
- → The ability to say "add 3 more skeptical sources" and have the agent update the synthesis
- → The research becoming structured context for the next phase (design, build)
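To make that shape concrete, here is a minimal data-model sketch of a research card. Every name here (`ResearchCard`, `Source`, `SourceStatus`, the `stance` field) is hypothetical illustration, not an Atelier spec:

```python
from dataclasses import dataclass, field
from enum import Enum

class SourceStatus(Enum):
    QUEUED = "queued"
    FETCHED = "fetched"
    READING = "reading"
    SYNTHESIZED = "synthesized"

@dataclass
class Source:
    url: str
    stance: str = "neutral"              # supports "add 3 more skeptical sources"
    status: SourceStatus = SourceStatus.QUEUED

@dataclass
class ResearchCard:
    question: str
    sources: list[Source] = field(default_factory=list)
    synthesis: str = ""                  # agent-written, human-edited

    def add_sources(self, urls: list[str], stance: str = "neutral") -> None:
        self.sources.extend(Source(url=u, stance=stance) for u in urls)

    def context_for_next_phase(self) -> dict:
        """The structured context a design or build agent would consume."""
        return {
            "question": self.question,
            "synthesis": self.synthesis,
            "sources": [s.url for s in self.sources
                        if s.status is SourceStatus.SYNTHESIZED],
        }
```

The key design point is the last method: the research phase ends not with a chat log but with a structured payload the next phase can consume.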
Existing solutions and their gaps:
- → Perplexity: great for single queries, no project continuity
- → Deep Research (Gemini/OpenAI): one-shot reports, can't iterate
- → Notion AI: locked in Notion, no agent orchestration
- → None of them produce structured context that agents in subsequent phases can consume
## Deep Dive 2: The Design Phase Gap
The problem in one sentence: AI accelerates every part of building except the part that makes the result worth using — design taste, interaction feel, and user journey coherence.
Why this matters for Atelier: This is the founder's design engineering edge. If Atelier solves the design gap, it has a moat that engineering-first competitors (Cline, T3 Code, Devin) can't match.
The three layers of design that AI currently fails at:
- → Taste / judgment. AI generates "decent." In a world where everyone uses AI, decent is invisible. The work that stands out is intentional, restrained, coherent. No tool helps humans apply taste to AI-generated output systematically.
- → Flow coherence. AI thinks in screens, not journeys. Onboarding → core experience → edge case recovery → return visit — no agent maintains this thread. By screen 8, the agent has forgotten screen 1's decisions.
- → Interaction feel. Spring constants, timing curves, spatial continuity, weight. This is almost entirely unaddressed by current tools. An agent can write `transition: 200ms ease-in-out` but can't judge whether that feels right for this specific interaction in this specific context.
What would a design surface actually look like?
- → Not Figma. Not a canvas. Not trying to outbuild anyone.
- → A structured view of design decisions: "here's the design system, here's the flow, here's the interaction spec"
- → Agents produce design artifacts (component specs, flow diagrams, token files); humans curate and judge
- → The ability to tune interaction feel (timing, easing, spring physics) in context, not in isolation
- → Design decisions that carry forward as constraints for the building phase
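To make "spring constants, timing curves" concrete: the sketch below evaluates the closed-form position of a damped spring — the two-knob model (stiffness, damping) that spring-animation libraries expose instead of a fixed duration. Assumes unit mass and an underdamped system; the 170/26 defaults echo common spring presets and are illustrative, not recommendations:

```python
import math

def spring_position(t: float, stiffness: float = 170.0, damping: float = 26.0) -> float:
    """Position at time t of a unit-mass spring released from 0 toward 1.

    stiffness and damping are the "feel" knobs a fixed-duration ease hides.
    Assumes an underdamped system (damping**2 < 4 * stiffness).
    """
    omega = math.sqrt(stiffness)                    # natural frequency
    zeta = damping / (2.0 * math.sqrt(stiffness))   # damping ratio, < 1 here
    omega_d = omega * math.sqrt(1.0 - zeta ** 2)    # damped frequency
    decay = math.exp(-zeta * omega * t)             # exponential envelope
    return 1.0 - decay * (math.cos(omega_d * t)
                          + (zeta * omega / omega_d) * math.sin(omega_d * t))
```

Lowering `damping` makes the motion overshoot and settle with a bounce; raising it makes the motion glide in without one. That overshoot-or-not judgment, per interaction, is exactly what "interaction feel" means and what no agent currently makes.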
## Deep Dive 3: The Multi-Agent Fragmentation Problem (building phase)
The problem in one sentence: Running 3-5 parallel agents creates fragmentation, context loss, attention hijack, port collisions, and review bottlenecks — and existing tools only solve pieces.
This is the most-documented problem and the current Atelier PRD's primary focus.
The complete set of sub-problems:
| Sub-problem | Severity | Existing solutions | Gap |
|-------------|----------|-------------------|-----|
| Project-app fragmentation (terminal + editor + browser = one project, OS doesn't know) | High | T3 Code (multi-repo UI), Cline Kanban (task board) | No tool groups all project artifacts as a single container |
| "Which one dinged?" (agent finishes, no origin label) | High | Cline Kanban (visual status), T3 Code (multi-agent view) | Solved at the task level, not at the ambient awareness level |
| Localhost port collisions | High | None | Completely unsolved in software. Capsules (Atelier concept) addresses this. |
| Context overflow within sessions | High | Claude Code auto-compact, CLAUDE.md | Compaction loses dynamic context. CLAUDE.md is static only. |
| Memory loss between sessions | High | CLAUDE.md, memory files | Static workaround. No dynamic session memory that persists. |
| Code review bottleneck (5.3x longer pickup, 1.7x more issues) | High | CodeRabbit, Qodo (AI review) | AI review catches patterns but not architecture. Human review still mandatory. |
| Compound mistakes at scale | Medium | Quality hooks (lint, test gates) | Hooks catch syntax, not design. Architectural drift is invisible until it breaks. |
| Merge conflicts between agents | Medium | Git worktrees (standard isolation) | Worktrees solve file conflicts but not semantic conflicts (two agents making incompatible design decisions) |
## Deep Dive 4: The Handoff / Memory Problem (between phases)
The problem in one sentence: Every phase transition in software development loses context, and no tool maintains a persistent "why layer" across the full project lifecycle.
Why this is the deepest problem: Every other problem compounds because of this one. Context loss between sessions, between phases, between team members, between agents — it's all the same root issue: there's no persistent, structured, queryable memory of what was decided, why, and what was rejected.
What exists today:
- → CLAUDE.md / .cursorrules — static project rules. Good for conventions, not for dynamic decisions.
- → Git history — records what changed, not why.
- → Commit messages — supposed to record why, but agents write generic ones.
- → Linear/Jira tickets — capture intent at the planning level, disconnected from implementation.
- → Chat logs — capture everything, findable for nothing.
What's needed:
- → Decision memory. "We chose Next.js because of X. We rejected Remix because of Y."
- → Phase memory. "Research found that competitors A, B, C exist. The gap is D."
- → Card/task memory. "This task was completed by agent Z. It produced these artifacts. The human edited these parts."
- → Project memory. "The overall product is trying to solve X for Y users. The key constraints are Z."
- → Promotable memory. Some facts are local to a task. Some matter for the whole project. The distinction should be explicit.
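A minimal sketch of what promotable decision memory could look like as a data structure. The names (`Scope`, `MemoryEntry`, `promote`) are hypothetical, and the point is the explicit promotion step, not the storage:

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    TASK = 1      # local to one card/task
    PHASE = 2     # carries across a phase (research, design, build)
    PROJECT = 3   # the persistent "why" layer

@dataclass
class MemoryEntry:
    fact: str         # "We chose Next.js over Remix"
    rationale: str    # the "why": constraints, rejected alternatives
    scope: Scope = Scope.TASK

class ProjectMemory:
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def record(self, fact: str, rationale: str,
               scope: Scope = Scope.TASK) -> MemoryEntry:
        entry = MemoryEntry(fact, rationale, scope)
        self.entries.append(entry)
        return entry

    def promote(self, entry: MemoryEntry, to: Scope) -> None:
        """Widening scope is an explicit, human-curated act — never automatic."""
        if to.value > entry.scope.value:
            entry.scope = to

    def promoted(self) -> list[str]:
        """Facts every agent in the project should see, regardless of task."""
        return [e.fact for e in self.entries if e.scope is Scope.PROJECT]
```

Everything defaults to task-local; nothing reaches project scope unless a human promotes it, which keeps the project-wide context curated and minimal.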
# Part 5: Where Atelier Fits — Research-Informed Positioning
Based on external research. To be refined after experience-driven research (using every tool intensively for 2-3 real projects).
## What the research tells us (before making choices)
### The landscape has a clear shape:
- → The building phase is saturated. Cursor, Claude Code, Copilot, Windsurf, Antigravity, Cline, Devin — dozens of tools fight over "write code faster." There is no room for another code editor.
- → The orchestration layer is emerging but shallow. Cline Kanban and T3 Code exist but only handle the building phase. They're kanban boards for agents. They don't touch research, design, testing, or memory.
- → No tool spans the full development cycle. Every tool is phase-specific. The transitions between phases (research → design → build → test) are manual copy-paste or re-explanation to a new agent.
- → Memory is the recognized frontier. Auto Dream, Windsurf Memories, agentmemory — everyone knows memory matters, but nobody has built it well for the full project lifecycle. Current approaches are either implicit (the system decides) or static (CLAUDE.md).
- → Design is the unoccupied high ground. Every tool in the landscape is engineering-first. None center design thinking, flow coherence, or taste as a product concern.
- → The spec-driven movement confirms the demand for structure. Kiro, GitHub Spec Kit, Tessl Framework — the market wants structured workflows, not just chat. Atelier's card types are structured workflows.
- → MCP creates the protocol layer. With 10,000+ active servers and universal IDE adoption, MCP makes it possible to build a platform that connects to everything without building everything.
## Positioning Hypothesis (to validate through experience)
### What Atelier IS:
A project studio for the full development lifecycle — from research to iteration — with persistent memory and task-type-specific surfaces.
Not an IDE. Not a code editor. Not an agent. A project-level orchestration and memory layer that sits above the tools people already use (Cursor, Claude Code, Codex) and adds:
- → Full-cycle coverage (research, design, build, test — not just build)
- → Workspace morphing (each task type gets the right environment)
- → Persistent project memory (decisions, reasoning, context that carries across sessions and phases)
- → Design as a first-class concern (not an afterthought)
### What Atelier IS NOT:
- → Not an IDE or code editor (Cursor/VS Code handles that)
- → Not an autonomous agent (Claude Code/Devin handles that)
- → Not a workflow automation tool (n8n/Zapier handles that)
- → Not a no-code builder (Lovable/v0 handles that)
- → Not a project management tool (Linear/Jira handles that)
### How Atelier relates to the competition:
| Tool | Relationship to Atelier |
|------|------------------------|
| Cursor / Claude Code / Copilot | Agents that Atelier orchestrates. Atelier dispatches work to them. |
| Cline Kanban | Atelier subsumes its functionality (kanban orchestration) and extends it (full-cycle, memory, morphing) |
| T3 Code | Similar multi-project management, but Atelier adds memory, morphing, and non-build phases |
| Kiro | Spec-driven approach is directionally aligned. Atelier could adopt spec-driven patterns for its decomposition |
| Linear | Project management layer. Atelier is the execution layer that bridges "ticket" and "code" |
| Figma AI | Design tool. Atelier doesn't replace Figma but structures design thinking within the project workflow |
## Primary Target Segment: Solo Builder → Small Team
Beachhead: Solo builders running 2-5 parallel agent streams on real projects. The founder IS this user.
Why this segment first:
- → Shortest sales cycle (individual decision)
- → Most acute pain (context loss, fragmentation, memory)
- → Highest tolerance for early-stage products
- → Founder can dogfood authentically
- → The same tools that serve solo builders scale to 2-5 person teams
NOT targeting first:
- → Enterprise (too long a sales cycle, compliance requirements out of scope)
- → Non-technical founders (need too much hand-holding, different UX requirements)
- → Agencies (multi-client workflows add complexity beyond v1)
## Phase Ownership Strategy
### Phase 1 (v1): Research + Building + Memory
Own the research phase (the gap nobody else fills) and the building phase (the pain everyone feels), connected by persistent memory (the moat).
| Phase | Atelier role | Card type |
|-------|-------------|-----------|
| Research | OWN — structured research surface with sources, synthesis, and memory | research card |
| Planning | ASSIST — decomposition from intent to typed cards | Screen 1-2 (intent → board) |
| Building (backend) | OWN — API-specific surface with mock server, schema, request runner | backend-endpoint card |
| Building (frontend) | INTEGRATE — dispatch to Cursor/Claude Code, track progress on board | Board status tracking |
| Memory | OWN — persistent, curated, promotable memory across all phases | Memory rail, promotion modal |
### Phase 2 (v2): Design + Testing
| Phase | Atelier role | Card type |
|-------|-------------|-----------|
| Design | OWN — structured design artifact viewer for agent-authored design data | design-flow card |
| Testing | OWN — test run viewer, diff strip, quality gate enforcement | qa-regression card |
| Code review | OWN — inline diff with approve/send-back controls | code-review card |
### Phase 3 (v3): Capsules (Environment Isolation)
| Feature | What it does |
|---------|-------------|
| Capsules | Project = git worktree + dev server + browser profile + terminal + port. One-click switch. |
| Port management | Automatic port allocation per project. Dynamic high-range ports per instance + canonical ports for active project. |
| Service lifecycle | Per-service assign strategies: hot-reload (frontend), restart (backend), rebuild (baked images), none (databases). |
| Cookie isolation | Each project gets its own browser profile — separate cookies, localStorage, ServiceWorker contexts. |
| Secret injection | Explicit, encrypted, per-project credential management — not inherited from host shell. |
| Observability | Single dashboard: all running projects, agents, port mappings, health, logs, volumes. |
| Ambient presence | Thin desktop strip showing agent status across all projects. |
Note: Research shows runtime isolation is deeper than the original PRD assumed. Git worktrees solve code isolation but NOT port conflicts, shared databases, bind-mount collisions, secret leakage, or browser state contamination. The Coasts model (Coastfile + per-service assign strategies + dynamic port allocation) is the most complete architecture found. See deep-dive-runtime-isolation.md for the full analysis.
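The "dynamic high-range ports per instance" idea can be sketched in a few lines: a hypothetical allocator that derives a stable port from the project name and probes for availability. The range, hashing, and probing strategy here are assumptions for illustration, not the Capsules design:

```python
import socket
import zlib

class PortAllocator:
    """Give each project a stable dev-server port in a high range."""

    def __init__(self, base: int = 42000, span: int = 1000) -> None:
        self.base, self.span = base, span
        self.assigned: dict[str, int] = {}   # project name -> port

    @staticmethod
    def _free(port: int) -> bool:
        # connect_ex returns a nonzero errno when nothing is listening
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.2)
            return s.connect_ex(("127.0.0.1", port)) != 0

    def port_for(self, project: str) -> int:
        if project in self.assigned:
            return self.assigned[project]
        start = zlib.crc32(project.encode()) % self.span   # stable per name
        for offset in range(self.span):
            candidate = self.base + (start + offset) % self.span
            if candidate not in self.assigned.values() and self._free(candidate):
                self.assigned[project] = candidate
                return candidate
        raise RuntimeError(f"no free port in {self.base}-{self.base + self.span}")
```

Deriving the starting offset from a checksum of the project name means `client-a` tends to get the same port across restarts, which keeps bookmarks and OAuth redirect URIs stable. Note this only handles port conflicts; cookies, secrets, and browser state need the separate isolation mechanisms in the table above.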
### Explicitly ignored (forever or until obvious demand):
- → Deployment/CI/CD (Vercel/Railway handles this well enough)
- → Workflow automation (n8n/Zapier's domain)
- → Full autonomous coding (Devin/Codex Cloud's domain)
- → Enterprise governance and compliance (v4+ if ever)
## The Three Moats (in order of defensibility)
- → Persistent, Human-Curated Project Memory. The deepest moat. Nobody has built visible, editable, phase-persistent memory that flows from research → design → build → test. This is a design engineering problem (how to present and curate memory), not just a data problem.
- → Full-Cycle Workspace Morphing. Each card type is a different environment. Research looks different from backend API design looks different from code review. This is the bet the PRD already makes — but the research confirms it's strategically sound because competitors are locked into chat interfaces.
- → Design Taste as Product Identity. The product itself demonstrates the craft it helps users achieve. Animations, transitions, spatial reasoning, visual clarity — in a market of engineering-first tools, a design-first tool stands out. This is the moat that can't be copied by adding features.
## Open Questions (to resolve through experience)
- → Is the morphing bet validated by real usage? The PRD asks: "Does morphing the workspace by card type feel genuinely better than a one-size-fits-all agent chat?" Only building and dogfooding can answer this.
- → What's the right memory curation UX? Auto-suggest facts for promotion + human curate? Or fully manual? The ETH Zurich study says generic auto-generated context hurts. The answer is likely: auto-suggest, human curate, minimal by default.
- → Should Atelier be web-first or desktop-first? Web is faster to ship and easier to distribute. Desktop enables Capsules (port/auth isolation). Research suggests: web-first studio, desktop Capsules later.
- → What's the pricing? Free orchestration tools exist (Cline Kanban, T3 Code). Atelier needs to justify paid pricing through memory + morphing + full-cycle value. Freemium likely: free for 1 project, paid for memory + multi-project.
- → How much of the build phase does Atelier own vs delegate? The `backend-endpoint` card with a mock server is deep ownership. Frontend building might be pure delegation to Cursor/Claude Code with just status tracking on the board. Where's the line?
# Sources
## Research Reports
- → Anthropic 2026 Agentic Coding Trends Report
- → Figma 2025 AI Report
- → NN/g: AI Design Tools Status Update
## Key Articles
- → Addy Osmani: The Code Agent Orchestra
- → Addy Osmani: Conductors to Orchestrators
- → Simon Willison: Embracing the Parallel Coding Agent Lifestyle
- → Theo: The Agentic Code Problem
- → RedMonk: 10 Things Developers Want from Agentic IDEs
- → LinearB: Code Review is the New Bottleneck
- → Vibe Coding Works Until It Doesn't — AI PRs Carry 1.7x More Issues
## Tool-Specific Research
- → Cline Kanban
- → T3 Code
- → Cursor Problems in 2026
- → Devin 2025 Performance Review
- → Lovable vs Bolt vs v0 Comparison
- → Replit Review 2026
## Enterprise & Security
- → Enterprise AI Adoption Challenges 2026 — Writer
- → AI Agent Security Guide — MintMCP
- → MCP Enterprise Adoption 2026 — CData
- → AI Coding Statistics — Panto
## Runtime Isolation & Orchestration
- → Git Worktrees Need Runtime Isolation — Penligent
- → Multi-Agent Coding Workspace — Augment
- → AO vs T3 Code vs Symphony vs Cmux — DEV
- → Google Scion for Parallel Agents
## Agentic UX & Design-to-Code
- → Designing for Agentic AI — Smashing Magazine
- → A2UI Protocol — DEV
- → Figma Files for MCP — LogRocket
- → Figma MCP Server Guide — Figma
- → Linear Agent — The Register