Managing AI Coding Agents at Scale: Inside Fleet's Parallel Execution Framework
The AI coding revolution isn't just about having powerful models—it's about orchestration. When a developer discovered that running 50+ Claude Code sessions in parallel could dramatically accelerate project work, it sparked a fascinating question: how do you actually manage that many agents without chaos?
Enter Fleet, a Python supervisor that transforms the dream of parallel AI development into practical reality.
The Problem With Running Multiple Agents
If you've ever tried to run several AI coding assistants simultaneously, you've probably hit the same wall: coordination. Which agent works on which task? How do they avoid stepping on each other's toes? How do you track context usage when you're burning through tokens across multiple sessions?
The naive approach—spinning up separate Claude sessions in different directories and hoping for the best—falls apart quickly. You need a centralized system that understands dependencies, task priorities, and the state of your codebase across all sessions.
Fleet's Core Architecture: Simplicity Through Centralization
Fleet solves this through three elegant design decisions:
1. One Queue to Rule Them All
Rather than scattering task management across projects, Fleet uses a centralized beads database (a Git-backed issue tracker) living in ~/.fleet. When you run fleet bd create, it records both the task and the working directory where it was created. Each agent spawns in the exact context where it's needed.
This single source of truth means multiple coding agents can claim tasks, work independently, and hand off results with far less risk of merge conflicts or duplicate work. It's the difference between organized parallelism and chaotic distributed work.
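To make the idea concrete, here is a minimal in-memory sketch of a central queue that records where each task was created and hands each task to exactly one agent. It is not Fleet's code and does not use beads; the names Task, TaskQueue, create, and claim are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: int
    title: str
    cwd: str                          # directory where the task was created
    claimed_by: Optional[str] = None  # name of the agent that owns the task


class TaskQueue:
    """Hypothetical in-memory stand-in for a centralized task queue."""

    def __init__(self) -> None:
        self._tasks: List[Task] = []

    def create(self, title: str, cwd: Optional[str] = None) -> Task:
        # Record the working directory alongside the task itself.
        task = Task(len(self._tasks) + 1, title, cwd or os.getcwd())
        self._tasks.append(task)
        return task

    def claim(self, agent: str) -> Optional[Task]:
        # Hand out the oldest unclaimed task, or None when the queue is empty.
        for task in self._tasks:
            if task.claimed_by is None:
                task.claimed_by = agent
                return task
        return None
```

Because every claim goes through one queue, two agents never receive the same task, and each agent can start in the directory stored on the task it claimed.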
2. Pluggable Agent Support
Fleet doesn't lock you into one model or provider. It currently supports:
- Claude (extensively tested)
- Agy/Antigravity (briefly tested)
- Codex (implemented but untested)
Adding a new coder takes minutes. This flexibility matters because different models excel at different tasks: you might want a stronger reasoning model for architecture, but a faster, cheaper one for boilerplate generation. Fleet lets you specify the coder and model per task.
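One common way to get this kind of pluggability is a small registry that maps a coder name to a function that builds its launch command. The sketch below assumes that design; Fleet's actual plugin mechanism may look different, and the names register_coder, build_command, and example-coder-cli are placeholders rather than real commands or flags.

```python
from typing import Callable, Dict, List, Optional

# A builder turns (prompt, model) into the argument list used to launch a coder.
CommandBuilder = Callable[[str, Optional[str]], List[str]]

CODERS: Dict[str, CommandBuilder] = {}


def register_coder(name: str) -> Callable[[CommandBuilder], CommandBuilder]:
    """Decorator that adds a command builder to the registry under `name`."""
    def decorator(fn: CommandBuilder) -> CommandBuilder:
        CODERS[name] = fn
        return fn
    return decorator


@register_coder("example-coder")
def build_example(prompt: str, model: Optional[str]) -> List[str]:
    # "example-coder-cli" and "--model" are placeholders, not a real CLI.
    cmd = ["example-coder-cli"]
    if model:
        cmd += ["--model", model]
    return cmd + [prompt]


def build_command(coder: str, prompt: str, model: Optional[str] = None) -> List[str]:
    """Look up the coder by name and build its command line."""
    try:
        builder = CODERS[coder]
    except KeyError:
        raise ValueError(f"unknown coder: {coder}") from None
    return builder(prompt, model)
```

With this shape, supporting another coder means writing one decorated function, which is consistent with the claim that adding one takes minutes.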
3. Smart Concurrency Management
The default max_concurrent is 3, but you can scale to 10+ sessions simultaneously:
fleet config set max_concurrent=10
fleet bd create --title "Refactor authentication module" --description "..."
fleet bd create --coder agy --model opus --title "Build API tests" --description "..."
Fleet will queue everything and maintain your concurrency limits automatically. The bottleneck isn't the framework—it's your subscription token limits.
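A concurrency cap like max_concurrent can be sketched with a bounded worker pool: tasks beyond the limit simply wait their turn. This is an illustration under that assumption, not Fleet's scheduler; run_all is a hypothetical helper, and the short sleep stands in for launching a real agent session.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_all(tasks: List[str], max_concurrent: int = 3) -> Tuple[List[str], int]:
    """Run every task with at most `max_concurrent` in flight.

    Returns the results in input order plus the peak number of
    simultaneously running tasks that was observed.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker(task: str) -> str:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            time.sleep(0.01)  # stand-in for an agent session doing work
            return f"done: {task}"
        finally:
            with lock:
                running -= 1

    # The pool size is the concurrency limit; extra tasks wait in its queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(worker, tasks))
    return results, peak
```

Calling run_all with eight tasks and a limit of three finishes all eight while never running more than three at once, which is the behavior the setting describes.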
Operational Commands That Matter
Fleet provides a practical CLI that keeps you informed without overwhelming you:
- fleet tasks: See what's in progress, which coder is working on it, and real-time token consumption
- fleet task <id> log: Review what the agent actually did
- fleet task <id> plan: Understand the agent's approach before execution
- fleet task <id> knowledge: Check what context was available
- fleet config show|set: Adjust settings on the fly
No dashboards, no unnecessary complexity. Just the info you need.
The Token Economics Reality Check
Here's the honest part: tokens are the constraint, not agent capacity.
The developers using Fleet extensively have discovered that simply throwing more agents at the problem hits subscription limits quickly. Their solution? Token rotation across multiple Claude accounts, plus aggressive context hygiene.
This last point is underrated: they audited their CLAUDE.md files, plugin directories, and skill repositories and found that some were being loaded twice, effectively doubling the token cost of that context for no benefit. Cleaning this up had more impact than adding agents.
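An audit like this can be partly automated by hashing context files and reporting any content that appears in more than one place. The sketch below assumes the duplicates are byte-identical Markdown files; find_duplicate_files is a hypothetical helper, and the directories to scan are up to you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Iterable, List, Union


def find_duplicate_files(
    roots: Iterable[Union[str, Path]], pattern: str = "*.md"
) -> List[List[Path]]:
    """Group files with identical content found under the given directories."""
    by_hash = defaultdict(set)
    for root in roots:
        for path in Path(root).rglob(pattern):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                # Resolving the path avoids counting one file twice
                # when the scanned directories overlap.
                by_hash[digest].add(path.resolve())
    # Keep only content that shows up at more than one location.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists the locations of one duplicated file, which gives a starting point for deciding which copy should stay. It will not catch files that differ slightly or the same file loaded twice through configuration.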
Why This Matters for Your Architecture
Fleet represents a shift in how we think about AI-assisted development. Rather than treating coding agents as solitary tools, it positions them as coordinated members of a team:
- Spec-driven development becomes practical — You write detailed specs, Fleet distributes sub-tasks across agents, and they coordinate through a shared queue.
- Context becomes a manageable resource — Centralized task tracking means each agent focuses only on relevant context.
- Experimentation scales — Want to test whether Claude or Agy is better at infrastructure code? Run them in parallel on the same task spec and compare.
The Implementation Reality
What makes Fleet elegant is that it's not complicated. The original implementation was literally a bash loop monitoring a beads queue. The Python version adds just enough structure to handle real workflows—task dependencies, priorities, concurrent execution limits, and agent lifecycle management.
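Handling dependencies and priorities together does not take much code either. One way to do it, shown here as an assumption rather than Fleet's actual logic, is a topological sort that always picks the highest-priority task among those whose dependencies are finished. The function execution_order and the convention that a lower number means higher priority are both hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def execution_order(tasks: Dict[str, Tuple[int, List[str]]]) -> List[str]:
    """Order tasks so dependencies run first, breaking ties by priority.

    `tasks` maps a task name to (priority, dependencies); a lower
    priority number runs earlier. Every dependency must be a key in `tasks`.
    """
    remaining = {name: set(deps) for name, (_, deps) in tasks.items()}
    dependents: Dict[str, List[str]] = {name: [] for name in tasks}
    for name, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(name)

    # Start with every task that has no unmet dependencies.
    ready = [(tasks[name][0], name) for name, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                heapq.heappush(ready, (tasks[child][0], child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order
```

A real supervisor would dispatch ready tasks as slots free up instead of computing the whole order up front, but the readiness rule is the same.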
This suggests something important: the infrastructure for parallel AI coding isn't bottlenecked by technical complexity. It's bottlenecked by token budgets and finding the right task decomposition for agents.
Looking Forward
As AI coding agents become more capable, the question shifts from "Can I run multiple agents?" to "Can I run them efficiently?" Fleet answers both.
The developers using it extensively are discovering that the real gains come from combining three things:
- Parallel execution (Fleet provides this)
- Intelligent task decomposition (you provide this)
- Token optimization (you manage this)
If you're at the point where one coding agent isn't enough, but you're not sure how to orchestrate multiple agents, Fleet is worth exploring. It's the kind of tool that feels obvious in hindsight but required someone to build first.