Managing AI Coding Agents at Scale: Inside Fleet's Parallel Execution Framework

May 24, 2026 ai coding agents claude code parallel execution python task queues ai development tools token optimization beads agent orchestration

The AI coding revolution isn't just about having powerful models; it's about orchestration. When one developer found that running 50+ Claude Code sessions in parallel could dramatically accelerate project work, it raised a practical question: how do you actually manage that many agents without chaos?

Enter Fleet, a Python supervisor that turns parallel AI development from an idea into a practical workflow.

The Problem With Running Multiple Agents

If you've ever tried to run several AI coding assistants simultaneously, you've probably hit the same wall: coordination. Which agent works on which task? How do they avoid stepping on each other's toes? How do you track context usage when you're burning through tokens across multiple sessions?

The naive approach—spinning up separate Claude sessions in different directories and hoping for the best—falls apart quickly. You need a centralized system that understands dependencies, task priorities, and the state of your codebase across all sessions.

Fleet's Core Architecture: Simplicity Through Centralization

Fleet solves this through three elegant design decisions:

1. One Queue to Rule Them All

Rather than scattering task management across projects, Fleet uses a single centralized beads database (a Git-backed issue tracker) stored in ~/.fleet. When you run fleet bd create, Fleet records both the task and the working directory you ran it from, so each agent is launched in the directory its task belongs to.

This single source of truth means multiple coding agents can claim tasks, work independently, and hand off results without two agents picking up the same task, which cuts down on duplicate work and conflicting edits. It's the difference between organized parallelism and chaotic distributed work.
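To make the two properties above concrete (the working directory stored with each task, and a claim that only one agent can win), here is a minimal in-memory sketch. It is not Fleet's code: Fleet keeps its tasks in the beads database, while the `Task` and `CentralQueue` names here are hypothetical and a plain `threading.Lock` stands in for whatever locking the real store uses.

```python
import os
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical record; Fleet keeps its real task data in the beads database.
    task_id: int
    title: str
    cwd: str                          # directory the task was created from
    claimed_by: Optional[str] = None  # agent that owns the task, if any


class CentralQueue:
    """A single shared queue; a lock makes each claim atomic."""

    def __init__(self) -> None:
        self._tasks: List[Task] = []
        self._lock = threading.Lock()
        self._next_id = 1

    def create(self, title: str, cwd: Optional[str] = None) -> Task:
        # Mirrors the idea behind `fleet bd create`: remember where the task came from.
        with self._lock:
            task = Task(self._next_id, title, cwd or os.getcwd())
            self._next_id += 1
            self._tasks.append(task)
            return task

    def claim(self, agent: str) -> Optional[Task]:
        # Check-and-set under the lock, so no two agents get the same task.
        with self._lock:
            for task in self._tasks:
                if task.claimed_by is None:
                    task.claimed_by = agent
                    return task
            return None


if __name__ == "__main__":
    queue = CentralQueue()
    queue.create("Refactor authentication module", cwd="/home/me/api")
    queue.create("Build API tests", cwd="/home/me/web")
    first, second = queue.claim("agent-1"), queue.claim("agent-2")
    # Each agent would now be launched in its own task's directory.
    print(first.cwd, second.cwd)
```

Because the check ("is this task unclaimed?") and the update ("mark it mine") happen under one lock, twenty agents racing for ten tasks still end up with exactly ten distinct claims.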

2. Pluggable Agent Support

Fleet doesn't lock you into one model or provider. It currently supports:

  • Claude (extensively tested)
  • Agy/Antigravity (briefly tested)
  • Codex (implemented but untested)

Adding a new coder takes minutes. This flexibility matters because different models excel at different tasks: you might want Claude's reasoning for architecture work but a different coder or model for boilerplate generation. Fleet lets you specify the coder and model per task.
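One common way to make "add a coder in minutes" true is a small adapter registry keyed by coder name. The sketch below is hypothetical and not Fleet's actual plugin interface: `register`, `build_command`, and the `echo` dry-run coder are illustrative names, and the `claude -p ... --model ...` argv assumes Claude Code's non-interactive print mode, so check it against the CLI version you run.

```python
from typing import Callable, Dict, List

# Hypothetical adapter registry (not Fleet's actual code): a coder is a
# function that turns (prompt, model) into an argv list for subprocess.
CoderFactory = Callable[[str, str], List[str]]
CODERS: Dict[str, CoderFactory] = {}


def register(name: str) -> Callable[[CoderFactory], CoderFactory]:
    def decorator(fn: CoderFactory) -> CoderFactory:
        CODERS[name] = fn
        return fn
    return decorator


@register("claude")
def claude_command(prompt: str, model: str) -> List[str]:
    # Assumes Claude Code's non-interactive print mode (-p) and --model flag.
    return ["claude", "-p", prompt, "--model", model]


@register("echo")
def echo_command(prompt: str, model: str) -> List[str]:
    # Dry-run coder: exercises the pipeline without spending any tokens.
    return ["echo", f"[{model}] {prompt}"]


def build_command(coder: str, model: str, prompt: str) -> List[str]:
    """Look up the per-task coder and build the command that would launch it."""
    if coder not in CODERS:
        raise ValueError(f"unknown coder {coder!r}; registered: {sorted(CODERS)}")
    return CODERS[coder](prompt, model)
```

A supervisor would pass the result to `subprocess.run(...)` with `cwd` set to the task's recorded directory. Supporting another coder then means writing one more decorated function; nothing else in the pipeline changes.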

3. Smart Concurrency Management

The default max_concurrent is 3, but you can scale to 10+ sessions simultaneously:

fleet config set max_concurrent=10
fleet bd create --title "Refactor authentication module" --description "..."
fleet bd create --coder agy --model opus --title "Build API tests" --description "..."

Fleet queues everything and enforces your concurrency limit automatically. The bottleneck isn't the framework; it's your subscription's token limits.
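"Queue everything, run at most N" is the classic semaphore pattern. Here is a minimal asyncio sketch of that idea, assuming agent sessions can be modeled as awaitable jobs (simulated here with `asyncio.sleep`). It is not Fleet's implementation, and `run_task`/`supervise` are illustrative names. `max_concurrent` plays the same role as the config key above.

```python
import asyncio
import random
from typing import Dict, List, Tuple


async def run_task(task_id: int, limit: asyncio.Semaphore, stats: Dict[str, int]) -> int:
    async with limit:  # waits here while max_concurrent tasks are already running
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for an agent session
        stats["running"] -= 1
        return task_id


async def supervise(task_ids: List[int], max_concurrent: int = 3) -> Tuple[List[int], int]:
    """Start every queued task; the semaphore keeps at most max_concurrent active."""
    limit = asyncio.Semaphore(max_concurrent)
    stats = {"running": 0, "peak": 0}
    done = await asyncio.gather(*(run_task(t, limit, stats) for t in task_ids))
    return list(done), stats["peak"]


if __name__ == "__main__":
    finished, peak = asyncio.run(supervise(list(range(12)), max_concurrent=3))
    print(f"{len(finished)} tasks finished; peak concurrency {peak}")
```

All twelve tasks are submitted at once, but the semaphore admits only three at a time, so raising the limit is a one-number change; the real ceiling is how many sessions your token budget can sustain.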

Operational Commands That Matter

Fleet provides a practical CLI that keeps you informed without overwhelming you:

  • fleet tasks — See what's in progress, which coder is working on it, and real-time token consumption
  • fleet task <id> log — Review what the agent actually did
  • fleet task <id> plan — Understand the agent's approach before execution
  • fleet task <id> knowledge — Check what context was available
  • fleet config show|set — Adjust settings on the fly

No dashboards, no unnecessary complexity. Just the info you need.

The Token Economics Reality Check

Here's the honest part: tokens are the constraint, not agent capacity.

The developers using Fleet extensively have discovered that simply throwing more agents at the problem hits subscription limits quickly. Their solution? Token rotation across multiple Claude accounts, plus aggressive context hygiene.
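Token rotation can be as simple as picking, for each new task, the account with the most remaining headroom. This is a hypothetical sketch: the `Account` budgets, the per-task token estimate, and `pick_account` are assumptions for illustration, since providers do not expose subscription limits in this form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    # Hypothetical: assumes you can estimate each account's budget per usage window.
    name: str
    budget: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.budget - self.used


def pick_account(accounts: List[Account], estimate: int) -> Optional[Account]:
    """Pick the account with the most headroom that can still fit `estimate` tokens."""
    candidates = [a for a in accounts if a.remaining >= estimate]
    if not candidates:
        return None  # all accounts exhausted: keep the task queued until a window resets
    return max(candidates, key=lambda a: a.remaining)


if __name__ == "__main__":
    accounts = [Account("acct-a", 100_000), Account("acct-b", 100_000)]
    for _ in range(5):
        acct = pick_account(accounts, 40_000)
        if acct is None:
            print("all accounts exhausted; task stays queued")
            break
        acct.used += 40_000
        print("task runs on", acct.name)
```

Choosing the account with the most headroom spreads load evenly, and returning `None` instead of overcommitting leaves the task in the queue rather than failing it mid-session.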

Context hygiene is the underrated part: they audited their CLAUDE.md files, plugin directories, and skill repositories and found that some were being loaded twice, doubling their token cost for no benefit. Cleaning this up had more impact than adding agents.
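A first pass at that audit can be automated by hashing the context files under each directory you load and flagging identical contents. This is a heuristic sketch, not the audit those developers ran. It catches byte-identical copies (including the same file reached through two scanned roots) but not near-duplicates. The `*.md` pattern, the directory names, and the ~4 characters-per-token estimate are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def find_duplicates(roots: List[Path], pattern: str = "*.md") -> Dict[str, List[Path]]:
    """Group files under `roots` by SHA-256 of their contents; keep groups of 2+."""
    by_hash: Dict[str, List[Path]] = {}
    for root in roots:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash.setdefault(digest, []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


def wasted_tokens(dupes: Dict[str, List[Path]], chars_per_token: int = 4) -> int:
    """Rough cost of the extra copies, using a ~4 characters-per-token rule of thumb."""
    total = 0
    for paths in dupes.values():
        extra_copies = len(paths) - 1
        total += paths[0].stat().st_size * extra_copies // chars_per_token
    return total


if __name__ == "__main__":
    # Demo on a throwaway tree; point `roots` at your real context directories instead.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for sub, text in [("project", "x" * 400), ("plugins", "x" * 400), ("skills", "y" * 80)]:
            (base / sub).mkdir()
            (base / sub / "CLAUDE.md").write_text(text)
        dupes = find_duplicates([base / "project", base / "plugins", base / "skills"])
        print(f"{len(dupes)} duplicate group(s), about {wasted_tokens(dupes)} tokens wasted per load")
```

Because that overhead is paid again at the start of every session, it multiplies with the number of agents, which is why trimming it can matter more than adding agents.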

Why This Matters for Your Architecture

Fleet represents a shift in how we think about AI-assisted development. Rather than treating coding agents as solitary tools, it positions them as coordinated members of a team:

  • Spec-driven development becomes practical — You write detailed specs, Fleet distributes sub-tasks across agents, and they coordinate through a shared queue.
  • Context becomes a manageable resource — Centralized task tracking means each agent focuses only on relevant context.
  • Experimentation scales — Want to test whether Claude or Agy is better at infrastructure code? Run them in parallel on the same task spec and compare.

The Implementation Reality

What makes Fleet elegant is that it's not complicated. The original implementation was literally a bash loop monitoring a beads queue. The Python version adds just enough structure to handle real workflows—task dependencies, priorities, concurrent execution limits, and agent lifecycle management.
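Dependencies plus priorities reduce to a familiar algorithm: a task becomes ready once all its dependencies finish, and among ready tasks the most urgent one runs next. The sketch below uses Kahn's topological sort with a priority heap. It is an illustration under assumptions, not Fleet's scheduler: `Task`, `execution_order`, and the "lower number = more urgent" convention are hypothetical, and it produces a serial order where a real supervisor would hand out several ready tasks at once, up to `max_concurrent`.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Task:
    task_id: str
    priority: int                                # assumed: lower number = more urgent
    deps: Set[str] = field(default_factory=set)  # ids that must finish first


def execution_order(tasks: List[Task]) -> List[str]:
    """Kahn's topological sort with a priority heap: always run the most urgent ready task."""
    by_id = {t.task_id: t for t in tasks}
    waiting = {t.task_id: set(t.deps) for t in tasks}
    dependents: Dict[str, List[str]] = {t.task_id: [] for t in tasks}
    for t in tasks:
        for dep in t.deps:
            if dep not in by_id:
                raise ValueError(f"{t.task_id} depends on unknown task {dep}")
            dependents[dep].append(t.task_id)

    ready = [(t.priority, t.task_id) for t in tasks if not t.deps]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, task_id = heapq.heappop(ready)
        order.append(task_id)
        for child in dependents[task_id]:
            waiting[child].discard(task_id)
            if not waiting[child]:  # its last dependency just finished
                heapq.heappush(ready, (by_id[child].priority, child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks can never become ready")
    return order


if __name__ == "__main__":
    plan = [
        Task("write-spec", 1),
        Task("build-api", 2, {"write-spec"}),
        Task("api-tests", 0, {"build-api"}),
        Task("update-docs", 3),
    ]
    print(execution_order(plan))
```

Note that `api-tests` has the highest priority but still waits for `build-api`: priority only decides among tasks whose dependencies are already satisfied.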

This suggests something important: the infrastructure for parallel AI coding isn't bottlenecked by technical complexity. It's bottlenecked by token budgets and finding the right task decomposition for agents.

Looking Forward

As AI coding agents become more capable, the question shifts from "Can I run multiple agents?" to "Can I run them efficiently?" Fleet answers the first and, by surfacing per-task token usage, helps with the second.

The developers using it extensively are discovering that the real gains come from combining three things:

  1. Parallel execution (Fleet provides this)
  2. Intelligent task decomposition (you provide this)
  3. Token optimization (you manage this)

If you're at the point where one coding agent isn't enough, but you're not sure how to orchestrate multiple agents, Fleet is worth exploring. It's the kind of tool that feels obvious in hindsight but required someone to build first.