Coding agents are having a moment. But most demos show a single agent tackling a single task — a human picks the work, prompts the model, reviews the output. That works for individual developers augmenting their own productivity. It doesn't scale to an engineering team with a real backlog.
This post describes how to build something different: a fleet of autonomous coding agents that pick work from a project management system, execute it, and report back — with no human in the dispatch loop. It's an architecture we arrived at through iteration, and the decisions behind it are more interesting than the code.
The Central Question: Who Decides What Gets Done?
In most agentic demos, a human decides. They pick the task, write the prompt, kick off the run. The agent is a tool with a very long handle.
The question we wanted to answer was: what does it look like when the agent decides? Not chaotically — we still have a backlog, priorities, dependencies, and humans who need to stay in control. But the agent should be able to look at the project board in the morning and know what to work on, the same way a developer would.
This led us to a pull-based model: each agent pod independently queries the project management system, selects the highest-priority available work, claims it, executes, and reports back. No central dispatcher, no message queue. The PM tool is the queue, and the agents are workers polling it.
The simplicity is deliberate. You don't need Kafka or a custom scheduler. If your PM tool has a reasonable API, it can coordinate a fleet of agents.
What a Work Cycle Looks Like
From a user's perspective, the experience looks like having a developer join your team who:
- Checks the board every few minutes for new work
- Picks the highest-priority unblocked task
- Does the work, pushes a branch, posts a summary comment
- Marks the ticket done (or in testing, if the workflow requires it)
- Escalates to a human when stuck
From the system's perspective, each cycle runs a priority-ordered decision sequence:
- Is there work in progress assigned to me? Finish it.
- Do I own an Epic with available children? Continue working through it.
- Is there a stale Epic I should release? Free it for other agents.
- Are there bugs waiting? Pick the highest-priority one.
- Are there tasks waiting? Pick one, claiming its parent Epic if needed.
- Is there standalone work? Fall back to tasks not in any Epic.
- Nothing available? Sleep and try again.
The ordering encodes real engineering judgment. Finish what you started. Fix bugs before new features. Prefer deep work on a single Epic over context-switching. These priorities are baked into the algorithm, not left to chance.
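The decision sequence above can be sketched as a pure selection function. This is an illustrative sketch, not the production orchestrator: the ticket shape and field names are hypothetical stand-ins for whatever your PM tool's API returns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Ticket:
    id: int
    type: str                      # "task" or "bug" (Epics handled separately)
    priority: int                  # higher = more urgent
    status: str = "new"            # "new" or "in_progress"
    assignee: Optional[str] = None
    epic_id: Optional[int] = None  # parent Epic, if any

def select_work(me: str, tickets: list[Ticket],
                my_epic: Optional[int]) -> Optional[Ticket]:
    """One polling cycle's priority-ordered decision sequence."""
    # 1. Finish work in progress already assigned to me.
    mine = [t for t in tickets if t.status == "in_progress" and t.assignee == me]
    if mine:
        return mine[0]
    available = [t for t in tickets if t.status == "new" and t.assignee is None]
    # 2. Continue working through the Epic I own, if it has children left.
    if my_epic is not None:
        children = [t for t in available if t.epic_id == my_epic]
        if children:
            return max(children, key=lambda t: t.priority)
    # 3. Bugs before new features.
    bugs = [t for t in available if t.type == "bug"]
    if bugs:
        return max(bugs, key=lambda t: t.priority)
    # 4. Tasks inside Epics (claiming the parent Epic happens at claim time).
    epic_tasks = [t for t in available if t.epic_id is not None]
    if epic_tasks:
        return max(epic_tasks, key=lambda t: t.priority)
    # 5. Standalone work as a fallback.
    standalone = [t for t in available if t.epic_id is None]
    if standalone:
        return max(standalone, key=lambda t: t.priority)
    # 6. Nothing available: caller sleeps and retries.
    return None
```

The function is deliberately side-effect free: selection and claiming are separate steps, because the claim can fail under contention.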
Coordination Without a Lock Manager
Multiple agent pods run the same algorithm simultaneously. They'll sometimes target the same ticket. Getting this right required some thought.
The naive approach — check if a ticket is unclaimed, then claim it — has an obvious race condition. Two pods check simultaneously, both see the ticket as unclaimed, both try to claim it.
The solution is to use the PM tool's own concurrency mechanism. Most project management APIs include a version field on resources — a counter that increments on each update. A write request that doesn't include the current version number gets rejected with a conflict error. This is optimistic locking, and it's exactly what we need.
When an agent claims a work package, it writes the "In Progress" status, sets itself as assignee, and records its pod identity — all in one request that includes the current version. If another pod claimed the same ticket a millisecond earlier, the version is now stale and the request fails. The agent logs this, skips the ticket, and moves on.
No distributed lock. No coordination service. The PM tool handles it, and the cost of a conflict is just one wasted API call.
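The claim-with-version dance can be shown with an in-memory stand-in for the PM tool's versioned update endpoint. In a real deployment the `update` call is an HTTP PATCH that returns 409 on a stale version; the field names here are illustrative.

```python
class Conflict(Exception):
    """Stale version on a write (HTTP 409 in a real PM API)."""

class PMStore:
    """In-memory stand-in for the PM tool's versioned resources."""
    def __init__(self):
        self.tickets = {}  # id -> dict including a "version" counter

    def get(self, ticket_id):
        return dict(self.tickets[ticket_id])  # snapshot, not a live reference

    def update(self, ticket_id, version, **fields):
        current = self.tickets[ticket_id]
        if version != current["version"]:
            raise Conflict(ticket_id)   # someone else wrote first: reject
        current.update(fields)
        current["version"] += 1         # every successful write bumps the counter

def try_claim(store, ticket_id, pod):
    """Status, assignee, and pod identity in one versioned write."""
    snapshot = store.get(ticket_id)
    if snapshot["status"] != "new":
        return False
    try:
        store.update(ticket_id, snapshot["version"],
                     status="in_progress", assignee=pod, pod=pod)
        return True
    except Conflict:
        return False  # another pod won the race: log, skip, move on
```

Losing the race costs one rejected write and nothing else, which is why no heavier coordination is needed.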
Routing Work to Specific Pods
Sometimes you want a specific agent to handle a specific ticket. A task that needs access to credentials only one pod has. A project with a large build cache that took hours to warm. A ticket that's been partially worked and should resume on the pod with the session context.
We modeled this as a field on the work package: set it to a pod name and only that pod will pick up the ticket. Leave it empty and any pod can claim it.
This lets operators route work through the same PM UI they use for everything else. No config change, no restart, no separate control plane. An operator can drag a ticket to a specific agent the same way they'd assign it to a developer.
The same field is set automatically when a pod claims a ticket, so a resumed or retried execution lands on the pod that already holds the session context. After a terminal failure, the field is cleared, returning the ticket to the general pool so any pod can pick it up once a human resolves the issue.
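The routing logic reduces to a few lines. The custom field name `agent_pod` is a hypothetical placeholder for whatever your PM tool calls it:

```python
def eligible_for(pod: str, ticket: dict) -> bool:
    """A ticket pinned to a pod is invisible to every other pod."""
    pinned = ticket.get("agent_pod")  # hypothetical custom-field name
    return pinned is None or pinned == pod

def on_claim(ticket: dict, pod: str) -> None:
    """Set on claim, so retries resume on the pod with the session context."""
    ticket["agent_pod"] = pod

def on_terminal_failure(ticket: dict) -> None:
    """Cleared on failure, returning the ticket to the general pool."""
    ticket["agent_pod"] = None
```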
Gating, Blocking, and Phase Control
A fleet of agents acting autonomously needs a way to be told "not yet." We have three mechanisms for this:
Explicit blocking relations. Work packages can depend on each other. An agent always checks whether a ticket is blocked by an incomplete predecessor before claiming it. This maps naturally to how humans sequence work in a PM tool — no separate dependency system needed.
Phase gating. Large projects are organized into phases. Work packages inside a phase are invisible to the agents until the phase itself is marked as active. A project manager can sequence entire tranches of work by moving a single phase marker — agents won't touch phase 2 work until phase 1 is declared in progress.
Spike escalation. When execution fails — the model couldn't implement the change, tests failed, a build broke — the agent creates a Spike as a child of the failing work package. A Spike is a short investigation item that signals "this needs human attention." Agents skip any work package with an unresolved Spike. A human looks at the spike, resolves the blocker, closes it, and the ticket re-enters the pool.
The Spike mechanism is our answer to the question: "what happens when the agent gets stuck?" The answer isn't a crash or a silent failure. It's a first-class escalation artifact in the same project board the team already uses.
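All three gates can be checked in one pass before a claim is attempted. The ticket and phase shapes below are illustrative, not a specific PM tool's schema:

```python
def is_claimable(ticket: dict, all_tickets: list[dict],
                 active_phases: set[str]) -> bool:
    """The gate checks an agent runs before claiming a ticket."""
    # Phase gating: work outside an active phase is invisible.
    if ticket.get("phase") is not None and ticket["phase"] not in active_phases:
        return False
    # Explicit blocking relations: every predecessor must be closed.
    by_id = {t["id"]: t for t in all_tickets}
    for dep in ticket.get("blocked_by", []):
        if by_id[dep]["status"] != "closed":
            return False
    # Spike escalation: an unresolved child Spike means a human is needed.
    for t in all_tickets:
        if (t.get("parent") == ticket["id"]
                and t["type"] == "spike" and t["status"] != "closed"):
            return False
    return True
```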
Epic Ownership and Deep Work
One design tension in a multi-agent system is granularity. If you have 10 agents and 100 tickets spread across 10 epics, you could have each agent pick up the next available ticket regardless of which epic it belongs to. The tickets would get done, but each codebase would see contributions from multiple agents interleaving, creating merge conflicts and context-switching overhead.
Our solution is Epic ownership: an agent that picks up a ticket from an Epic claims the Epic itself. Other agents see the Epic as owned and look elsewhere. The claiming agent works through the Epic's children one by one, building up context and a warm repository cache, until the Epic is exhausted — then releases it.
This mirrors how developers work. You don't ping-pong a single feature between five people. One developer owns a feature track until it's done. We baked this into the selection algorithm rather than leaving it as a convention.
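A simplified sketch of the ownership rule, with plain dict writes standing in for the versioned updates the real system uses:

```python
def claim_epic_then_child(epics: dict, tickets: dict, me: str, child_id: int) -> bool:
    """Claiming a ticket inside an Epic claims the Epic itself first."""
    child = tickets[child_id]
    epic = epics.get(child.get("epic_id"))
    if epic is not None:
        if epic["owner"] not in (None, me):
            return False       # owned by another agent: look elsewhere
        epic["owner"] = me     # in the real system this is a versioned write
    child["status"], child["assignee"] = "in_progress", me
    return True

def release_if_exhausted(epics: dict, tickets: dict, epic_id: int) -> bool:
    """An Epic is released once it has no remaining open children."""
    open_children = [t for t in tickets.values()
                     if t.get("epic_id") == epic_id and t["status"] != "closed"]
    if not open_children:
        epics[epic_id]["owner"] = None
        return True
    return False
```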
The Model as a Subprocess
One architectural decision that might seem odd: the agent doesn't call an LLM API directly. It invokes Claude Code as a subprocess, passing it a prompt and collecting structured JSON output.
This was a deliberate choice. Claude Code already handles multi-turn reasoning, tool execution, file editing, and bash commands. Building that stack ourselves would be a significant project, and we'd be rebuilding something that already exists and is well-tested.
The boundary is clean: Claude Code handles "how do I implement this?" The orchestrator handles "what should be implemented, where does the code go, and what happens when it's done?" Each layer does what it's good at.
The subprocess model also gives us control over the execution environment. We restrict the tools available to the model — file operations, search, shell commands — without network access or the ability to spawn child agents. The blast radius of a misbehaving model is bounded. Each execution runs inside a Nix development shell, which is how the agent knows it has the right build tools regardless of what language or stack the project uses — more on that below.
Nix as the Agent's Contract with a Repository
Here's a problem that's easy to overlook when building coding agents: the agent doesn't know what kind of project it's walking into.
One project is a Go service. Another is a Python ML pipeline. A third is a TypeScript monorepo with a custom build toolchain. Each has different compilers, linters, test runners, environment variables, and setup steps. A human developer reads the README, sets up their local environment once, and forgets about it. An agent has to figure this out from scratch, every time, on every pod.
The conventional answer is containers: build a project-specific Docker image with all the dependencies baked in. But that requires someone to maintain the image, rebuild it when dependencies change, and push it to a registry the agent pod can pull. It couples the agent infrastructure to the project's build toolchain in an operational way.
Our answer is Nix.
Every project that the agent works on has a flake.nix at its root — a declarative description of the development environment. It lists the exact packages, versions, and environment variables needed to build, test, and run the project. When the agent checks out a repository, it runs nix develop before executing, and Nix handles everything: downloading the right compiler, setting up PATH, making tools available. The agent doesn't need to know the project is using Go 1.22 or Python 3.11 or a specific version of Node. The flake knows.
If a repository doesn't have a flake.nix yet, the orchestrator creates a minimal starter one and commits it before the first execution. The project starts with an empty shell — no packages declared — and the model can add to it as it discovers what it needs. Adding a dependency is just editing flake.nix and running the shell again.
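A starter flake of the kind described might look like the following. The nixpkgs pin and the system string are illustrative placeholders, not what any particular orchestrator emits:

```nix
{
  description = "Agent-managed dev environment (starter)";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # Starts empty; the model adds packages as it discovers what it needs.
      devShells.${system}.default = pkgs.mkShell {
        packages = [ ];
      };
    };
}
```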
The entire Claude Code subprocess runs inside the Nix shell:
nix develop --command claude -p "<prompt>" --output-format json
This means every tool invocation the model makes — go test, cargo build, pytest, whatever — runs in the pinned environment. There's no "works on the agent pod but not in CI" because the agent pod and CI are both running from the same flake.
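A minimal wrapper around that invocation might look like this, assuming the orchestrator is written in Python; the timeout and error handling are illustrative choices, not the production code:

```python
import json
import subprocess

def build_command(prompt: str) -> list[str]:
    """Assemble the argv: Claude Code wrapped in the project's Nix dev shell."""
    return ["nix", "develop", "--command",
            "claude", "-p", prompt, "--output-format", "json"]

def run_execution(repo_dir: str, prompt: str) -> dict:
    """Run one execution inside the repo's pinned environment, parse JSON output."""
    result = subprocess.run(build_command(prompt),
                            cwd=repo_dir,        # flake.nix at repo root defines the shell
                            capture_output=True, text=True,
                            timeout=3600)        # bound runaway executions
    if result.returncode != 0:
        raise RuntimeError(f"execution failed: {result.stderr[-500:]}")
    return json.loads(result.stdout)
```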
Why this matters for agentic work specifically. When a human developer changes dependencies, they update a lock file, run a package manager, and the change stays on their local machine until they commit it. An agent doesn't have that kind of persistent local state. Each task could run on a fresh pod, or a pod that last worked on a completely different project. The environment has to be reproducible from the repository contents alone.
Nix gives you that. The flake is checked into the repository. Any pod, any time, gets an identical environment by checking out the code and running nix develop. There's no setup step, no bootstrapping, no "did you remember to install X."
There's also a subtler benefit: the model can reason about its own environment. When the model needs a new tool — say, a JSON schema validator it's never used before — it can add it to flake.nix, enter the shell, and use it, all within the same execution. It doesn't need to apt-get install and hope the package exists, doesn't need to pip install into a shared environment that might conflict with other projects. The environment is scoped to the project and versioned alongside the code.
We cache the Nix store between executions on the same pod. Warming a fresh environment for a cold project can take a few minutes; subsequent executions on the same pod hit the cache and start in seconds. This is another reason Epic ownership matters — an agent that owns an Epic and works through it linearly warms the cache once and amortizes that cost across many tasks.
The bet here is that reproducible, declarative environments are going to become table stakes for agentic codebases. A repository that an agent can work on safely needs to be able to describe, from its own contents, exactly what environment it needs to build and test correctly. Nix is currently the best tool for that. Devcontainers and Dev Environments are moving in a similar direction. But the underlying principle — the environment is code, checked in, versioned, reproducible — is what enables agents to work across a heterogeneous project portfolio without infrastructure per project.
Prompt Construction: Context Hierarchy
The prompt passed to the model isn't just the ticket description. It includes the full context hierarchy: the Epic the work belongs to (for architectural intent), the parent User Story if there is one (for behavioral context), and then the specific work package.
We structure this as tagged blocks rather than prose. This keeps injected data clearly separate from instructions, which matters when ticket descriptions might contain arbitrary text. It also gives the model a clear mental model of the scope: "here's the big picture, here's the feature, here's the specific task."
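A sketch of the prompt assembly, with hypothetical tag names standing in for whatever the real templates use:

```python
from typing import Optional

def build_prompt(epic: str, story: Optional[str], task: str) -> str:
    """Assemble the context hierarchy as tagged blocks, big picture first."""
    parts = [f"<epic>\n{epic}\n</epic>"]
    if story:  # User Story is optional in the hierarchy
        parts.append(f"<user_story>\n{story}\n</user_story>")
    parts.append(f"<work_package>\n{task}\n</work_package>")
    # Instructions live outside the tagged data blocks, so injected ticket
    # text stays clearly separated from what the model is asked to do.
    parts.append("Implement the work package. Do not run git commands; "
                 "the orchestrator owns the repository.")
    return "\n\n".join(parts)
```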
One explicit instruction in every prompt: do not run git commands. The orchestrator owns the repository. The model writes code; the orchestrator commits, branches, and pushes. This keeps git history clean and attributable, and means git credentials never appear in the model's context.
What the Audit Trail Looks Like
Every significant agent action leaves a comment on the work package in the PM tool:
- "Execution complete — session
abc123, 2400 input tokens / 800 output tokens" - "Pushed to
release/v2.1— commita3f9c12" - "Execution failed: test suite failed with exit code 1"
- "Created spike #147 for human review"
The PM tool becomes the audit log. You can open any ticket and see exactly what the agent did, when, and what it produced. Token counts give a rough proxy for complexity. Session IDs let you trace a resumed execution back to its origin.
This was a deliberate choice over structured logs or a separate audit database. Humans are already looking at the PM tool. Putting the agent's activity there means it's visible without any additional tooling.
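The comment bodies themselves can come from a small set of templates, something like the following (the event names and placeholders are illustrative):

```python
def audit_comment(event: str, **details) -> str:
    """Format a work-package comment; the PM tool is the audit log."""
    templates = {
        "complete": ("Execution complete — session {session}, "
                     "{input_tokens} input tokens / {output_tokens} output tokens"),
        "pushed":   "Pushed to {branch} — commit {sha}",
        "failed":   "Execution failed: {reason}",
        "spike":    "Created spike #{spike_id} for human review",
    }
    return templates[event].format(**details)
```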
What This Architecture Gets You
Scaling is just adding pods. Each pod is stateless (modulo its in-progress Epic and repository cache). Adding more agents means more work gets done in parallel, constrained naturally by Epic ownership and the available pool of ready tickets.
Human control is fine-grained but unintrusive. Operators direct agent behavior through the PM tool — assigning tickets to specific pods, gating phases, adding blocking relations, resolving spikes. None of this requires touching agent configuration. The PM tool is the control surface.
Workflow compatibility. Because agents participate in the same status workflows as human developers — tickets move through New → In Progress → In Testing → Closed — the PM tool's reporting, burndown charts, and workflows all work without modification. The agents look like team members to the tool.
Failure handling is first-class. Spikes surface stuck work visibly. Optimistic locking handles concurrency gracefully. The orchestrator separates git operations from model execution so a failure in one doesn't corrupt the other.
What It Doesn't Solve (Yet)
A few things are deliberately left to humans:
Code review. Agents push to a feature branch. A human reviews and merges. This is intentional — the agent is a first-pass implementer, not a final authority.
Work decomposition. Agents execute tasks from the backlog. They don't create tasks, break down Epics, or reason about what needs to exist. That's a planning problem distinct from an execution problem.
Repository conflicts. If two agents push changes to the same file on the same branch, the second will hit a conflict. The current approach serializes per-Epic, but cross-Epic conflicts on shared files are possible. Better conflict resolution is on the roadmap.
Closing Thoughts
The most important architectural decision in this system isn't a technical one. It's treating the project management tool as the authoritative coordination layer — not as a record-keeping afterthought.
When work selection, priority, routing, gating, and escalation all flow through the PM tool, you get something valuable: the behavior of the agent fleet is visible and controllable to anyone who can use a project board. You don't need to understand the orchestrator's internals to know what the agents are doing or to direct their behavior.
Agents that can be directed through familiar tools, that surface their failures as first-class work items, and that participate in existing workflows rather than creating parallel ones — those are agents a team can actually work alongside.


