Wait-Die Scheduling
How Klock guarantees deadlock-free conflict resolution.
Klock solves the Multi-Agent Race Condition with Wait-Die Scheduling, a concurrency control algorithm borrowed from distributed databases.
The Problem with Deadlocks
In a swarm, agents often need multiple resources.
- Agent 1 locks A and requests B.
- Agent 2 locks B and requests A.
This is a deadlock. In a naive system, both agents will loop forever, burning expensive LLM tokens while waiting for the other to yield.
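The cycle described above can be made concrete with a small wait-for graph. This is only an illustrative sketch: the `has_cycle` helper and the agent names are hypothetical and are not part of Klock's API.

```python
def has_cycle(wait_for: dict[str, str]) -> bool:
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each blocked agent to the single agent it is waiting on.
    """
    for start in wait_for:
        seen = set()
        current = start
        # Follow the chain of "is waiting on" links from this agent.
        while current in wait_for:
            if current in seen:
                return True  # we came back to an agent already visited
            seen.add(current)
            current = wait_for[current]
    return False


# Agent 1 holds A and waits on Agent 2 (holder of B), and vice versa.
naive_waits = {"agent-1": "agent-2", "agent-2": "agent-1"}
print(has_cycle(naive_waits))  # True
```

With no rule that forbids one of these two edges, neither agent can ever make progress.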
The Wait-Die Protocol
Klock assigns a unique timestamp (priority) to every agent session when it starts. An earlier (smaller) timestamp means an older agent, and older agents have a higher priority than younger agents.
When a conflict occurs, Klock uses the following logic:
Scenario 1: Older agent wants a resource held by a Younger agent
The older agent is allowed to Wait. It enters a queue.
- Why? Because older agents have invested more time/tokens into their task. Aborting them would throw that work away, so they queue up instead.
Scenario 2: Younger agent wants a resource held by an Older agent
The younger agent is ordered to Die (Abort/Retry). Klock rejects the request.
- Why? If we let the younger agent wait, it could form a cycle (a deadlock) with the older agent. By forcing it to die, we mathematically break the cycle. The younger agent can catch the "DIE" signal, refresh its context, and try again later.
┌─────────────────┐       ┌─────────────────┐
│ Agent A (Older) │       │ Agent B (Young) │
│ Timestamp: 100  │       │ Timestamp: 200  │
└────────┬────────┘       └────────┬────────┘
         │                         │
         │  Asks for Resource X    │
         ├────────────────────────►│ Has Resource X
         │                         │
         ▼                         ▼
    Wait (Queue)              DIE (Abort)
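The two scenarios can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not Klock's implementation: `LockTable`, `Decision`, and `request` are hypothetical names, locks are assumed to be exclusive, and a smaller timestamp is assumed to mean an older agent.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    GRANTED = "granted"
    WAIT = "wait"
    DIE = "die"


@dataclass
class LockTable:
    """Hypothetical in-memory lock table (illustration only)."""
    holders: dict[str, int] = field(default_factory=dict)       # resource -> holder timestamp
    queues: dict[str, list[int]] = field(default_factory=dict)  # resource -> waiting timestamps

    def request(self, resource: str, timestamp: int) -> Decision:
        holder = self.holders.get(resource)
        if holder is None or holder == timestamp:
            # Resource is free, or the requester already holds it.
            self.holders[resource] = timestamp
            return Decision.GRANTED
        if timestamp < holder:
            # Scenario 1: requester is older than the holder, so it may wait.
            self.queues.setdefault(resource, []).append(timestamp)
            return Decision.WAIT
        # Scenario 2: requester is younger, so the request is rejected.
        return Decision.DIE


table = LockTable()
print(table.request("X", 200).name)  # GRANTED (Agent B takes Resource X)
print(table.request("X", 100).name)  # WAIT    (older Agent A queues)
print(table.request("X", 300).name)  # DIE     (a still-younger agent is rejected)
```

In the classic formulation of Wait-Die, an aborted transaction keeps its original timestamp when it retries, so it eventually becomes the oldest and cannot be starved. Whether Klock does the same is not stated here, so treat that as an assumption when building retry logic.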
The Mathematical Guarantee
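Because waiting is strictly unidirectional (only older agents wait for younger ones), a circular wait is mathematically impossible. A cycle would require at least one agent to wait on an agent older than itself, and Wait-Die never permits that.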
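The argument can be checked by enumerating every wait that the rule permits among a handful of agents. This is a sketch with hypothetical helper names (`wait_die`, `wait_edges`) and assumes timestamps are unique, as described earlier.

```python
import itertools


def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Return "WAIT" if the requester is older than the holder, else "DIE"."""
    return "WAIT" if requester_ts < holder_ts else "DIE"


def wait_edges(timestamps: list[int]) -> list[tuple[int, int]]:
    """All (waiter, holder) pairs that the rule would ever allow to wait."""
    return [
        (requester, holder)
        for requester, holder in itertools.permutations(timestamps, 2)
        if wait_die(requester, holder) == "WAIT"
    ]


edges = wait_edges([100, 200, 300, 400])
# Every permitted edge points from a smaller timestamp to a larger one,
# so following edges strictly increases the timestamp and can never
# return to the agent where the chain started.
print(all(waiter < holder for waiter, holder in edges))  # True
```

Since each hop in a chain of waits moves to a strictly larger timestamp, no chain can close back on itself.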
By using Wait-Die, Klock guarantees safety (no data corruption) and liveness (no deadlocks), even when 100 agents are operating concurrently on the same codebase.