Wait-Die Scheduling

How Klock guarantees deadlock-free conflict resolution.

Klock solves the Multi-Agent Race Condition with Wait-Die Scheduling, a concurrency control algorithm borrowed from distributed databases.

The Problem with Deadlocks

In a swarm, agents often need multiple resources.

  • Agent 1 locks A and requests B.
  • Agent 2 locks B and requests A.

This is a deadlock: each agent holds the resource the other needs, so neither can make progress. In a naive system, both agents loop forever, burning expensive LLM tokens while each waits for the other to yield.
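To make the cycle concrete, the two-agent scenario can be modeled as a small wait-for graph. This is only an illustrative sketch: `holders`, `requests`, and `has_cycle` are hypothetical names and are not part of Klock.

```python
# Hypothetical illustration; none of these names come from Klock.
holders = {"A": "agent-1", "B": "agent-2"}    # resource -> agent holding it
requests = {"agent-1": "B", "agent-2": "A"}   # agent -> resource it is blocked on

# Wait-for graph: an edge X -> Y means "X is blocked until Y releases a lock".
wait_for = {agent: holders[res] for agent, res in requests.items()}

def has_cycle(graph, start):
    """Follow wait-for edges from `start`; True if an agent is visited twice."""
    seen = set()
    node = start
    while node in graph:
        if node in seen:
            return True
        seen.add(node)
        node = graph[node]
    return False

print(wait_for)
print(has_cycle(wait_for, "agent-1"))
```

Each agent here is blocked on exactly one resource, so following a single outgoing edge per agent is enough to find the loop.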

The Wait-Die Protocol

Klock assigns a unique timestamp to every agent session when it starts, and that timestamp serves as the session's priority. An agent with an earlier (smaller) timestamp is older and has a higher priority than a younger agent.
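A minimal sketch of how such priorities could be issued, assuming a process-local monotonic counter. `Session`, `start_session`, and `is_older` are hypothetical names, and the counter is an assumption: this page does not specify how Klock generates its timestamps.

```python
import itertools
from dataclasses import dataclass

# Assumption: a simple in-process counter stands in for Klock's timestamp source.
_clock = itertools.count(start=1)

@dataclass(frozen=True)
class Session:
    agent_id: str
    timestamp: int  # smaller value = started earlier = higher priority

def start_session(agent_id: str) -> Session:
    """Give each new session the next unique timestamp."""
    return Session(agent_id, next(_clock))

def is_older(a: Session, b: Session) -> bool:
    """True if `a` started before `b` and therefore outranks it."""
    return a.timestamp < b.timestamp

planner = start_session("planner")
coder = start_session("coder")
print(is_older(planner, coder))
```

Because the counter never repeats a value, any two sessions can always be ordered, which is what the conflict rules below rely on.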

When a conflict occurs, Klock uses the following logic:

Scenario 1: Older agent wants a resource held by a Younger agent

The older agent is allowed to Wait. It enters a queue.

  • Why? Because older agents have invested more time/tokens into their task. It makes sense for them to queue up politely.

Scenario 2: Younger agent wants a resource held by an Older agent

The younger agent is ordered to Die (Abort/Retry). Klock rejects the request.

  • Why? If we let the younger agent wait, it could form a cycle (a deadlock) with the older agent. By forcing it to die, we mathematically break the cycle. The younger agent can catch the "DIE" signal, refresh its context, and try again later.

┌─────────────────┐       ┌─────────────────┐
│ Agent A (Older) │       │ Agent B (Young) │
│ Timestamp: 100  │       │ Timestamp: 200  │
└────────┬────────┘       └────────┬────────┘
         │                         │
         │  Asks for Resource X    │
         ├────────────────────────►│ Has Resource X
         │                         │
         │  Asks for Resource Y    │
         │◄────────────────────────┤ (A has Resource Y)
         │                         │
         ▼                         ▼
   Wait (Queue)                DIE (Abort)
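
The two scenarios reduce to a single timestamp comparison. The sketch below is one way to express it; `Decision` and `resolve` are hypothetical names rather than Klock's API, and treating a repeat request from the same session as already granted is an assumption.

```python
from enum import Enum
from typing import Optional

class Decision(Enum):
    GRANT = "grant"  # resource is free
    WAIT = "wait"    # requester is older than the holder: queue up
    DIE = "die"      # requester is younger than the holder: abort and retry

def resolve(requester_ts: int, holder_ts: Optional[int]) -> Decision:
    """Wait-Die rule. A smaller timestamp means an older, higher-priority agent."""
    # Assumption: timestamps are unique, so an equal timestamp is the same session.
    if holder_ts is None or holder_ts == requester_ts:
        return Decision.GRANT
    if requester_ts < holder_ts:
        return Decision.WAIT
    return Decision.DIE

print(resolve(100, 200))  # Agent A (older) asks for a resource held by Agent B
print(resolve(200, 100))  # Agent B (younger) asks for a resource held by Agent A
```

With the timestamps from the diagram, the first call returns `Decision.WAIT` and the second returns `Decision.DIE`.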

The Mathematical Guarantee

Because waiting is strictly unidirectional (only older agents wait for younger ones), a circular wait is mathematically impossible.
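
The argument can be written out as a short proof by contradiction. The notation is introduced here for illustration only: an arrow means "is waiting for a resource held by", and ts is an agent's start timestamp.

```latex
% a \to b : agent a is waiting for a resource held by agent b
% ts(a)   : unique start timestamp of agent a (smaller = older)
\begin{align*}
&\text{Wait-Die rule:}  && a \to b \implies ts(a) < ts(b) \\
&\text{Assume a cycle:} && a_1 \to a_2 \to \cdots \to a_k \to a_1 \\
&\text{Then:}           && ts(a_1) < ts(a_2) < \cdots < ts(a_k) < ts(a_1) \\
&\text{So:}             && ts(a_1) < ts(a_1), \text{ a contradiction.}
\end{align*}
```

Since no timestamp can be smaller than itself, no such cycle can exist.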

By using Wait-Die, Klock guarantees safety (no data corruption) and liveness (no deadlocks), even when 100 agents are operating concurrently on the same codebase.
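
As a worked example, the deadlock from the start of this page can be replayed against a toy lock table. This is a sketch under stated assumptions, not Klock's implementation: `LockTable` and its methods are hypothetical, a dying agent is assumed to release the locks it holds, and a retrying agent is assumed to keep its original timestamp (as in the classic Wait-Die scheme, where this prevents starvation).

```python
from typing import Dict, List, Optional

class LockTable:
    """Toy single-process lock table applying the Wait-Die rule (not Klock's API)."""

    def __init__(self) -> None:
        self.holder: Dict[str, int] = {}  # resource -> timestamp of the holding agent

    def acquire(self, ts: int, resource: str) -> str:
        holder_ts: Optional[int] = self.holder.get(resource)
        if holder_ts is None or holder_ts == ts:
            self.holder[resource] = ts
            return "GRANT"
        # Smaller timestamp = older agent. Older requesters wait, younger ones die.
        return "WAIT" if ts < holder_ts else "DIE"

    def release_all(self, ts: int) -> None:
        """Drop every lock held by the agent with timestamp `ts`."""
        for resource in [r for r, h in self.holder.items() if h == ts]:
            del self.holder[resource]

table = LockTable()
AGENT_1, AGENT_2 = 100, 200  # Agent 1 is older
log: List[str] = []

log.append(table.acquire(AGENT_1, "A"))  # Agent 1 locks A
log.append(table.acquire(AGENT_2, "B"))  # Agent 2 locks B
log.append(table.acquire(AGENT_1, "B"))  # older asks younger: wait
log.append(table.acquire(AGENT_2, "A"))  # younger asks older: die
table.release_all(AGENT_2)               # assumption: a dying agent releases its locks
log.append(table.acquire(AGENT_1, "B"))  # Agent 1's queued request now succeeds
table.release_all(AGENT_1)               # Agent 1 finishes its task
log.append(table.acquire(AGENT_2, "B"))  # Agent 2 retries with its original timestamp
log.append(table.acquire(AGENT_2, "A"))

print(log)
```

This prints `['GRANT', 'GRANT', 'WAIT', 'DIE', 'GRANT', 'GRANT', 'GRANT']`: the younger agent backs off once, the older agent finishes, and the younger agent then completes on retry instead of both looping forever.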