KLIS-3: Lease & Ownership Protocol
How temporary intents prevent Ghost Leases and guarantee liveness.
1. Purpose
The Lease Protocol defines the temporal aspects of intent. An intent is not a permanent property; it is a temporary Lease granted by the Control Plane. This specification ensures system liveness by preventing "Ghost Leases" (locks held by crashed agents).
2. Core Concepts
2.1. The Lease
A Lease is a reified object representing an active, granted intent.
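type Timestamp = number;    // assumed: epoch milliseconds on the Control Plane clock
type Milliseconds = number;

type Lease = {
  id: string;               // e.g. "lease_uuid"
  resource: string;         // e.g. "FILE:/path"
  predicate: string;        // e.g. "MUTATES"
  agent_id: string;         // e.g. "agent_1"
  acquired_at: Timestamp;
  expires_at: Timestamp;
  ttl: Milliseconds;
};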
2.2. The Ghost Lease Problem
If an agent acquires a lock and then crashes (OOM, network failure, infinite loop), the lock persists, blocking all other agents indefinitely. KLIS solves this via mandatory TTLs.
3. Lifecycle Protocol
3.1. Acquisition
- Agents request leases via the Intent Manifest (KLIS-1).
- The Control Plane grants leases only if NO conflicts exist (KLIS-2).
- Default TTL: Implementations SHOULD default to 60 seconds.
- Max TTL: Implementations MUST cap the maximum TTL (e.g., 5 minutes) to prevent DoS.
- Resurrection Window: If an agent process restarts, the Kernel SHOULD grant a 15-second "Grace Period" where the lease is held even if a heartbeat is missed, provided the agent is in a DIE_RESTART state.
- Immutability: The agent's timestamp (priority) MUST NOT change during the acquisition attempt. If a higher priority is needed (e.g. to preempt a deadlock), the agent MUST abort the current request and restart with a new, fresher timestamp.
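The TTL rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the constant names, `resolveTtl`, `grantLease`, and the `GrantedLease` shape are hypothetical, timestamps are assumed to be epoch milliseconds, and the 5-minute cap is only the example value from the text. Conflict checking (KLIS-2) is assumed to have already passed.

```typescript
// Hypothetical constants: the spec recommends a 60 s default and requires
// some cap (5 minutes is only its example value).
const DEFAULT_TTL_MS = 60 * 1000;
const MAX_TTL_MS = 5 * 60 * 1000;

interface GrantedLease {
  id: string;
  resource: string;
  predicate: string;
  agent_id: string;
  acquired_at: number; // Control Plane clock, epoch ms (assumed unit)
  expires_at: number;
  ttl: number;
}

// Resolve the TTL that will actually be granted: default when omitted,
// clamped to the cap when the request is too large.
function resolveTtl(requestedMs?: number): number {
  if (requestedMs === undefined) return DEFAULT_TTL_MS;
  if (!(requestedMs > 0)) throw new RangeError("TTL must be a positive number");
  return Math.min(requestedMs, MAX_TTL_MS);
}

// Build the lease record once conflict checking (KLIS-2) has passed.
function grantLease(
  id: string,
  resource: string,
  predicate: string,
  agentId: string,
  now: number,
  requestedTtlMs?: number
): GrantedLease {
  const ttl = resolveTtl(requestedTtlMs);
  return {
    id: id,
    resource: resource,
    predicate: predicate,
    agent_id: agentId,
    acquired_at: now,
    expires_at: now + ttl,
    ttl: ttl,
  };
}
```

Under these assumptions, a request for a 10-minute TTL is silently capped to 5 minutes; an implementation could equally reject such a request, which the text leaves open.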
3.2. Heartbeat (Renewal)
- Agents MUST send a Heartbeat signal for all active leases before expires_at.
- Frequency: Recommended at TTL / 3.
- Effect: Control Plane extends expires_at by TTL from the current time.
- Validation: Control Plane MUST reject heartbeats for leases that are already expired or released.
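A minimal sketch of the renewal rules, assuming timestamps are epoch milliseconds on the Control Plane clock. `HeartbeatLease`, `heartbeatIntervalMs`, and `applyHeartbeat` are hypothetical names, and the `released` flag is an assumed way of tracking explicit release.

```typescript
interface HeartbeatLease {
  id: string;
  ttl: number;        // milliseconds
  expires_at: number; // Control Plane clock, epoch ms (assumed unit)
  released: boolean;  // assumed flag set by an explicit RELEASE
}

// Recommended agent-side heartbeat period: TTL / 3.
function heartbeatIntervalMs(ttlMs: number): number {
  return Math.floor(ttlMs / 3);
}

// Control Plane side: returns true if the heartbeat was accepted.
function applyHeartbeat(lease: HeartbeatLease, now: number): boolean {
  // Validation: expired or released leases cannot be renewed.
  if (lease.released || now > lease.expires_at) return false;
  // Effect: the new expiry is one full TTL from the current time.
  lease.expires_at = now + lease.ttl;
  return true;
}
```

With a 60-second TTL this gives a 20-second heartbeat period, so an agent can miss two consecutive heartbeats before the lease lapses.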
3.3. Release
- Explicit: Agent sends RELEASE(lease_id). This is the "Happy Path".
- Implicit (Auto-Expiry): If now > expires_at, the Control Plane automatically revokes the lease.
3.4. Reconciliation on Boot
Upon resurrection, an agent MUST perform a VERIFY_ALL call covering every lease it believes it holds. The Kernel MUST return the current global state of those specific leases. This "Reality Check" overrides the agent's local StateDigest.
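One way an agent might apply the VERIFY_ALL result is sketched below. The wire format of VERIFY_ALL is not defined in this section, so the response shape (a map from lease id to state), the `VerifiedState` type, and `reconcileLeases` are all assumptions made for illustration.

```typescript
type VerifiedState = "ACTIVE" | "RELEASED" | "EXPIRED";

// kernelView is the assumed VERIFY_ALL response; ids the Kernel does not
// know about are simply absent from it.
function reconcileLeases(
  localLeaseIds: string[],
  kernelView: { [leaseId: string]: VerifiedState | undefined }
): { kept: string[]; dropped: string[] } {
  const kept: string[] = [];
  const dropped: string[] = [];
  for (const id of localLeaseIds) {
    // The Kernel's answer always wins over the local StateDigest.
    if (kernelView[id] === "ACTIVE") {
      kept.push(id);
    } else {
      dropped.push(id);
    }
  }
  return { kept: kept, dropped: dropped };
}
```

Any lease in `dropped` has to be treated as lost, even if the local StateDigest still lists it as held.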
4. State Transitions
REQUESTED -->|Grant| ACTIVE
ACTIVE -->|Heartbeat| ACTIVE
ACTIVE -->|Release| RELEASED
ACTIVE -->|Timeout| EXPIRED
RELEASED --> TERMINAL
EXPIRED --> TERMINAL
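The transitions above can be encoded as a lookup table so that illegal moves (for example, reviving an EXPIRED lease) are rejected mechanically. This is a sketch; `LeaseState`, `ALLOWED_TRANSITIONS`, and `canTransition` are hypothetical names.

```typescript
type LeaseState = "REQUESTED" | "ACTIVE" | "RELEASED" | "EXPIRED" | "TERMINAL";

// Each state maps to the states it may move to, mirroring the diagram.
const ALLOWED_TRANSITIONS: Record<LeaseState, LeaseState[]> = {
  REQUESTED: ["ACTIVE"],                    // Grant
  ACTIVE: ["ACTIVE", "RELEASED", "EXPIRED"], // Heartbeat, Release, Timeout
  RELEASED: ["TERMINAL"],
  EXPIRED: ["TERMINAL"],
  TERMINAL: [],
};

function canTransition(from: LeaseState, to: LeaseState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```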
5. Normative Rules
- Strict Expiry: Implementations MUST NOT honor a lease past its expiry time, even by milliseconds.
- Clock Synchronization: The Control Plane's clock is the Source of Truth. Agents MUST account for network latency.
- Idempotency: RELEASE operations MUST be idempotent. Releasing an already released/expired lease is a no-op (HTTP 200 OK).
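A sketch of an idempotent RELEASE handler that also respects strict expiry. The in-memory store, `StoredLease`, and `handleRelease` are hypothetical. Returning 404 for an unknown lease id is an assumption, since the rules above only cover leases that were already released or expired.

```typescript
type LeaseStatus = "ACTIVE" | "RELEASED" | "EXPIRED";

interface StoredLease {
  status: LeaseStatus;
  expires_at: number; // Control Plane clock, epoch ms (assumed unit)
}

// Returns the HTTP status code for RELEASE(lease_id).
function handleRelease(
  store: { [leaseId: string]: StoredLease | undefined },
  leaseId: string,
  now: number
): number {
  const lease = store[leaseId];
  if (lease === undefined) return 404; // assumption: not specified above
  if (lease.status === "ACTIVE") {
    // Strict expiry: a lease past expires_at is recorded as EXPIRED,
    // not RELEASED, even if the agent asks to release it.
    lease.status = now > lease.expires_at ? "EXPIRED" : "RELEASED";
  }
  // Already released or expired: no-op, still 200 OK.
  return 200;
}
```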
6. Failure Modes
6.1. Network Partition
- If an agent cannot reach the Control Plane to heartbeat, its leases WILL expire.
- Correctness: The agent MUST assume it has lost the lease. It MUST pause execution and attempt to re-acquire (see KLIS-5). Continuing simply because "I grabbed it earlier" is a protocol violation.
6.2. Clock Skew
- If agent clock << server clock, agent may think it holds a lease that is already expired.
- Mitigation: Agents SHOULD use a conservative local expiry (e.g., expiry - skew_buffer).
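The mitigation amounts to one subtraction on the agent side. In this sketch, `localExpiry` and `mayUseLease` are hypothetical names, and the size of the skew buffer is left to the caller because the text does not recommend a value.

```typescript
// Agent-side view of when the lease should be treated as expired.
function localExpiry(serverExpiresAt: number, skewBufferMs: number): number {
  return serverExpiresAt - skewBufferMs;
}

// True only while the agent's own clock is safely before the
// conservative expiry.
function mayUseLease(
  localNow: number,
  serverExpiresAt: number,
  skewBufferMs: number
): boolean {
  return localNow < localExpiry(serverExpiresAt, skewBufferMs);
}
```

The buffer protects only against skew up to its own size: if the agent clock lags the server by more than `skewBufferMs`, the agent can still overrun the real expiry.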
7. Security Considerations
- Lease Squatting: Malicious agents might loop Acquire -> Heartbeat forever, blocking resources.
- Mitigation: Control Planes MAY enforce "Total Lease Time" quotas or "Idle Detection" (revoke if file not actually modified).
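A "Total Lease Time" quota could be enforced at renewal time, roughly as sketched here. The quota value, `QuotaLease`, and `renewWithQuota` are hypothetical. The sketch tracks a single lease only, so an agent that releases and immediately re-acquires would need separate per-agent accounting, which is not shown.

```typescript
// Hypothetical quota; the text does not suggest a value.
const TOTAL_LEASE_QUOTA_MS = 30 * 60 * 1000;

interface QuotaLease {
  acquired_at: number; // Control Plane clock, epoch ms (assumed unit)
  expires_at: number;
  ttl: number;
}

// Heartbeat handler that refuses to extend a lease beyond its quota.
function renewWithQuota(
  lease: QuotaLease,
  now: number,
  quotaMs: number = TOTAL_LEASE_QUOTA_MS
): boolean {
  if (now > lease.expires_at) return false;            // already expired
  if (now - lease.acquired_at >= quotaMs) return false; // quota used up
  // Extend by one TTL, but never past acquired_at + quota.
  lease.expires_at = Math.min(now + lease.ttl, lease.acquired_at + quotaMs);
  return true;
}
```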
8. Non-Goals
- Distributed Consensus: A single Control Plane instance (or consensus cluster like Raft) is assumed to be the authority. KLIS does not define the consensus algorithm itself.