KLIS-3: Lease & Ownership Protocol

How temporary intents prevent Ghost Leases and guarantee liveness.

1. Purpose

The Lease Protocol defines the temporal aspects of intent. An intent is not a permanent property; it is a temporary Lease granted by the Control Plane. This specification ensures system liveness by preventing "Ghost Leases" (locks held by crashed agents).

2. Core Concepts

2.1. The Lease

A Lease is a reified object representing an active, granted intent.

type Lease = {
  id: string;            // e.g. "lease_uuid"
  resource: string;      // e.g. "FILE:/path"
  predicate: string;     // e.g. "MUTATES"
  agent_id: string;      // e.g. "agent_1"
  acquired_at: Timestamp;
  expires_at: Timestamp;
  ttl: Milliseconds;
}

2.2. The Ghost Lease Problem

If an agent acquires a lease and then crashes or hangs (OOM, network failure, infinite loop), the lease would otherwise persist and block all other agents indefinitely. KLIS solves this with mandatory TTLs: every lease expires unless it is renewed.

3. Lifecycle Protocol

3.1. Acquisition

  • Agents request leases via the Intent Manifest (KLIS-1).
  • The Control Plane grants leases only if NO conflicts exist (KLIS-2).
  • Default TTL: Implementations SHOULD default to 60 seconds.
  • Max TTL: Implementations MUST cap the maximum TTL (e.g., 5 minutes) to prevent DoS.
  • Resurrection Window: If an agent process restarts, the Kernel SHOULD grant a 15-second "Grace Period" during which the lease remains held even if a heartbeat is missed, provided the agent is in a DIE_RESTART state.
  • Immutability: The agent's timestamp (priority) MUST NOT change during the acquisition attempt. If a higher priority is needed (e.g. to preempt a deadlock), the agent MUST abort the current request and restart with a new, fresher timestamp.
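
The default and cap above can be sketched as a small clamping helper. This is a minimal illustration, not part of the specification: the names DEFAULT_TTL_MS, MAX_TTL_MS, and resolveTtl are hypothetical, and the 5-minute cap is only the example value quoted above.

```typescript
// Assumed constants: 60 s is the suggested default; 5 min is the example cap.
const DEFAULT_TTL_MS = 60 * 1000;
const MAX_TTL_MS = 5 * 60 * 1000;

// Hypothetical helper: returns the TTL the Control Plane would actually grant.
function resolveTtl(requestedMs?: number): number {
  if (requestedMs === undefined) {
    return DEFAULT_TTL_MS; // no TTL requested: fall back to the default
  }
  if (!isFinite(requestedMs) || requestedMs <= 0) {
    throw new RangeError("TTL must be a positive, finite number of milliseconds");
  }
  return Math.min(requestedMs, MAX_TTL_MS); // never exceed the cap
}
```

With these assumed values, a request without a TTL is granted 60000 ms and a one-hour request is clamped to 300000 ms.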

3.2. Heartbeat (Renewal)

  • Agents MUST send a Heartbeat signal for all active leases before expires_at.
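  • Interval: Agents SHOULD send a heartbeat every TTL / 3, which leaves time to retry a lost heartbeat before the lease expires.
  • Effect: The Control Plane sets expires_at to its own current time plus the lease's TTL (expires_at = now + ttl).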
  • Validation: Control Plane MUST reject heartbeats for leases that are already expired or released.
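
The renewal rules above might look like the following on the Control Plane side. This is a sketch under stated assumptions: the state field, the LeaseRecord shape, and both function names are hypothetical, timestamps are taken to be milliseconds on the Control Plane's clock, and "extend" is read as expires_at = now + ttl.

```typescript
type LeaseState = "ACTIVE" | "RELEASED" | "EXPIRED";

// Hypothetical record; `state` is an assumption, not a field of the Lease type.
interface LeaseRecord {
  id: string;
  expires_at: number; // ms, Control Plane clock
  ttl: number;        // ms
  state: LeaseState;
}

// Returns true if the heartbeat was accepted and the lease was extended.
function heartbeat(lease: LeaseRecord, now: number): boolean {
  if (lease.state !== "ACTIVE" || now > lease.expires_at) {
    return false; // already released or expired: reject, do not revive
  }
  lease.expires_at = now + lease.ttl; // extend from the current time
  return true;
}

// Agent side: suggested interval between heartbeats (TTL / 3).
function heartbeatIntervalMs(ttl: number): number {
  return Math.floor(ttl / 3);
}
```

Note that a rejected heartbeat leaves expires_at untouched, so a late agent cannot resurrect a lease that has already lapsed.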

3.3. Release

  • Explicit: Agent sends RELEASE(lease_id). This is the "Happy Path".
  • Implicit (Auto-Expiry): If now > expires_at, the Control Plane automatically revokes the lease.
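
Auto-expiry can be illustrated with a periodic sweep over the active leases. The sketch below assumes an in-memory list and millisecond timestamps; sweepExpired and its return shape are hypothetical names, and a real Control Plane may detect expiry lazily instead.

```typescript
// Minimal subset of the Lease fields needed for the sweep.
interface ActiveLease {
  id: string;
  agent_id: string;
  expires_at: number; // ms, Control Plane clock
}

// Hypothetical sweep: splits leases into those still held and those revoked.
function sweepExpired(
  leases: ActiveLease[],
  now: number
): { kept: ActiveLease[]; revoked: ActiveLease[] } {
  const kept: ActiveLease[] = [];
  const revoked: ActiveLease[] = [];
  for (const lease of leases) {
    if (now > lease.expires_at) {
      revoked.push(lease); // past expiry: revoke without waiting for the agent
    } else {
      kept.push(lease);
    }
  }
  return { kept, revoked };
}
```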

3.4. Reconciliation on Boot

Upon resurrection (restart), an agent MUST issue a VERIFY_ALL call for the leases recorded in its local StateDigest. The Kernel MUST return the current global state of each of those leases. This "Reality Check" overrides the agent's local StateDigest.
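
A possible agent-side reconciliation step is sketched below. The response shape is an assumption (a map from lease id to authoritative status), as are the names VerifyAllResponse and reconcile; the specification does not define the wire format of VERIFY_ALL.

```typescript
type LeaseStatus = "ACTIVE" | "RELEASED" | "EXPIRED";

// Assumed shape of a VERIFY_ALL response: lease id -> authoritative status.
type VerifyAllResponse = Record<string, LeaseStatus>;

// Hypothetical helper: the authoritative answer always wins over local belief.
function reconcile(
  localLeaseIds: string[],
  authoritative: VerifyAllResponse
): { held: string[]; lost: string[] } {
  const held: string[] = [];
  const lost: string[] = [];
  for (const id of localLeaseIds) {
    if (authoritative[id] === "ACTIVE") {
      held.push(id);
    } else {
      lost.push(id); // expired, released, or unknown to the Kernel
    }
  }
  return { held, lost };
}
```

Leases that end up in lost must be treated as gone, regardless of what the local StateDigest recorded before the restart.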

4. State Transitions

    REQUESTED -->|Grant| ACTIVE
    ACTIVE -->|Heartbeat| ACTIVE
    ACTIVE -->|Release| RELEASED
    ACTIVE -->|Timeout| EXPIRED
    RELEASED --> TERMINAL
    EXPIRED --> TERMINAL
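
The labeled transitions above can be encoded as a lookup table so that illegal moves fail loudly. This is a sketch only; the type and function names are hypothetical, and the unlabeled edges into TERMINAL are left out because they carry no triggering event.

```typescript
type State = "REQUESTED" | "ACTIVE" | "RELEASED" | "EXPIRED" | "TERMINAL";
type LeaseEvent = "Grant" | "Heartbeat" | "Release" | "Timeout";

// Each state maps the events it accepts to the resulting state.
const TRANSITIONS: Record<State, Partial<Record<LeaseEvent, State>>> = {
  REQUESTED: { Grant: "ACTIVE" },
  ACTIVE: { Heartbeat: "ACTIVE", Release: "RELEASED", Timeout: "EXPIRED" },
  RELEASED: {},
  EXPIRED: {},
  TERMINAL: {},
};

// Hypothetical helper: returns the next state or throws on an illegal event.
function nextState(state: State, event: LeaseEvent): State {
  const target = TRANSITIONS[state][event];
  if (target === undefined) {
    throw new Error("Illegal transition: " + state + " on " + event);
  }
  return target;
}
```

In this encoding a Heartbeat on an EXPIRED lease throws, which mirrors the rule that expired leases cannot be renewed.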

5. Normative Rules

  1. Strict Expiry: Implementations MUST NOT honor a lease past its expiry time, even by milliseconds.
  2. Clock Synchronization: The Control Plane's clock is the Source of Truth. Agents MUST account for network latency when estimating how much lease time remains.
  3. Idempotency: RELEASE operations MUST be idempotent. Releasing an already released/expired lease is a no-op (HTTP 200 OK).
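
Rules 1 and 3 interact when a RELEASE arrives for a lease that has already lapsed. The sketch below shows one way to keep the operation idempotent; the status field, the ReleasableLease shape, and the release function are hypothetical, and the numeric return value simply stands in for the HTTP status mentioned in rule 3.

```typescript
type Status = "ACTIVE" | "RELEASED" | "EXPIRED";

// Hypothetical record; `status` is an assumption, not a field of the Lease type.
interface ReleasableLease {
  status: Status;
  expires_at: number; // ms, Control Plane clock
}

// Always answers 200: releasing a finished lease is a no-op.
function release(lease: ReleasableLease, now: number): number {
  if (lease.status === "ACTIVE") {
    // Strict expiry: a lease past expires_at counts as EXPIRED, not RELEASED.
    lease.status = now > lease.expires_at ? "EXPIRED" : "RELEASED";
  }
  return 200;
}
```

Calling release twice on the same lease returns 200 both times and changes state only on the first call.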

6. Failure Modes

6.1. Network Partition

  • If an agent cannot reach the Control Plane to heartbeat, its leases WILL expire.
  • Correctness: The agent MUST assume it has lost the lease. It MUST pause execution and attempt to re-acquire (see KLIS-5). Continuing simply because "I grabbed it earlier" is a protocol violation.

6.2. Clock Skew

  • If the agent's clock lags far behind the Control Plane's clock, the agent may believe it still holds a lease that has already expired.
  • Mitigation: Agents SHOULD use a conservative local expiry (e.g., expires_at - skew_buffer).
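
The conservative expiry check can be written as a one-line guard that the agent evaluates before each protected operation. This is a sketch: mayProceed is a hypothetical name, and skewBufferMs is an assumed configuration value that should cover both expected clock skew and network latency.

```typescript
// Hypothetical agent-side guard, evaluated against the agent's own clock.
// Returns true only while the lease is safely inside its conservative window.
function mayProceed(
  expiresAt: number,    // expires_at reported by the Control Plane (ms)
  localNow: number,     // agent's local clock (ms)
  skewBufferMs: number  // assumed safety margin (ms)
): boolean {
  return localNow < expiresAt - skewBufferMs;
}
```

When the guard returns false, the agent should behave as described for a network partition: stop work and re-acquire rather than assume the lease is still valid.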

7. Security Considerations

  • Lease Squatting: Malicious agents might loop Acquire -> Heartbeat forever, blocking resources.
  • Mitigation: Control Planes MAY enforce "Total Lease Time" quotas or "Idle Detection" (revoke if file not actually modified).
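
One reading of a "Total Lease Time" quota is a cap on how long a single lease may be kept alive through renewals. The sketch below follows that reading only; renewalAllowed and maxTotalMs are hypothetical, and a per-agent cumulative quota would need different bookkeeping.

```typescript
// Hypothetical policy check, run by the Control Plane before honoring a heartbeat.
// A lease held longer than maxTotalMs since acquired_at may not be renewed again.
function renewalAllowed(
  acquiredAt: number, // acquired_at of the lease (ms)
  now: number,        // Control Plane clock (ms)
  maxTotalMs: number  // assumed quota (ms)
): boolean {
  return now - acquiredAt <= maxTotalMs;
}
```

Once renewalAllowed returns false, the lease simply runs out at its current expires_at and the resource becomes available to other agents.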

8. Non-Goals

  • Distributed Consensus: A single Control Plane instance (or consensus cluster like Raft) is assumed to be the authority. KLIS does not define the consensus algorithm itself.