KLIS-7: Control Plane Requirements

Minimum guarantees for a KLIS authoritative server.

1. Purpose

The Control Plane is the authoritative server (or cluster) that maintains the Global Intent State. KLIS-7 defines the minimum API and state guarantees required to host a KLIS-compliant ecosystem.

2. Responsibilities

  1. State Persistence: Store active Leases and Agent Sessions.
  2. Conflict Evaluation: Execute the conflict logic from KLIS-2 in O(1) time.
  3. Timekeeping: Be the source of truth for Lease Expiry (KLIS-3).
  4. Notification: Publish events (LeaseFreed, ContentionDetected).

3. Required API Concepts

The Control Plane MUST expose endpoints equivalent to:

  • POST /v1/manifest: Submit a NIM. Returns Granted | Denied | Wait.
  • POST /v1/leases/heartbeat: Bulk renew active leases.
  • POST /v1/leases/release: Explicitly release resources.
  • POST /v1/leases/reconcile: Resurrection API. Accepts an array of Lease IDs and an Agent ID. Returns a map from each Lease ID to a boolean indicating whether that lease is still valid.
  • GET /v1/resource/{id}/state: Get current locks (who holds this?).
  • GET /v1/contention: Get hotspot metrics.
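
The reconcile call above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a definitive implementation: the Lease record, the reconcile function name, and the rule that a lease counts as valid only if it exists, belongs to the calling agent, and has not expired are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    # Hypothetical lease record; field names are illustrative only.
    lease_id: str
    agent_id: str
    expires_at: float  # seconds on the Control Plane's clock


def reconcile(store, agent_id, lease_ids, now):
    """Return a {lease_id: bool} validity map for the given agent."""
    result = {}
    for lease_id in lease_ids:
        lease = store.get(lease_id)
        # Assumed rule: unknown, foreign, or expired leases are invalid.
        result[lease_id] = (
            lease is not None
            and lease.agent_id == agent_id
            and lease.expires_at > now
        )
    return result


store = {
    "L1": Lease("L1", "agent-a", expires_at=100.0),
    "L2": Lease("L2", "agent-b", expires_at=100.0),
    "L3": Lease("L3", "agent-a", expires_at=10.0),
}
validity = reconcile(store, "agent-a", ["L1", "L2", "L3", "L4"], now=50.0)
```

Here only L1 is reported as valid: L2 belongs to another agent, L3 has expired, and L4 is unknown to the store.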

4. State Guarantees

  • Linearizability: Lease operations MUST be atomic. Multiple concurrent requests for the same resource MUST be serialized.
  • Durability: Leases SHOULD be stored in memory for speed (e.g., Redis or Memcached), while critical session state SHOULD be persisted. Because Leases are ephemeral (bounded by a TTL), an in-memory store backed by a write-ahead log (WAL) is usually sufficient.
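
One way to picture the linearizability guarantee is a lease table whose check-and-set step runs under a lock, so concurrent requests for the same resource are evaluated one at a time. The Python sketch below assumes a single process and a single global lock; the LeaseTable class, its acquire method, and the simple "one holder per resource" rule are hypothetical stand-ins, not the KLIS-2 conflict logic.

```python
import threading


class LeaseTable:
    """Illustrative in-memory lease table (not the KLIS-2 logic)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leases = {}  # resource_id -> (agent_id, expires_at)

    def acquire(self, resource_id, agent_id, ttl, now):
        # The lock makes the read-check-write sequence atomic, so
        # concurrent requests for a resource are serialized.
        with self._lock:
            holder = self._leases.get(resource_id)
            held_by_other = (
                holder is not None
                and holder[1] > now        # lease has not expired
                and holder[0] != agent_id  # held by a different agent
            )
            if held_by_other:
                return "Denied"
            self._leases[resource_id] = (agent_id, now + ttl)
            return "Granted"


table = LeaseTable()
first = table.acquire("/config", "agent-a", ttl=30.0, now=0.0)
second = table.acquire("/config", "agent-b", ttl=30.0, now=1.0)
third = table.acquire("/config", "agent-b", ttl=30.0, now=31.0)
```

In this run the first request is granted, the second is denied while the lease is live, and the third is granted once the TTL has elapsed. A clustered Control Plane would need a consensus or single-writer mechanism in place of the in-process lock.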

5. Multi-Tenant Considerations

  • Namespaces: The Control Plane MUST support isolation between unrelated tenants. AppA's /config is not AppB's /config.
  • Quotas: Acquire requests are rate-limited to prevent DDoS.
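
The two points above can be illustrated together. The Python sketch below assumes that resources are keyed by a (tenant, path) pair and that Acquire requests pass through a token bucket; the scoped_key helper, the TokenBucket class, and the capacity and refill values are hypothetical choices for the example, and the specification does not mandate any particular rate-limiting algorithm.

```python
def scoped_key(tenant, path):
    # Namespacing assumption: the tenant is part of every resource key,
    # so AppA's /config and AppB's /config never collide.
    return (tenant, path)


class TokenBucket:
    """Minimal token bucket; one instance per tenant is assumed."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.tokens = float(capacity)
        self.updated_at = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0),
             bucket.allow(0.0), bucket.allow(1.0)]
```

With a capacity of two, the third request at time 0 is rejected, and one more is admitted after a second of refill.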

6. Observability

The Control Plane MUST provide a "God View" of the system:

  • Who is blocked?
  • What are the hotspots?
  • Who are the "Ghost Agents" (high timeout rate)?
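
As a rough illustration of the last question, Ghost Agents could be derived from a log of lease outcomes. The Python sketch below assumes an event stream of (agent_id, outcome) pairs where outcome is either "released" or "timed_out", and a cutoff of 50% for a "high" timeout rate; the event shape, function names, and cutoff are assumptions, not part of the specification.

```python
from collections import Counter


def timeout_rates(events):
    """Map each agent to the fraction of its leases that timed out."""
    totals, timeouts = Counter(), Counter()
    for agent_id, outcome in events:
        totals[agent_id] += 1
        if outcome == "timed_out":
            timeouts[agent_id] += 1
    return {agent: timeouts[agent] / totals[agent] for agent in totals}


def ghost_agents(events, min_rate=0.5):
    # min_rate is an illustrative cutoff for "high timeout rate".
    rates = timeout_rates(events)
    return sorted(agent for agent, rate in rates.items() if rate >= min_rate)


events = [
    ("agent-a", "released"), ("agent-a", "released"),
    ("agent-a", "released"), ("agent-a", "timed_out"),
    ("agent-b", "timed_out"), ("agent-b", "timed_out"),
    ("agent-b", "timed_out"), ("agent-b", "released"),
]
ghosts = ghost_agents(events)
```

Here agent-a times out on a quarter of its leases and agent-b on three quarters, so only agent-b is flagged.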

7. Non-Goals

  • Data Storage: The Control Plane stores metadata (Intents), not the actual file contents.
  • Execution: The Control Plane does not run the agents.

8. Telemetry of Death

The Control Plane MUST track "Death Counts" (how many times an agent has been killed via Wait-Die). If an agent exceeds a "Starvation Threshold" (e.g., 50 deaths), the Control Plane MUST issue a PAUSED verdict pending manual intervention.
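
The counting rule can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the DeathTracker class and its method names are hypothetical, the counts are kept in memory, and "exceeds" is read as strictly greater than the threshold.

```python
STARVATION_THRESHOLD = 50  # example value taken from the text above


class DeathTracker:
    """Counts Wait-Die kills per agent (illustrative only)."""

    def __init__(self, threshold=STARVATION_THRESHOLD):
        self.threshold = threshold
        self.deaths = {}  # agent_id -> death count

    def record_death(self, agent_id):
        self.deaths[agent_id] = self.deaths.get(agent_id, 0) + 1
        return self.verdict(agent_id)

    def verdict(self, agent_id):
        # "Exceeds" is interpreted as count > threshold (an assumption).
        if self.deaths.get(agent_id, 0) > self.threshold:
            return "PAUSED"
        return None


tracker = DeathTracker(threshold=3)
verdicts = [tracker.record_death("agent-a") for _ in range(4)]
```

With a threshold of three, the first three deaths produce no verdict and the fourth returns PAUSED.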