You can design a “probationary mode” that stops short of a full brick, but it only works if you accept the classic distributed-systems trade-off: when the system can’t reach its countersigner (partition, jitter, peer failure), you must choose between safety (deny writes) and availability (allow some writes without a countersignature). For security, you normally bias toward safety for high-impact actions and allow limited autonomy for low-impact actions.
Goal
Keep autonomy for planning/reading/executing low-risk actions, while gating every “meaningful write” behind a real-time cryptographic countersign (two-man rule), without deadlock.
Key idea: make “writes” go through a commit protocol, not a conversational back-and-forth
Deadlock usually comes from symmetric waiting (“you sign after I sign”). Avoid that by having a single, deterministic flow with a coordinator and timeouts.
A clean pattern is: “proposal -> countersign -> commit,” where the countersigner never waits on the proposer for anything except a well-formed proposal.
Building blocks
- Capability separation: the agent never holds a long-lived key that can unilaterally authorize writes. It holds only a short-lived “proposal key” to request actions.
- Countersigner node(s): independent process(es) with a policy engine and signing key(s). Preferably isolated (separate host/VM/TEE/HSM-backed key).
- Action taxonomy: define which operations are “write” and their risk level (examples: file writes, registry writes, firewall changes, credential access, persistence, outbound network policy changes, kernel module loads, etc.).
- Tamper-evident audit: all proposals, approvals, and commits are logged out-of-band.
Probationary Mode mechanics (two-man rule for writes)
1) Deterministic action envelope
Every write is first encoded into a canonical “ActionRequest” object so both nodes sign the exact same bytes:
- action_type + target identifiers
- exact diff / command / parameters (no free-form “do X”)
- preconditions (expected hashes, versions, current state)
- idempotency key (unique action ID)
- sequence number / monotonic counter
- expiry (short TTL)
- rollback plan or compensating action (if applicable)
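As a rough illustration of the envelope above, here is a minimal Python sketch. The class name `ActionRequest` comes from the text; the field names, the use of sorted-key compact JSON as the canonical encoding, and SHA-256 as the digest are all assumptions made for the example, not a prescribed format.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ActionRequest:
    """Hypothetical envelope; field names are illustrative only."""
    action_type: str            # e.g. "file_write"
    target: str                 # target identifier
    parameters: dict            # exact diff / command / parameters
    preconditions: dict         # expected hashes, versions, current state
    sequence: int               # monotonic counter
    expires_at: int             # expiry as integer epoch seconds (short TTL)
    action_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # idempotency key
    rollback: Optional[str] = None  # compensating action, if applicable

    def canonical_bytes(self) -> bytes:
        # ASSUMPTION: sorted keys + compact separators serve as the canonical form,
        # so key order in the input dicts does not change the signed bytes.
        return json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")

    def digest(self) -> str:
        # Hex SHA-256 of the canonical bytes; this is what both parties would sign.
        return hashlib.sha256(self.canonical_bytes()).hexdigest()
```

Integer timestamps are used here to sidestep float-formatting differences between implementations; a cross-language deployment would likely want a formally specified canonicalization scheme rather than this ad hoc one.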
2) Countersign requirement via threshold signatures or dual signatures
Two common options:
- Dual signature: agent signs ActionRequest, countersigner signs (ActionRequest + agent signature). Executor validates both.
- Threshold signature (t-of-n): countersigners produce shares; combined signature is valid only if quorum is reached. This scales better if you want “2 of 3” instead of exactly “peer A.”
3) Execution gating at an enforcement point the agent can’t bypass
The write is only permitted if the enforcement point verifies the countersignature. Enforcement must be below the agent:
- hypervisor policy module, sidecar proxy, kernel LSM/eBPF policy that checks tokens, privileged API gateway, or a management plane that is not writable by the agent
If the agent can patch/disable the enforcement point, “probationary mode” becomes theater.
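The dual-signature option from step 2 and the enforcement check from step 3 can be sketched together. This is a toy illustration only: HMAC-SHA256 with hard-coded demo keys stands in for real signatures so the example runs on the Python standard library, and all function names are hypothetical. With HMAC the verifier holds the same secret as the signer, so a real deployment would use asymmetric signatures, with the enforcement point holding only public keys.

```python
import hashlib
import hmac

# ASSUMPTION: demo keys; HMAC is a stand-in for asymmetric signatures.
AGENT_KEY = b"agent-proposal-key-demo"
COUNTERSIGNER_KEY = b"countersigner-key-demo"


def _sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def agent_sign(request_bytes: bytes) -> bytes:
    """Agent signs the canonical ActionRequest bytes."""
    return _sign(AGENT_KEY, request_bytes)


def countersign(request_bytes: bytes, agent_sig: bytes) -> bytes:
    """Countersigner signs (ActionRequest + agent signature).

    A real countersigner would also run its policy engine here before signing.
    """
    if not hmac.compare_digest(agent_sig, _sign(AGENT_KEY, request_bytes)):
        raise ValueError("agent signature does not match request")
    # The agent signature has a fixed length (32 bytes), so plain
    # concatenation is unambiguous in this sketch.
    return _sign(COUNTERSIGNER_KEY, request_bytes + agent_sig)


def enforcement_allows(request_bytes: bytes, agent_sig: bytes, counter_sig: bytes) -> bool:
    """Enforcement point: permit the write only if both signatures verify."""
    ok_agent = hmac.compare_digest(agent_sig, _sign(AGENT_KEY, request_bytes))
    ok_counter = hmac.compare_digest(
        counter_sig, _sign(COUNTERSIGNER_KEY, request_bytes + agent_sig)
    )
    return ok_agent and ok_counter
```

Because the countersignature covers the agent's signature as well as the request bytes, an approval cannot be lifted from one request and attached to another.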
How to avoid deadlock
Deadlock is prevented by design if you adopt one of these patterns (ordered from simplest/most common):
Option A: Single coordinator + 2-phase commit (2PC) with timeouts
- Coordinator is deterministic (e.g., always the countersigner, or always a specific cluster leader).
- Agent sends a proposal to coordinator.
- Coordinator either signs (APPROVE) or refuses (DENY).
- If approved, agent submits the signed approval to the enforcement point and executes.
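- Timeout rule: if approval isn’t received before the request’s expiry, the action is aborted, and the enforcement point rejects any approval that arrives late. No “waiting forever,” so no deadlock. (Textbook 2PC can block if the coordinator fails mid-protocol; here the expiry is what bounds the wait.)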
This is “fail closed with bounded waiting.”
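A minimal sketch of the bounded-waiting rule in Option A, assuming the coordinator is reachable through some blocking call. `send_proposal` is a hypothetical callable standing in for that transport, and the `"ABORT"` result name is an assumption; only APPROVE and DENY come from the text.

```python
import queue
import threading
import time


def request_approval(send_proposal, proposal, timeout_s: float) -> str:
    """Return "APPROVE", "DENY", or "ABORT"; never waits longer than timeout_s."""
    replies: queue.Queue = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            replies.put(send_proposal(proposal))
        except Exception:
            # Transport or coordinator errors are treated as an abort (fail closed).
            replies.put("ABORT")

    threading.Thread(target=worker, daemon=True).start()
    try:
        decision = replies.get(timeout=timeout_s)
    except queue.Empty:
        return "ABORT"  # no answer before the deadline: abort, do not keep waiting
    # Anything other than an explicit decision is also treated as an abort.
    return decision if decision in ("APPROVE", "DENY") else "ABORT"


def slow_coordinator(proposal) -> str:
    """Demo coordinator that answers too late."""
    time.sleep(0.5)
    return "APPROVE"
```

For example, `request_approval(slow_coordinator, {"id": 2}, 0.05)` gives up after roughly 50 ms instead of blocking. The late reply is simply dropped here; in the full design the envelope's expiry is what stops a late approval from being used.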
Option B: Consensus log (Raft-style) as the countersign mechanism
Instead of “peer A signs in real-time,” you require the action to be committed to a small quorum log:
- Agent proposes an action entry.
- Cluster reaches consensus; once committed, the log entry itself is the authorization artifact (or produces a signature).
- Executor only performs writes corresponding to committed entries.
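This naturally avoids bilateral waiting and gives you a single ordering of writes (huge for preventing weird interleavings). It also fails closed under partition: a side that cannot reach a quorum cannot commit entries, so no writes are authorized there.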
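To make the "committed entry is the authorization" idea concrete, here is a toy in-memory majority-acknowledgement log. It is deliberately not Raft: there is no leader election, no terms, no persistence, and no networking. The class and method names are hypothetical.

```python
class QuorumLog:
    """Toy ordered log: an entry is committed once a majority of voters ack it."""

    def __init__(self, voters):
        self.voters = frozenset(voters)
        self.majority = len(self.voters) // 2 + 1
        self.entries = []  # list of (action_id, set_of_acking_voters), in log order

    def propose(self, action_id: str) -> int:
        """Append a proposed action and return its log index."""
        self.entries.append((action_id, set()))
        return len(self.entries) - 1

    def ack(self, index: int, voter: str) -> None:
        """Record one voter's acknowledgement of the entry at index."""
        if voter not in self.voters:
            raise ValueError(f"unknown voter: {voter}")
        self.entries[index][1].add(voter)

    def committed_prefix(self) -> list:
        """Action IDs the executor may run, in order.

        Stops at the first entry without a majority, so a later entry is never
        executed ahead of an earlier uncommitted one.
        """
        committed = []
        for action_id, acks in self.entries:
            if len(acks) < self.majority:
                break
            committed.append(action_id)
        return committed
```

The executor would perform only the writes returned by `committed_prefix()`, which is what gives the ordering guarantee described above.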
Option C: Pre-authorized micro-leases (to reduce DoS while keeping two-man control)
To avoid “every tiny write blocks on network,” you can issue narrowly scoped, short-lived “micro-leases”:
- Countersigner grants a lease: “You may perform up to N writes of class X within scope S for T seconds.”
- Each individual write still produces an auditable record and must carry the lease token, but does not require round-trip approval.
- High-risk writes still require real-time countersign.
This is often the best “probation” compromise: you’re constrained, not bricked, and you don’t melt the system with latency.
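A micro-lease of the form "up to N writes of class X within scope S for T seconds" might be checked like this. The sketch assumes the lease has already been issued and signed by the countersigner (signature checking is omitted), that scope is a path prefix, and that targets are already normalized; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MicroLease:
    """Hypothetical lease record; signature verification is omitted."""
    action_class: str    # class X, e.g. "file_write"
    scope_prefix: str    # scope S, e.g. "/var/app/cache/" (keep the trailing slash)
    max_writes: int      # N
    expires_at: float    # issue time + T, in epoch seconds
    used: int = 0        # writes consumed so far

    def permits(self, action_class: str, target: str, now: float) -> bool:
        """Return True and consume one write if the action is covered by the lease."""
        if now >= self.expires_at:
            return False  # lease expired
        if self.used >= self.max_writes:
            return False  # write budget exhausted
        if action_class != self.action_class:
            return False  # wrong class of action
        # ASSUMPTION: target is already normalized (no "..", no symlink tricks).
        if not target.startswith(self.scope_prefix):
            return False  # outside the leased scope
        self.used += 1
        return True
```

Denied attempts do not consume the budget in this sketch. The counter has to live at the enforcement point rather than with the agent; otherwise the agent could reset it.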
Preventing abuse in probationary mode
- No agent-chosen peer: the peer set must be fixed by policy, not selected dynamically by the agent (otherwise it can route requests to a “friendly” signer).
- Anti-replay: include nonce + monotonic sequence + short TTL; enforcement rejects duplicates and out-of-window tokens.
- State-based preconditions: countersigner approves “write if state==expected,” reducing TOCTOU abuse.
- Rate limits: stop an erratic agent from spamming approvals and creating self-inflicted DoS.
- Scoped permissions: approvals should be narrowly bounded (path prefixes, registry hives, API subsets, network destinations).
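The anti-replay bullet above (nonce + monotonic sequence + short TTL) could look roughly like this at the enforcement point. This is a sketch under simplifying assumptions: a single agent, in-memory state, and a hypothetical `ReplayGuard` class.

```python
class ReplayGuard:
    """Rejects expired, out-of-order, or duplicate tokens (single-agent toy)."""

    def __init__(self) -> None:
        self.last_sequence = -1   # highest sequence number accepted so far
        self.seen_nonces = set()  # nonces of accepted tokens

    def accept(self, nonce: str, sequence: int, expires_at: float, now: float) -> bool:
        if now >= expires_at:
            return False  # outside the TTL window
        if sequence <= self.last_sequence:
            return False  # replayed or out-of-order sequence number
        if nonce in self.seen_nonces:
            return False  # duplicate nonce
        # Record the token only after every check has passed.
        self.last_sequence = sequence
        self.seen_nonces.add(nonce)
        return True
```

In a longer-running system the nonce set would need pruning once the corresponding tokens have expired, and the state would need to survive restarts of the enforcement point.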
What happens under partitions (the unavoidable part)
You can define a spectrum rather than binary trust, but you cannot avoid the liveness trade-off:
- High-impact writes: require real-time countersign; if partitioned, deny (security-first).
- Low-impact writes: allow under micro-lease or local policy, with strict limits and aggressive audit.
- Read/plan autonomy: continue operating (collect evidence, stage a plan, simulate changes), but defer commit until countersigned.
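The three tiers above can be written down as a small decision function. The tier names and return strings are assumptions for illustration; in particular, deferring a low-impact write when no lease is held and the countersigner is unreachable is one possible reading of "defer commit until countersigned."

```python
from enum import Enum


class Risk(Enum):
    READ = "read"   # reading, planning, simulating
    LOW = "low"     # low-impact writes
    HIGH = "high"   # high-impact writes


def decide(risk: Risk, countersigner_reachable: bool, lease_valid: bool) -> str:
    """Map (risk tier, connectivity, lease state) to a policy outcome."""
    if risk is Risk.READ:
        return "allow"  # read/plan autonomy continues under partition
    if risk is Risk.HIGH:
        # Security-first: high-impact writes always need a real-time countersign.
        return "request_countersign" if countersigner_reachable else "deny"
    # Low-impact writes: prefer an existing micro-lease, else ask, else stage.
    if lease_valid:
        return "allow_under_lease"
    return "request_countersign" if countersigner_reachable else "defer"
```

For instance, `decide(Risk.HIGH, countersigner_reachable=False, lease_valid=True)` returns `"deny"`: a valid lease never substitutes for a real-time countersign on a high-impact write.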
Conclusion
“Probationary mode” is implementable as a two-man rule if you (1) move write authorization into a separate countersigner domain, (2) gate writes at an enforcement point the agent can’t bypass, and (3) avoid deadlock by using a coordinator/consensus flow with hard timeouts. The practical, low-DoS version is real-time countersign for high-risk writes plus narrowly scoped, short-lived micro-leases for routine writes.