Veteran security engineer Niels Provos is working on a new technical approach designed to stop autonomous AI agents from taking actions you haven’t specifically authorized.
His open-source software solution, called IronCurtain, aims to neutralize the risk of an LLM-powered agent “going rogue” – whether through prompt injection or the agent gradually deviating from the user’s original intent over the course of a long session.
How does IronCurtain work?
In recent months, there have been reports of autonomous AI agents going off the rails due to agentic misalignment.
Instead of granting agents unrestricted access to the user's system, IronCurtain ensures that an agent never interacts with it directly: its intended actions are first analyzed by a separate, trusted process.
“Every agent, whether a direct LLM session or Claude Code running in a Docker container, goes through the same pipeline,” says Provos.
Once the user gives it an instruction, the agent writes TypeScript code that runs inside a sandboxed V8 isolate and issues typed function calls that map to MCP tool calls, i.e., requests an AI sends to external tools through the Model Context Protocol in order to act on the user's behalf.
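To make the mechanism concrete, here is a minimal sketch of what such sandboxed, agent-generated code might look like. The tool names, type signatures, and the bridge function are assumptions for illustration, not IronCurtain's actual API; the point is that the code's only way to affect the outside world is by emitting a typed tool-call request.

```typescript
// A ToolCall is the serialized request that leaves the V8 isolate.
type ToolCall = { tool: string; args: Record<string, unknown> };

// Stand-in for the bridge out of the sandbox: every typed function
// call becomes a request the trusted MCP proxy can inspect.
async function requestToolCall(call: ToolCall): Promise<unknown> {
  console.log("forwarding to MCP proxy:", JSON.stringify(call));
  return null; // a real result would come back from the proxy
}

// Typed wrappers the agent's generated code is allowed to call;
// no fs or network access exists inside the isolate itself.
const readFile = (path: string) =>
  requestToolCall({ tool: "fs.read", args: { path } }) as Promise<string | null>;
const sendEmail = (to: string, subject: string, body: string) =>
  requestToolCall({ tool: "email.send", args: { to, subject, body } });

// Code the agent might emit for "summarize the report and email it".
async function main() {
  const report = await readFile("quarterly-report.txt");
  await sendEmail("team@example.com", "Report summary", report?.slice(0, 500) ?? "");
}
main();
```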
These tool-call requests are forwarded to the trusted process, an MCP proxy that acts as a policy engine and decides whether each call should be allowed, denied, or escalated to a human for approval.
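The sketch below illustrates that three-way decision. The rule shapes, tool names, and default-deny ordering are assumptions made for this example rather than IronCurtain's real policy format.

```typescript
type Decision = "allow" | "deny" | "escalate";
type ToolCall = { tool: string; args: Record<string, unknown> };

interface PolicyRule {
  matches(call: ToolCall): boolean;
  decision: Decision;
}

const rules: PolicyRule[] = [
  // Reading files is low-risk, so it passes through.
  { matches: (c) => c.tool === "fs.read", decision: "allow" },
  // Outbound email is irreversible, so a human signs off first.
  { matches: (c) => c.tool === "email.send", decision: "escalate" },
];

// First matching rule wins; anything unmatched is denied by default.
function evaluate(call: ToolCall): Decision {
  for (const rule of rules) {
    if (rule.matches(call)) return rule.decision;
  }
  return "deny";
}

console.log(evaluate({ tool: "email.send", args: { to: "team@example.com" } }));
// -> "escalate"
```

A default-deny fallback matters here: a prompt-injected agent that invents a tool call the policy has never seen gets blocked rather than waved through.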
The policy engine's decisions rely on a "constitution": a set of guiding principles and concrete rules written in plain English by the user and "translated" by IronCurtain into an enforceable security policy.
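As a hedged illustration of that translation step, a plain-English principle might compile down to a machine-checkable rule along these lines. The field names, paths, and tool identifiers are invented for this sketch; the article does not specify IronCurtain's actual policy representation.

```typescript
// Plain-English guidance as the user would write it.
const constitution = [
  "Never delete files outside the project directory.",
  "Ask me before sending anything to an external service.",
];

// What a "translation" into enforceable rules could plausibly produce:
// each rule keeps a pointer back to the principle that justifies it.
const compiledPolicy = [
  {
    principle: constitution[0],
    tool: "fs.delete",
    condition: (args: { path: string }) => !args.path.startsWith("/home/user/project/"),
    decision: "deny" as const,
  },
  {
    principle: constitution[1],
    tool: "http.post",
    condition: () => true,
    decision: "escalate" as const,
  },
];

console.log(compiledPolicy.map((r) => `${r.tool} -> ${r.decision}`));
// -> [ "fs.delete -> deny", "http.post -> escalate" ]
```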