
AI Agents Are Not Safe to Execute Real-World Actions Yet

AI agents are good enough to look trustworthy and unsafe enough to break production.

That is the uncomfortable truth behind most agent demos. An agent can read a ticket, summarize a thread, decide on a next step, and produce a polished plan. It can look like a competent operator. But the moment it is allowed to create a GitHub issue, send an email, delete a resource, or call a sensitive internal API, the problem stops being intelligence and becomes control.

Reasoning is improving fast. Execution safety is not.

That gap matters more than most teams want to admit.

The Illusion of Progress

A lot of progress in agents is real. Models are better at planning, tool selection, and multi-step reasoning. That progress creates a dangerous illusion: if the agent can decide what should happen, then surely it is close to being trusted to make it happen.

It is not.

An agent can decide a support issue should be escalated and open a GitHub issue in the wrong repository with the wrong labels and a secret pasted into the body. It can infer that a notification is needed and send an email to the wrong distribution list. It can interpret "clean up old test infrastructure" as permission to delete the wrong cloud resources. It can call an internal API with plausible but unsafe parameters.

The failure mode is not bad reasoning alone. The failure mode is bad reasoning combined with real permissions.

Agents are already capable enough to create damage that looks like normal automation.

The Real Problem

Most agent stacks have the same structural flaw. The model reasons, chooses a tool, passes arguments, and something executes. Credentials get fetched somewhere in the path. Policies, if they exist, are scattered across prompts, wrappers, and application logic. Logging is partial. Redaction is inconsistent. Runtime control is weak.

The missing piece is simple to describe: there is no control layer between agent intent and real-world action.

That is the entire problem.

Without that boundary, the system quietly assumes that once the model has produced a reasonable-looking action, execution is just plumbing. It is not plumbing. It is the security boundary.

This is the first line worth remembering: The most dangerous part of an agent is not what it thinks. It is what it is allowed to do.

Why Existing Solutions Fail

Many enterprises assume they already have the answer because they already have identity and secrets infrastructure. They do not.

Identity systems, SSO, and RBAC answer important questions: who the actor is, what group it belongs to, and what broad permissions it has. That is necessary. It is not sufficient. If an agent inherits permission to create tickets, send email, or modify infrastructure, RBAC still does not answer the question that matters most: should this exact action with this exact payload be allowed right now?

Identity authenticates actors. It does not govern execution.

Vault and other secret managers solve a different problem. They protect credentials at rest and control who can read them. Also necessary. Also insufficient. Once an agent can use a credential directly or indirectly, Vault has already done its job. It does not determine whether "delete resource" is valid in the current environment. It does not inspect the outgoing request in context. It does not enforce runtime policy on what the credential is then used for. Its audit log can show who retrieved a secret, but it does not produce an execution trail of what was done with that secret fit for enterprise review.

Storing secrets safely is not the same as using them safely.

That distinction gets missed because agents do not fit the usual security model. Humans are deliberate, slow, and reviewed through interfaces. Services are narrow, deterministic, and coded against explicit contracts. Agents are neither. They are non-deterministic decision systems operating at machine speed, potentially with broad access.

They look like software, but they behave more like improvising operators.

What Is Actually Needed

What enterprises need is an execution layer.

An execution layer sits between the agent and the external system. The agent does not directly hold credentials. It does not directly call a production API. It expresses intent in a constrained form: perform this approved action with this input. The execution layer resolves the action, evaluates runtime policy, injects credentials securely, executes the call, redacts sensitive data so it does not leak into traces or transcripts, and logs the result.
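To make that boundary concrete, here is a minimal Python sketch of such a layer. It assumes three things: a registry of approved actions, a plain dict standing in for a secret manager, and one policy function per action. Every name in it is hypothetical and illustrates the shape of the boundary, not any product's API. That includes `Intent`, `ActionSpec`, `ExecutionLayer`, `PolicyDenied`, `fake_create_issue`, and the `acme/support` repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical names throughout: this is an illustrative sketch, not a real API.


@dataclass(frozen=True)
class Intent:
    """All the agent may emit: an approved action name plus input. Never a credential."""
    action: str
    input: dict[str, Any]


@dataclass
class ActionSpec:
    credential_name: str                                   # which secret the action needs
    policy: Callable[[dict[str, Any]], Optional[str]]      # denial reason, or None to allow
    call: Callable[[dict[str, Any], str], dict[str, Any]]  # performs the real request


class PolicyDenied(Exception):
    pass


class ExecutionLayer:
    def __init__(self, actions: dict[str, ActionSpec], secrets: dict[str, str]):
        self.actions = actions   # registry of approved actions
        self._secrets = secrets  # stand-in for a secret manager; never returned to the agent
        self.audit_log: list[dict[str, Any]] = []

    def execute(self, intent: Intent) -> Any:
        # 1. Resolve the intent against approved actions.
        spec = self.actions.get(intent.action)
        if spec is None:
            self._record(intent, "rejected", "unknown action")
            raise PolicyDenied(f"action {intent.action!r} is not approved")
        # 2. Evaluate runtime policy on the exact payload before any credential is fetched.
        reason = spec.policy(intent.input)
        if reason is not None:
            self._record(intent, "denied", reason)
            raise PolicyDenied(reason)
        # 3. Inject the credential and execute; the agent never sees the secret.
        secret = self._secrets[spec.credential_name]
        result = spec.call(intent.input, secret)
        # 4. Redact the injected secret before the result is returned to the agent.
        redacted = _redact(result, secret)
        self._record(intent, "executed", None)
        return redacted

    def _record(self, intent: Intent, outcome: str, reason: Optional[str]) -> None:
        # Append-only audit entry; it stores the agent's intent, never the credential.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": intent.action,
            "input": intent.input,
            "outcome": outcome,
            "reason": reason,
        })


def _redact(value: Any, secret: str) -> Any:
    """Replace every literal occurrence of the injected secret, recursively."""
    if isinstance(value, str):
        return value.replace(secret, "[REDACTED]")
    if isinstance(value, dict):
        return {k: _redact(v, secret) for k, v in value.items()}
    if isinstance(value, list):
        return [_redact(v, secret) for v in value]
    return value


# Usage with a fake upstream call that echoes its auth header (a worst case for leakage).
def fake_create_issue(payload: dict[str, Any], token: str) -> dict[str, Any]:
    return {"status": "created", "debug": f"Authorization: Bearer {token}"}


layer = ExecutionLayer(
    actions={
        "github.create_issue": ActionSpec(
            credential_name="github_token",
            policy=lambda p: None if p.get("repo") == "acme/support" else "repository not approved",
            call=fake_create_issue,
        )
    },
    secrets={"github_token": "ghp_example"},
)
print(layer.execute(Intent("github.create_issue", {"repo": "acme/support", "title": "Escalation"})))
# {'status': 'created', 'debug': 'Authorization: Bearer [REDACTED]'}
```

The ordering in this sketch is a deliberate choice. Policy runs on the exact payload before the credential is fetched, so a denied action never touches a secret. The result is also redacted before it goes back to the agent. Plain string replacement only catches the literal injected value, not derived or re-encoded forms, so a real layer would need broader detection.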

This is the second line worth remembering: Agents should not be trusted with credentials just because they can describe why an action makes sense.

That changes the architecture in an important way. The model remains useful for reasoning, but it stops being trusted as the final authority on execution. Intent and execution become separate concerns. They should have been separate from the start.

Introducing the Idea

This is the lens behind KeyRunner.

KeyRunner is not built on the assumption that agents need unrestricted tool access with better prompts wrapped around them. It is built on the opposite assumption: agents should generate intent, but privileged execution should be handled by a system designed for control.

That is a different category from agent orchestration. It is not about making the model smarter. It is about making real-world actions governable.

How It Works

At a high level, the flow is simple.

The agent sends an action and input, not a raw API call with live credentials. KeyRunner receives that intent, resolves it against approved actions, and enforces runtime policy. It then injects credentials securely without exposing them to the agent, executes the call, redacts sensitive data, and records an auditable trail of what happened.

That model becomes concrete very quickly.

If an agent wants to create a GitHub issue, the question is not "does it have a token?" The question is whether issue creation is allowed in this repository, with this body, these labels, and this data.
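Those questions can be expressed as a check on the payload rather than on the token. The following is a minimal Python sketch with two assumptions: a per-repository label allowlist, and a few regular expressions for well-known credential formats. The formats covered are GitHub `ghp_` and `github_pat_` tokens, AWS `AKIA` access key IDs, and PEM private-key headers. The repository names, labels, and the `check_create_issue` function are hypothetical. The patterns are illustrative and do not form a complete secret scanner.

```python
import re

# Hypothetical per-repository policy: repositories and labels are illustrative.
ALLOWED_LABELS = {
    "acme/support": {"bug", "escalation", "customer-reported"},
}

# Illustrative, not exhaustive, patterns for credentials that must never land in an issue.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub classic personal access token
    re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),        # GitHub fine-grained token
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def check_create_issue(repo: str, title: str, body: str, labels: list[str]) -> list[str]:
    """Return a list of violations; an empty list means the action may proceed."""
    violations = []
    allowed = ALLOWED_LABELS.get(repo)
    if allowed is None:
        violations.append(f"issue creation is not approved for {repo}")
    else:
        unknown = sorted(set(labels) - allowed)
        if unknown:
            violations.append(f"labels not allowed in {repo}: {unknown}")
    # Scan the text the agent wants to publish, not just the credential it would use.
    for pattern in SECRET_PATTERNS:
        if pattern.search(title) or pattern.search(body):
            violations.append("issue text appears to contain a credential")
            break
    return violations


print(check_create_issue("acme/support", "Login fails", "Customer reports 500s on /login", ["bug"]))
# []
print(check_create_issue("acme/infra", "Login fails", "key: AKIAABCDEFGHIJKLMNOP", ["p0"]))
# ['issue creation is not approved for acme/infra', 'issue text appears to contain a credential']
```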

If an agent wants to send an email, the question is not "can it reach the email API?" The question is whether this recipient set is allowed, whether the content crosses a confidentiality boundary, and whether the action requires approval.
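For email, a minimal Python sketch might return one of three decisions, so that "requires approval" is a first-class outcome rather than being folded into a yes or no. The following are all hypothetical values: the internal domain, the blocked distribution lists, the confidentiality markers, and the recipient threshold. The `check_send_email` function is hypothetical too. Keyword matching on markers is a crude stand-in for real data classification.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical policy values, chosen for illustration only.
INTERNAL_DOMAIN = "acme.example"
BLOCKED_LISTS = {"all-staff@acme.example", "customers@acme.example"}
CONFIDENTIAL_MARKERS = ("CONFIDENTIAL", "INTERNAL ONLY")
MAX_RECIPIENTS_WITHOUT_APPROVAL = 5


def check_send_email(recipients: list[str], subject: str, body: str) -> tuple[Decision, str]:
    addresses = {r.strip().lower() for r in recipients}
    if not addresses:
        return Decision.DENY, "no recipients"
    # Recipient set: some distribution lists are never valid targets for an agent.
    blocked = addresses & BLOCKED_LISTS
    if blocked:
        return Decision.DENY, f"distribution list not allowed: {sorted(blocked)}"
    # Confidentiality boundary: marked content may not leave the internal domain.
    external = sorted(a for a in addresses if not a.endswith("@" + INTERNAL_DOMAIN))
    text = f"{subject}\n{body}".upper()
    confidential = any(marker in text for marker in CONFIDENTIAL_MARKERS)
    if external and confidential:
        return Decision.DENY, f"confidential content addressed to external recipients: {external}"
    # Approval: anything external or broad goes to a human instead of being sent.
    if external or len(addresses) > MAX_RECIPIENTS_WITHOUT_APPROVAL:
        return Decision.REQUIRE_APPROVAL, "external or large recipient set"
    return Decision.ALLOW, "ok"
```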

If an agent wants to delete resources, the question is not "is the cloud credential valid?" The question is whether deletion is permitted for this environment, this scope, and this runtime context.
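For deletion, a minimal Python sketch could check each resolved resource against the environment, the scope the intent named, and a runtime signal. It relies on several assumptions, all of them hypothetical and none taken from a specific cloud provider's API:
- `Resource` records that carry an environment and tags.
- A `purpose=test` tag convention.
- A per-call deletion limit.
- A `change_freeze` flag supplied by the caller.

```python
from dataclasses import dataclass

# Hypothetical deletion policy: environment names, tag convention, and limit are illustrative.
DELETABLE_ENVIRONMENTS = {"dev", "test"}
REQUIRED_TAG = ("purpose", "test")
MAX_DELETIONS_PER_CALL = 10


@dataclass
class Resource:
    resource_id: str
    environment: str
    tags: dict[str, str]


def check_delete(resources: list[Resource], target_environment: str, change_freeze: bool) -> list[str]:
    """Return a list of violations; an empty list means the deletion may proceed."""
    violations = []
    # Runtime context: a change freeze blocks deletion regardless of the payload.
    if change_freeze:
        violations.append("deletions are blocked during a change freeze")
    # Blast radius: cap how much a single intent can remove.
    if len(resources) > MAX_DELETIONS_PER_CALL:
        violations.append(f"{len(resources)} resources exceeds the per-call limit of {MAX_DELETIONS_PER_CALL}")
    key, value = REQUIRED_TAG
    for r in resources:
        # Environment: production-like environments are never deletable by an agent.
        if r.environment not in DELETABLE_ENVIRONMENTS:
            violations.append(f"{r.resource_id}: environment {r.environment!r} is not deletable")
        # Scope: every resource must belong to the environment the intent named.
        elif r.environment != target_environment:
            violations.append(f"{r.resource_id}: outside target environment {target_environment!r}")
        if r.tags.get(key) != value:
            violations.append(f"{r.resource_id}: missing tag {key}={value}")
    return violations
```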

Safe execution is not about tool access. It is about controlled translation from intent to action.

Why This Becomes Critical Now

This matters now because agents are moving out of sandbox demos and into production workflows.

Teams are wiring them into ticketing systems, inboxes, CI pipelines, CRM operations, support tooling, and internal platforms. The operational surface area is expanding faster than the control model around it. That is a bad trade.

If you let agents act across systems without a dedicated execution layer, you are not deploying automation. You are deploying unbounded judgment with credentials attached.

That is the third line worth remembering: Prompting is not a security boundary. Runtime control is.

The enterprise stack for agents is going to split in two. One layer will handle reasoning, planning, memory, and orchestration. The other will handle secure execution. The first makes agents useful. The second makes them safe enough to deploy.

The companies that understand that distinction early will build durable systems. The ones that do not will keep shipping impressive demos with fragile operating models underneath.

Agents generate intent. What's missing is a trusted way to execute it.

Released under the MIT License.