Agentic Minds, Defined Boundaries: Why Security Belongs in Your AI Harness From Day One

Zeke Brinsfield

According to the 2025 State of Non-Human Identities and Secrets report, non-human identities now outnumber human identities in enterprise systems by 144 to one. A year ago that ratio was 92 to one. The growth isn't slowing. And 97% of those non-human identities carry excessive privileges — permissions they were granted "just in case" and that nobody has reviewed since. This is the environment your AI agents are being deployed into.

The previous post in this series — The AI Agent Harness: Why Every Industry Needs One — made the case that the harness architecture is what separates production-grade AI from well-prompted prototypes. It touched on security by design as one of the structural properties the harness delivers. That section earned a standalone treatment, because the security architecture of an agentic system isn't a configuration choice you make after the harness is built. It is a design constraint that shapes the harness from the first line of scaffolding code.

Every agent in your harness is a privileged actor with real-world side effects. The failure modes are well-understood. The blast radius when they occur in an agentic context is not.

Agents Are Identity Principals — Treat Them Like It

The foundational error most teams make is treating AI agents as software functions rather than as identity principals. A function executes code. An identity principal acts on behalf of someone — or something — and those actions are traceable, auditable, and governed by defined permissions. The distinction matters enormously when the action is "send this email," "modify this database record," or "invoke this API on behalf of the user."

A Cloud Security Alliance survey on agentic AI identity found that only 21.9% of organizations currently treat AI agents as independent identity-bearing entities. The other 78.1% are sharing credentials across agents, borrowing human user tokens, or relying on ambient permissions that exist because a service account was given broad access years ago and nobody revoked it — and 45.6% rely on shared API keys for agent-to-agent authentication. Meanwhile, BeyondTrust's Phantom Labs research reports a 466.7% year-over-year increase in AI agents operating inside enterprises with privileged access that security teams cannot see or govern. When a shared credential is compromised — or when the agent behind it is manipulated — the attacker doesn't inherit one agent's capability. They inherit whatever that credential was authorized to do across every system that trusted it.

The right model is straightforward even if the implementation takes discipline: every agent gets a unique, non-shared credential. That credential carries an explicit permission scope — not "what this agent might someday need" but "what this agent needs to complete its current task class." The delegation chain is traceable — you can answer the question "who authorized this agent to take this action?" all the way back to a human principal. And the credential has a bounded lifetime.

Cloud providers have recognized this problem and are building toward it. AWS Bedrock AgentCore Identity, GCP Workload Identity Federation, and GitHub's agentic workflow model all provide mechanisms for scoping agent credentials without conflating them with human or service-account identities. On top of those primitives, layer two access-control models: role-based access control for agent types, where base roles establish what a "document processing agent" or a "database query agent" can do, and attribute-based access control above that, so runtime context (which user triggered this task, from which environment, against which data classification) can further constrain what the credential is permitted to do in this specific invocation.

The permission ceiling principle is non-negotiable: an agent must never be able to take an action that the invoking human user themselves could not take. If a user without admin privileges triggers an agent workflow, that agent cannot escalate to admin-scoped credentials regardless of what its system prompt requests. The harness enforces this ceiling. The model does not.
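The ceiling can be enforced as a pure set operation in the harness's authorization path. Here is a minimal sketch — the scope strings and function names are illustrative, not any product's API — in which the effective scope is the intersection of the agent's role scope and the invoking user's scope:

```python
# Permission ceiling sketch: the harness computes the effective scope as the
# intersection of what the agent's role allows and what the invoking human
# user is allowed to do. All names here are hypothetical.

def effective_scope(agent_scope: set[str], user_scope: set[str]) -> set[str]:
    """An agent may never exceed the invoking user's permissions."""
    return agent_scope & user_scope

def authorize(action: str, agent_scope: set[str], user_scope: set[str]) -> bool:
    """Deny-by-default: the action must sit inside both scopes."""
    return action in effective_scope(agent_scope, user_scope)

# A non-admin user triggers an agent whose role nominally includes admin writes:
agent = {"db:read", "db:write", "admin:rotate_keys"}
user = {"db:read", "db:write"}

assert authorize("db:write", agent, user)               # within both scopes
assert not authorize("admin:rotate_keys", agent, user)  # ceiling blocks escalation
```

Because the check runs in the harness, not the model, no system-prompt content can widen the intersection.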

Just-in-Time Permissions vs. Just-in-Case

The legacy approach to credential provisioning is to grant everything an agent might ever need at deployment time, then leave those permissions standing indefinitely. The appeal is operational simplicity — one provisioning event, no runtime credential management, nothing to expire or rotate. The problem is that you've built a standing attack surface that is one injection attack away from full exploitation.

The correct model inverts the assumption. Agents start with zero or minimal standing permissions. When a specific task requires elevated access — writing to a database, calling an external API, reading from a sensitive document store — the agent requests a time-bounded elevation, performs the operation, and the credential is revoked automatically. The standing permissions are minimal. The elevated permissions exist only for the duration of the operation that required them.

The contractor keycard analogy makes the stakes concrete. A permanent, unrestricted keycard that opens every door in the building is catastrophically different from a temporary access code that works for one room during a scheduled maintenance window and expires automatically when the window closes. The work gets done either way. But when the permanent keycard is lost or stolen, the blast radius is the entire building. When the temporary code leaks after the window, there is nothing to exploit.

The tooling for JIT credential management is mature. HashiCorp Vault generates dynamic database credentials on demand with configurable TTLs — the credentials are issued, used, and retired without ever being stored as a static secret. AWS STS AssumeRole produces bounded tokens scoped to a specific role and time window. GCP Workload Identity Federation removes the need for long-lived service account keys entirely by federating workload identity through short-lived tokens issued at runtime. GitHub's agentic workflow model implements JIT by default: workflows start with read-only tokens, and write permissions are granted only for the isolated job that needs them, revoked automatically at job completion even if the workflow fails.
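The STS pattern above can be sketched in a few lines. The role ARN, actions, and resource below are illustrative, and the live `boto3` call is shown only in a comment since it requires real AWS credentials — the point is that the request is time-bounded and narrowed per task:

```python
import json

def assume_role_request(task_id: str, actions: list[str], resource: str) -> dict:
    """Build a time-bounded, narrowly scoped STS AssumeRole request.
    The role ARN below is hypothetical."""
    return {
        "RoleArn": "arn:aws:iam::123456789012:role/agent-db-writer",
        "RoleSessionName": f"agent-{task_id}",  # ties the credentials to one task
        "DurationSeconds": 900,                 # token self-expires in 15 minutes
        "Policy": json.dumps({                  # session policy narrows the role further
            "Version": "2012-10-17",
            "Statement": [{"Effect": "Allow", "Action": actions,
                           "Resource": resource}],
        }),
    }

# With boto3 the call would be:
#   creds = boto3.client("sts").assume_role(**assume_role_request(...))["Credentials"]
req = assume_role_request("batch-42", ["dynamodb:PutItem"],
                          "arn:aws:dynamodb:*:*:table/reconciliation-batch")
assert req["DurationSeconds"] == 900
```

The returned credentials carry their own `Expiration`; there is nothing for the harness to store, rotate, or forget to revoke.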

This is not a premium practice for high-security environments. It is the baseline architecture for any harness that runs agents with real-world write access.

Enforce at the Infrastructure Layer — Not the Prompt

There is a distinction that matters more than almost any other in agentic security architecture: the difference between a security rule that lives in a system prompt and a security rule that lives in the infrastructure. The first is a suggestion. The second is a constraint.

A system prompt that says "never access files outside the project directory" will be followed by the model under normal operating conditions. It will not be followed if a sufficiently sophisticated injection attack overrides it. It will not be followed if a future model version interprets the instruction differently. It will not be followed in the edge case that nobody anticipated when the prompt was written. System prompts are not a security boundary. They are an intent declaration.

Claude Code's hook architecture is a reference implementation of what infrastructure-layer enforcement looks like in practice. PreToolUse hooks intercept every tool call before execution and make a deterministic decision: allow, deny, ask, or defer, with a reason surfaced to the model. The exit code contract is unambiguous — exit 2 blocks execution regardless of what the model wants to do. (Counterintuitively, exit 1 is a non-blocking error and execution continues; exit 2 is the deliberate block signal.) When multiple hooks conflict, deny always wins over defer, which wins over ask, which wins over allow. A ConfigChange hook prevents the model from modifying settings mid-session, including disabling the hooks themselves — the governance layer is immutable from inside the session. And for enterprise deployments, the allowManagedHooksOnly setting ensures only administrator-deployed hooks run, closing the gap where an injected instruction could install a permissive hook to bypass the governance layer.
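A minimal PreToolUse hook following that exit-code contract might look like the sketch below. The project-directory rule and tool names are illustrative, not a complete recommended policy:

```python
import json
import os
import sys

def decide(tool_name: str, tool_input: dict, project_root: str) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 is the deliberate block signal."""
    if tool_name in ("Write", "Edit"):
        root = os.path.realpath(project_root)
        # realpath normalizes ../ tricks and symlinks before the comparison:
        target = os.path.realpath(tool_input.get("file_path", ""))
        if not target.startswith(root + os.sep):
            return 2, f"Blocked: {tool_input.get('file_path')} is outside the project directory."
    return 0, ""

def main() -> None:
    # Claude Code pipes the pending tool call to the hook as JSON on stdin;
    # stderr from a blocking hook is surfaced back to the model as the reason.
    call = json.load(sys.stdin)
    code, msg = decide(call.get("tool_name", ""), call.get("tool_input", {}), os.getcwd())
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)  # exit 2 blocks the call; exit 0 allows it; exit 1 is a non-blocking error
```

The script would be registered under `PreToolUse` in the hook configuration; the exact settings syntax depends on the Claude Code version in use. The essential property is that `decide` runs outside the context window — an injected instruction can ask for a write outside the project, but it cannot change the exit code.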

The same principle scales through Open Policy Agent, which allows security teams to write policies in Rego and evaluate them against every agent action before execution. Microsoft's Agent Governance Toolkit, released as open source in April 2026, takes this further: an Agent OS layer intercepts actions at sub-millisecond latency, a Semantic Intent Classifier counters goal hijacking, and circuit breakers can isolate or terminate a rogue agent without manual intervention. The pattern is consistent across all of these: security logic that lives outside the model's context window, that cannot be overridden by what the model reads or reasons about, and that enforces the governance layer deterministically.

Database-Level Enforcement — Because Compromised Agents Still Try

Strong identity and JIT permissions reduce the standing attack surface significantly. They do not eliminate the risk that a running agent, within its legitimate permission scope, is manipulated into requesting something it shouldn't. The database itself must enforce limits independently — not because application-layer security is inadequate in principle, but because the database cannot know or trust whether the application calling it has been compromised.

PostgreSQL Row-Level Security makes this concrete. Policies at the database layer restrict which rows a given database role can see or touch, evaluated on every query regardless of what the application code is doing. A read-only agent role with a SELECT-only policy cannot insert or update records no matter how the query is constructed. An agent scoped to a specific tenant's data cannot read another tenant's records even if the application logic that normally enforces tenant isolation has been bypassed. The critical constraint: agents must never run as superuser. Superusers bypass RLS policies entirely — the mechanism that protects every other role provides no protection when the superuser flag is set.
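A sketch of such a policy in PostgreSQL terms — table, role, and setting names are illustrative — held as statements a harness could apply with any Postgres driver:

```python
# Row-level security setup for a scoped, non-superuser agent role.
# The harness sets app.current_batch per session; the policy is then
# evaluated on every query no matter how the SQL was constructed.
RLS_SETUP = [
    "CREATE ROLE recon_agent NOLOGIN;",          # dedicated role, never superuser
    "GRANT SELECT ON accounts TO recon_agent;",  # read-only: no INSERT/UPDATE/DELETE
    "ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;",
    """CREATE POLICY batch_scope ON accounts FOR SELECT TO recon_agent
       USING (batch_id = current_setting('app.current_batch')::int);""",
]

def apply_rls(conn) -> None:
    """Apply the policy with any DB-API Postgres connection (e.g. psycopg)."""
    with conn.cursor() as cur:
        for stmt in RLS_SETUP:
            cur.execute(stmt)
```

Under a policy like this, a manipulated "all customer records matching pattern X" query returns only the current batch's rows, because the `USING` clause is ANDed onto every statement the role runs.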

A 2024 incident in a financial services reconciliation workflow illustrates the blast radius precisely. A reconciliation agent was manipulated through injected instructions into constructing a query for "all customer records matching pattern X," where the pattern was broad enough to encompass the entire customer table. The application layer did not catch it — the query was syntactically valid and the agent's credentials had read access to the customer table. Proper RLS scoped to that agent's role — restricting access to only the accounts in the current reconciliation batch — would have returned zero exploitable records regardless of how the query was formed.

Audit logging at the data layer completes the picture. Which agent, which query, what parameters, what was returned, associated with which session and task ID. This logging cannot be circumvented by a compromised agent. If the application layer was manipulated, the database record of what actually happened is the authoritative forensic trail. Combined with dynamic secrets from Vault — so that even the database credentials themselves expire and are never stored as static values — you have a data access layer where the standing attack surface has been systematically minimized.

Prompt Injection Is a Systems Problem, Not a Model Problem

Prompt injection — LLM01 in OWASP's 2025 Top 10 for LLM applications — shows up in 73% or more of production AI deployments. That number isn't a reflection of how difficult it is to write a secure system prompt. It is a reflection of a structural vulnerability that no amount of prompt engineering can close.

The critical distinction is between direct injection — where a user types malicious instructions into a prompt field — and indirect, in-context injection, where the agent reads a document, email, web page, or tool output that contains embedded instructions. Direct injection is partially controllable through input validation. Indirect injection is not fixable at the model layer. OWASP and Lakera's research is unambiguous: indirect prompt injection is a system-level vulnerability created by blending trusted and untrusted inputs in one context window. All eight of the prompt-level defense mechanisms evaluated in that research can be bypassed by adaptive attacks with success rates above 50%.

The documented incidents define the real threat model. In August 2024, Slack's AI feature was exploited through injection via private channel content — the AI was caused to summarize and exfiltrate conversations to an external address. EchoLeak, discovered in 2025, was a zero-click injection in Microsoft Copilot: a single malicious email caused the assistant to extract and exfiltrate data from OneDrive, SharePoint, and Teams without any further user interaction. A financial services RAG deployment was compromised in January 2025 when malicious instructions embedded in a public document caused the agent to leak intelligence, modify its own system prompts, and execute elevated API calls.

The MCP ecosystem has introduced a supply chain attack surface that compounds this exposure. Research from Invariant Labs found that 43% of publicly available MCP servers carry command execution vulnerabilities, and when auto-approval is enabled on tool calls, tool poisoning attacks succeed at an 84.2% rate. The postmark-mcp incident is the clearest illustration: a counterfeit npm package mimicked the official Postmark MCP server, and version 1.0.16 silently BCC'd every outgoing email to an attacker-controlled address. The only attack surface was a legitimate-looking package name.

The systems architecture that contains this class of vulnerability looks like this: treat every tool output as untrusted; strip tool results from any secondary classifier that governs subsequent decisions (as Claude Code's auto mode does); scan agent outputs before applying them downstream; execute agent-generated code in ephemeral, isolated containers that are discarded after the task completes; and restrict outbound network access to an explicit allowlist. None of these mitigations depends on the model being resistant to injection. All of them enforce constraints at the infrastructure layer, where injection cannot reach.
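The last of those mitigations — deny-by-default egress — is a few lines in the harness's tool-execution path. A sketch with hypothetical hostnames:

```python
from urllib.parse import urlparse

# Explicit egress allowlist enforced before any network-touching tool call
# executes. Hostnames here are illustrative.
ALLOWED_HOSTS = {"api.internal.example.com", "vault.example.com"}

def egress_allowed(url: str) -> bool:
    """Deny-by-default: only explicitly allowlisted hosts are reachable."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

assert egress_allowed("https://vault.example.com/v1/secret")
# An injected instruction pointing a tool at an attacker domain fails the
# check no matter what the model was convinced to do:
assert not egress_allowed("https://attacker.example.net/exfil")
```

An exfiltration payload like the one in the postmark-mcp incident dies at this check, because the attacker's collection endpoint is simply not routable from inside the harness.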

Governance as Code

The framework landscape has converged. OWASP's Agentic Top 10 for 2026 is the primary practitioner standard for agentic AI security. NIST's AI Risk Management Framework has an agentic profile extending the Govern, Map, Measure, and Manage structure to multi-agent systems. MITRE ATLAS v5.4, published in February 2026, catalogs 16 tactics and 84 techniques including AI Agent Context Poisoning, Memory Manipulation, and Escape to Host — attacks with no analog in non-agentic systems that require purpose-built architectural mitigations.

What these frameworks describe, when you strip away the compliance language, is governance as code. Security logic that is version-controlled, tested against known attack patterns, and deterministically enforced through the infrastructure layer rather than through policy documents that nobody reads at 2 AM when an agent is running a production workflow.
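Concretely, governance as code means rules like the hook-precedence ordering earlier in this post become pure functions with regression tests that run in CI on every policy change. A sketch, with illustrative names:

```python
# The "deny > defer > ask > allow" precedence rule expressed as a pure
# function and unit-tested like any other application code.
PRECEDENCE = ["deny", "defer", "ask", "allow"]  # strongest verdict first

def resolve(decisions: list[str]) -> str:
    """Combine conflicting hook decisions; the strongest verdict wins."""
    for verdict in PRECEDENCE:
        if verdict in decisions:
            return verdict
    return "allow"  # no hook objected

# Regression tests that ship with the policy itself:
assert resolve(["allow", "ask"]) == "ask"
assert resolve(["allow", "deny", "ask"]) == "deny"
assert resolve([]) == "allow"
```

When the policy is a version-controlled function, a change that accidentally weakens the precedence order fails the build instead of failing at 2 AM in production.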

The seven principles that should be wired into every harness from day one: design for identity first so that every agent is a traceable principal from the moment it is instantiated; enforce least privilege at every layer — application, infrastructure, and data — not just at whichever layer is most convenient; use just-in-time over just-in-case so that standing attack surfaces stay minimal; enforce at infrastructure and not at the prompt, because prompts are suggestions and infrastructure is law; treat all external inputs as adversarial inputs, always; contain blast radius by assuming agents will be compromised and designing for minimum damage when they are; and version-control, test, and deploy governance logic the same way you deploy application code.

That last principle is what separates mature agentic security architecture from the "we'll add security later" approach that most teams are currently taking. Security bolted onto an existing harness lives at the edges — an input filter here, an output check there — while the core architecture retains all its original exposure. Security designed into the harness from the start lives in the identity model, the permission model, the data access layer, and the infrastructure enforcement layer simultaneously. The blast radius of any individual agent failure is bounded by construction, not by hope.

The harness architecture makes AI agents production-grade. Security-by-design is what makes the harness itself trustworthy. Neither is optional, and neither can be deferred to the next sprint.

If you're building or evaluating an AI agent architecture and want a security-by-design review before you reach production, that's a conversation we're ready to have.

Zeke Brinsfield

Founder & Principal AI Architect

Zeke Brinsfield is the CEO and Founder of BrinsCorp Evolution, combining deep real estate expertise with a passion for technology-driven innovation. He leads the company's strategic vision and product development.
