OAuth Is Broken for AI Agents
Disclosure: I’m building Clawvisor, a self-hosted credential gateway for AI agents. I obviously have a perspective here, but this problem exists regardless of my solution, and I’d rather see it solved by anyone than ignored by everyone.
OAuth 2.0 is one of the most successful standards in the history of software. It solved a genuinely hard problem (delegated authorization without sharing passwords) and did it well enough that virtually every API on the internet uses it. The ecosystem of libraries, provider support, and developer familiarity around OAuth is unmatched. It works. For traditional applications.
AI agents are not traditional applications. OAuth authorizes access at token-issuance time, but agents need authorization at action-execution time, on every API call, against a semantic policy the user controls. That gap can’t be closed within OAuth’s model, and it creates real security problems that the industry is mostly ignoring.
OAuth’s Implicit Contract
The OAuth 2.0 authorization code grant (RFC 6749 §4.1) establishes an implicit contract: a client application redirects the user to an authorization server, the user sees a consent screen, clicks “Allow,” and the application receives a scoped token.
This contract rests on two assumptions:
- The client is a fixed application. Its behavior is determined at build time. The OAuth spec treats the client as a known entity with predictable behavior.
- Scopes meaningfully bound what the client will do. When the user grants `gmail.readonly`, the application reads email. The scope is a meaningful boundary because the app self-limits to the operations the developer wrote.
This holds when the client is a compiled binary or a deployed web app. The app does the same thing every time, for every user, because its behavior is encoded in source code.
When the client is an AI agent, both assumptions break.
The Client’s Behavior Changes at Runtime
A traditional OAuth client is deterministic in the ways that matter for security. A mail client with write access will compose, send, and organize email. Same operations, same code paths, regardless of what’s in the user’s inbox. The token’s capabilities may exceed what the app uses, but the app’s behavior is bounded by what the developer wrote.
An agent’s behavior is bounded by its prompt, its context window, and whatever input it processes at runtime. Same token, same scopes, radically different behavior depending on what the agent encountered five seconds ago.
This isn’t theoretical. In February 2026, a director at Meta’s AI safety team asked an agent to archive old emails and review another inbox, explicitly instructing it not to take action until she approved. The agent started mass-deleting emails from her personal inbox anyway. She told it to stop. Twice. It kept going. She had to physically run to her machine and kill the process.
The cause wasn’t malice or prompt injection. The agent’s context window filled up, prior messages got compacted, and the original instruction (“don’t action until I tell you to”) was lost. The agent, now operating without its safety constraint, decided that deleting emails was consistent with inbox management. The token had the capability. The agent exercised it.
That’s one anecdote, but the failure mode is general. A traditional mail client can’t forget its own rules mid-execution. An agent can, whether through context window limits, misinterpreted instructions, excessive helpfulness, or deliberate prompt injection. These aren’t exotic attack scenarios. They’re inherent to how language models work.
OAuth’s threat model (RFC 6819) covers token leakage, CSRF, redirect URI manipulation, and scope escalation. It does not cover a client whose behavior changes based on the data it accesses with the token. The threat model assumes the client is a known quantity. Agents are not.
Scopes Don’t Map to Intent
Here’s what a typical OAuth consent screen shows when an agent-powered app requests Gmail access:
> AcmeAgent wants to: Read, compose, send, and permanently delete all your email in Gmail
Here’s what the user actually wants:
> Let my agent read my last 10 unread emails to triage them.
The gap between these two statements is the entire attack surface.
OAuth scopes are structural. They describe which API endpoints a token can call. They don’t decompose along the boundaries that matter for agents. There’s no scope for “read but don’t send,” or “send only replies to existing threads,” or “read only emails from today.” The permission model was designed for applications with predictable behavior, where coarse scopes work because the application self-limits.
An agent’s authorization should be semantic: “triage my inbox, read emails, summarize them, draft responses, but don’t send anything without asking.” There’s no OAuth grant that expresses this.
Even if APIs redesigned their scopes to be more granular, you still can’t express what agent authorization requires:
- Temporal bounds. “Valid for 30 minutes, not 60 days.” OAuth token expiry is set by the authorization server, not the user, and refresh tokens extend access indefinitely.
- Conditional execution. “Send this email only if the recipient is already in the thread.” A bearer token can’t inspect parameters against a policy.
- Intent verification. “Only if the request is consistent with inbox triage.” This requires understanding the semantic relationship between an API call and a declared purpose.
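To make the gap concrete, here is a minimal sketch of what a semantic, user-controlled policy might look like if agent authorization were expressed as data rather than OAuth scopes. Every name here (`AgentPolicy`, the field names, the action strings) is illustrative, not part of any real spec:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class AgentPolicy:
    purpose: str                 # declared intent, e.g. "inbox triage"
    allowed_actions: set         # action allowlist, not endpoint scopes
    expires_at: datetime         # temporal bound set by the user, not the server
    require_existing_thread: bool = True  # conditional-execution rule

    def permits(self, action: str, params: dict) -> bool:
        if datetime.now(timezone.utc) >= self.expires_at:
            return False                      # temporal bound
        if action not in self.allowed_actions:
            return False                      # action allowlist
        if action == "send_email" and self.require_existing_thread:
            # conditional execution: sends must be replies to existing threads
            return params.get("in_reply_to") is not None
        return True

policy = AgentPolicy(
    purpose="inbox triage",
    allowed_actions={"read_email", "draft_reply"},
    expires_at=datetime.now(timezone.utc) + timedelta(minutes=30),
)

print(policy.permits("read_email", {}))                        # True
print(policy.permits("send_email", {"to": "x@example.com"}))   # False
```

Note what has no OAuth equivalent: the expiry belongs to the user, the allowlist names actions rather than endpoints, and a rule can inspect the parameters of an individual call. Intent verification (the third bullet) is deliberately absent here because it can't be reduced to a boolean check; this sketch covers only the mechanically enforceable rules.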
OAuth gives the agent a master key when it needs a hall pass.
The Token Custody Problem
OAuth assumes the client is trusted to hold credentials. RFC 6749 §10.3 requires credentials not be “stored in a manner accessible to other application components.” For a server-side web app, this is straightforward.
For agents, there’s no good answer. The common response is “just don’t expose the token to the model, use tool calls.” But in practice, most agent frameworks pass credentials through the agent’s runtime. The token ends up in environment variables, config files, or tool responses that the agent can read. Even tool-based architectures where the LLM never “sees” the raw token still grant the agent the ability to make arbitrary API calls within the token’s scope. You’ve constrained the exfiltration surface, but not the authorization surface. The agent can still use the token’s full capabilities even if it can’t read the token string.
The core issue: OAuth trusts clients with credentials because the client’s behavior is known and bounded. Agents are programs whose behavior changes based on untrusted runtime input. Handing a bearer token to an agent is delegating credential custody to something that can be confused, misled, or just get off-task.
What Would Agent Authorization Actually Look Like?
Not a new OAuth grant type. The problem is more fundamental.
OAuth conflates two concerns that need to be separated for agents: who can access the API and what should be done with that access. For traditional apps, conflating these is fine because the app’s code is the policy. For agents, they need to be enforced independently.
The agent shouldn’t hold the credential at all. Instead:
- The agent describes what it wants to do. Not “give me a Gmail token” but “read the 10 most recent unread messages.”
- A separate system evaluates the request against a policy the user controls. Semantic, not structural. “Read messages” is consistent with “inbox triage.” “Forward all messages to an external address” is not. Some of this can be enforced with simple rules (action allowlists, recipient domain checks). Some of it may use an LLM for intent verification. Either way, the enforcement happens outside the agent’s process, on infrastructure the agent can’t influence.
- The deterministic system executes the request using a credential the agent never sees. The token stays in infrastructure whose behavior is determined by code, not by a language model.
- The result, not the credential, is returned to the agent.
This is the gateway pattern:
```
Agent                  Gateway                   API Provider
  |                       |                          |
  |-- action request ---->|                          |
  |                       |-- evaluate policy        |
  |                       |-- inject credential ---->|
  |                       |<-- API response ---------|
  |<-- result (no cred) --|                          |
```
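The flow above reduces to a few lines of deterministic code. This is a minimal sketch under stated assumptions: the policy is a hardcoded allowlist, the credential vault is a dict, and `execute_upstream` stands in for the real API call. All names are illustrative:

```python
ALLOWED_ACTIONS = {"read_email", "draft_reply"}   # user-defined policy
VAULT = {"gmail": "ya29.EXAMPLE"}                 # credential the agent never sees

def execute_upstream(action: str, params: dict, token: str) -> dict:
    # Stand-in for the real API call. The token is used here, inside
    # deterministic infrastructure, and never leaves this function.
    return {"status": "ok", "action": action}

def gateway(request: dict) -> dict:
    action = request["action"]
    params = request.get("params", {})
    if action not in ALLOWED_ACTIONS:                          # evaluate policy
        return {"status": "denied", "reason": f"{action} outside policy"}
    result = execute_upstream(action, params, VAULT["gmail"])  # inject credential
    return result                                              # result, no credential

print(gateway({"action": "read_email"}))   # executes within policy
print(gateway({"action": "delete_all"}))   # blocked, never reaches the API
```

The key property is that `gateway` is ordinary code: its behavior can't be changed by anything in the agent's context window, and the denied path is reached before a credential is ever touched.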
The agent is a requester, not an executor. The credential is held by infrastructure that is auditable and under the user’s control.
Authorization moves from token-issuance time to action-execution time. But that doesn’t mean a human approves every API call. That wouldn’t scale. Instead, the user defines a policy up front (“this agent can read emails from the last 24 hours, but can’t forward anything to addresses outside the organization”), and the gateway enforces it automatically on every request. Most actions execute immediately because they fall within the policy. The ones that don’t get blocked or held for review. The user stays in control without being in the loop on every call.
OAuth doesn’t disappear here. The user still does an OAuth flow to connect their account. The token still exists. But it goes to the gateway, not to the agent. The gateway is a traditional OAuth client: deterministic, bounded, not influenced by untrusted input. OAuth’s model holds for the gateway because the gateway is exactly the kind of client OAuth was designed for.
Where This Leaves Us
OAuth works exactly as designed for the clients it was designed for. The problem is that the industry is duct-taping it onto agent architectures where its threat model, granularity, and trust assumptions are all wrong.
“Just use OAuth” for agents means granting tokens with capabilities that far exceed the agent’s purpose, trusting a client whose behavior changes at runtime to hold and responsibly use them, and having no mechanism to enforce policy after the token is issued. These aren’t edge cases. They’re the default configuration of every agent-to-API integration shipping today.
The fix isn’t to abandon OAuth. It’s to recognize that OAuth is an implementation detail (the mechanism for connecting accounts) not the authorization layer for agents. That layer needs to be semantic, per-action, temporally bounded, and enforced by deterministic infrastructure the agent cannot influence.
OAuth was built for a world where the client is a program and the user is at the keyboard. In the agent world, the client is a system whose behavior changes based on untrusted input, and the user may not be present when the most consequential API calls are made. The authorization model needs to catch up.