Short-lived MCP server URLs: the TTL pattern for AI agents in 2026

An MCP server URL is a credential. It looks like a regular HTTPS endpoint, but anything that holds it can call the tools behind it. If an agent has the URL, the agent can read the database, post to the channel, ship the deploy. If anyone else gets the URL, they can do the same things.

That is why long-lived MCP URLs are the wrong default. The fix is short-lived URLs with a TTL of minutes, scoped tokens, and rotation on every use. The pattern is not new: AWS calls them presigned URLs and has shipped them since 2011 for S3. What changed in 2026 is that AI agents now sit on the other end of these URLs, and the exposure rate is large enough to measure.

The problem in numbers

Trend Micro scanned the public internet for MCP endpoints in late 2025 and found 492 servers with no client authentication and no traffic encryption. Their April 2026 update put the count at 1,467 servers, a near-tripling in six months. Of those, 74% are hosted on AWS, Azure, GCP, and Oracle, and 90% provide direct read access to a backing data source. On those servers, a natural-language prompt is enough to exfiltrate the data.

The other failure mode is auth bypass. CVE-2026-33032 (MCPwn) is a CVSS 9.8 vulnerability in Nginx UI's MCP integration that takes two HTTP requests to own the server. Active exploitation was confirmed in May 2026. The pattern is the same one we have seen in every other generation of remote endpoints: the URL exists, the credentials are static, the rotation never happens.

Why long-lived URLs break

An MCP server URL ends up in four places it should not be: chat logs, IDE config files, screenshots in support tickets, and git history. None of those are accidents. They are the natural artifacts of how developers and agents work. A URL pasted into a Claude conversation lives in that thread until the thread is deleted. A URL committed to a private repo lives there until the repo leaks or a contractor's GitHub account is breached.

If the URL is good for a year, every one of those copies is a working backdoor for a year. If the URL is good for 15 minutes, the backdoor closes before the leak reaches an attacker who knows what to do with it.

What the TTL pattern looks like

The shape we use, ported from the S3 presigned URL pattern that AWS documents for object access, has four parts:

Short access TTL. The URL or access token is valid for 5 to 60 minutes. Most production agent calls finish in seconds, so the upper bound only matters for long-running jobs.
Long refresh, rotated on use. A refresh token good for 30 to 90 days, rotated every time it is exchanged. If the same refresh token is presented twice, both it and the successor issued from it are revoked.
Tight scope per token. If the agent only needs read access to one storage account, the token says so. No bearer token in 2026 should carry the right to do anything on any resource.
Audit on every state change. Token acquired, refreshed, expiry warning, expired, revoked. The point is not the log line; the point is that a leaked refresh token shows up as a duplicate-rotation event within hours, not months.
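
The rotation rule in the second part can be sketched in a few lines. This is a minimal in-memory illustration, not a production token store: the names TokenStore, RefreshRecord, and RefreshRejected are hypothetical, the TTL constants are picked from the ranges above, and a real deployment would delegate all of this to its authorization server.

```python
import secrets
import time
from dataclasses import dataclass, field

ACCESS_TTL_SECONDS = 15 * 60          # assumption: 15 minutes, inside the 5-60 minute range
REFRESH_TTL_SECONDS = 30 * 24 * 3600  # assumption: 30 days, the low end of 30-90 days


class RefreshRejected(Exception):
    """Raised when a refresh token is unknown, expired, revoked, or replayed."""


@dataclass
class RefreshRecord:
    family_id: str      # tokens descended from one login share a family
    expires_at: float
    used: bool = False  # set once the token has been exchanged


@dataclass
class TokenStore:
    records: dict = field(default_factory=dict)        # refresh token -> RefreshRecord
    revoked_families: set = field(default_factory=set)

    def issue(self, family_id=None, now=None):
        """Mint a short-lived access token and a fresh refresh token."""
        now = time.time() if now is None else now
        family_id = family_id or secrets.token_hex(8)
        refresh = secrets.token_urlsafe(32)
        self.records[refresh] = RefreshRecord(family_id, now + REFRESH_TTL_SECONDS)
        access = {"token": secrets.token_urlsafe(32),
                  "expires_at": now + ACCESS_TTL_SECONDS}
        return access, refresh

    def rotate(self, refresh, now=None):
        """Exchange a refresh token exactly once; a second use revokes the family."""
        now = time.time() if now is None else now
        record = self.records.get(refresh)
        if record is None or record.family_id in self.revoked_families:
            raise RefreshRejected("unknown or revoked refresh token")
        if record.used:
            # Replay of an already-rotated token: the successor dies with it.
            self.revoked_families.add(record.family_id)
            raise RefreshRejected("refresh token replayed; family revoked")
        if now >= record.expires_at:
            raise RefreshRejected("refresh token expired")
        record.used = True
        return self.issue(record.family_id, now)
```

Presenting a rotated token a second time fails, and so does the successor that was issued from it, which is the "both are revoked" behavior described above.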

The spec backbone is OAuth 2.1 with PKCE, which the June 2025 MCP authorization revision adopted. The revision also pushed MCP servers into the OAuth resource-server role only: they validate tokens issued by an external authorization server, they do not mint tokens themselves. That separation is what makes the TTL pattern enforceable. The resource server cannot keep accepting a token past its exp claim, even if it wanted to.
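
On the resource-server side, the check is small. The sketch below assumes the token's signature has already been verified by a JWT library and that the claims arrive as a plain dict; check_claims, TokenRejected, and the storage:read scope name are hypothetical, and the 60-minute ceiling is taken from the upper bound above.

```python
import time

MAX_ACCESS_TTL_SECONDS = 60 * 60  # assumption: the 60-minute upper bound from the text


class TokenRejected(Exception):
    """Raised when verified claims fail the server's own policy."""


def check_claims(claims, required_scope, now=None):
    """Validate already-verified JWT claims; return the subject on success."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    iat = claims.get("iat")
    if not isinstance(exp, (int, float)) or not isinstance(iat, (int, float)):
        raise TokenRejected("missing exp or iat")
    if now >= exp:
        raise TokenRejected("token expired")
    # Refuse tokens minted with a lifetime longer than the server allows,
    # even if they have not expired yet.
    if exp - iat > MAX_ACCESS_TTL_SECONDS:
        raise TokenRejected("token lifetime exceeds server policy")
    # OAuth scopes are conventionally a space-delimited string.
    granted = set(str(claims.get("scope", "")).split())
    if required_scope not in granted:
        raise TokenRejected("scope not granted")
    return claims.get("sub")
```

The lifetime check is a design choice rather than something the spec requires: it lets the resource server reject a months-long token from a misconfigured issuer instead of trusting whatever exp it is handed.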

How short is short

The right number depends on the call profile. For agents that act on user prompts in real time, 5 to 15 minutes is enough. For batch jobs that fan out across hundreds of tools, 60 minutes gives the orchestrator breathing room without lengthening the leak window much. We have not yet found a production case where a 24-hour access token is the right answer. If the job needs that long, it needs a refresh loop instead.
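
A refresh loop of that kind might look like the sketch below. It assumes a hypothetical fetch_token callable that returns a token together with its issue and expiry times, and it refreshes at 80% of TTL, the same mark used for the expiry warning later in this article. The clock parameter exists only so the behavior can be exercised without waiting.

```python
import time

REFRESH_FRACTION = 0.8  # refresh once 80% of the token's lifetime has elapsed


def refresh_at(issued_at, expires_at, fraction=REFRESH_FRACTION):
    """Return the timestamp at which the client should proactively refresh."""
    return issued_at + (expires_at - issued_at) * fraction


def run_with_refresh(steps, fetch_token, clock=time.time):
    """Run each step with a token that is replaced before it can expire.

    fetch_token is a hypothetical callable returning
    (token, issued_at, expires_at); each step is a callable taking the token.
    """
    token, issued_at, expires_at = fetch_token()
    results = []
    for step in steps:
        if clock() >= refresh_at(issued_at, expires_at):
            token, issued_at, expires_at = fetch_token()
        results.append(step(token))
    return results
```

With a 60-minute TTL this refreshes at the 48-minute mark, so a job that runs for hours never holds a token older than an hour. A single step that itself runs longer than the remaining 12 minutes would still need its own handling.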

For comparison, AWS lets you encode TTL into the URL itself for S3 (max 12 hours via console, 7 days via the CLI), and offers an s3:signatureAge bucket-policy condition that rejects requests where the signature is more than N minutes old, regardless of what the URL says (see AWS presigned URL best practices, PDF). That second control is the one to copy. The server enforces TTL, not the client.
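
To make the "server enforces TTL" idea concrete, here is a simplified HMAC-signed URL in the spirit of a presigned URL. It is not AWS Signature Version 4 and is not how S3 does it; sign_url, verify_url, the query parameter names, and the 15-minute ceiling are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SERVER_MAX_AGE_SECONDS = 15 * 60  # assumption: server-side ceiling, independent of the URL


def _signature(secret, path, issued_at, ttl):
    """HMAC-SHA256 over the path, issue time, and requested TTL."""
    message = f"{path}\n{issued_at}\n{ttl}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(secret, base_url, path, ttl, now=None):
    """Build a URL that carries its own issue time, TTL, and signature."""
    issued_at = int(time.time() if now is None else now)
    query = urlencode({"iat": issued_at, "ttl": ttl,
                       "sig": _signature(secret, path, issued_at, ttl)})
    return f"{base_url}{path}?{query}"


def verify_url(secret, url, now=None):
    """Accept the URL only if the signature matches and it is young enough."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        issued_at = int(params["iat"][0])
        ttl = int(params["ttl"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(secret, parts.path, issued_at, ttl)
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return False
    age = now - issued_at
    # The URL may claim a long TTL; the server's own ceiling still applies.
    return 0 <= age <= min(ttl, SERVER_MAX_AGE_SECONDS)
```

A URL signed with a seven-day TTL still stops working after 15 minutes here, because the verifier takes the smaller of the two limits. That mirrors the role the signature-age condition plays in the S3 case.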

The refresh-token gap

The pattern depends on the MCP client implementing refresh-token rotation. That implementation is missing in most clients on the market as of April 2026. SecureCoders tracks the support matrix and reports that no major MCP CLI client ships refresh-token rotation today. The practical consequence: an access token expires, the client cannot rotate it silently, and the human user is prompted to re-authenticate. The pattern still works, but the UX cost lands on the agent operator, not the platform.

Until clients close that gap, two stopgaps help. Run an MCP gateway in front of the server that owns the refresh loop and presents fresh tokens to the server on the clients' behalf. Or accept the interactive re-auth as a feature: forcing a human in the loop every hour is not the worst failure mode for an agent with database write access.

What to log

Five events cover the lifecycle and they should land in the same audit stream as application logs:

token.acquired with subject, scope, and TTL.
token.refreshed with old token id, new token id, source IP.
token.expiry_warning at 80% of TTL, so the client can pre-emptively rotate.
token.expired when the resource server rejects.
token.revocation_detected when a rotated refresh token is replayed.

The last one is the alarm. A replayed refresh token means one of two things: either the client retried after a partial failure, which is fine and quiet, or two parties hold the same refresh token, which is not fine and should page someone.
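
A sketch of how those events could be emitted as structured log lines, plus one possible way to separate a quiet retry from a replay worth paging on. The function names, field names, and the same-IP, ten-second grace rule are assumptions, not part of any spec; a real deployment would tune or replace the heuristic.

```python
import json
import time

LIFECYCLE_EVENTS = {
    "token.acquired",
    "token.refreshed",
    "token.expiry_warning",
    "token.expired",
    "token.revocation_detected",
}
RETRY_GRACE_SECONDS = 10  # assumption: window in which a same-IP replay counts as a retry


def audit_event(name, now=None, **fields):
    """Return one JSON log line for a token lifecycle event."""
    if name not in LIFECYCLE_EVENTS:
        raise ValueError(f"unknown audit event: {name}")
    record = {**fields, "event": name, "ts": time.time() if now is None else now}
    return json.dumps(record, sort_keys=True)


def classify_replay(rotated_at, rotated_ip, replay_at, replay_ip):
    """Label a replayed refresh token as a benign 'retry' or a 'page' alarm.

    Hypothetical heuristic: the same source IP replaying within a short
    grace window is treated as a client retry; anything else pages.
    """
    same_client = rotated_ip == replay_ip
    within_grace = 0 <= replay_at - rotated_at <= RETRY_GRACE_SECONDS
    return "retry" if same_client and within_grace else "page"
```

For example, audit_event("token.refreshed", old_token_id="a", new_token_id="b", source_ip="10.0.0.1") produces a line that can go into the same stream as application logs, and classify_replay decides whether a token.revocation_detected event should wake anyone up.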

Where the pattern does not fit

Three cases push back. First, local-only MCP servers that never leave the developer's machine; a TTL adds friction without buying security. Second, agents that run on hard-isolated networks with no egress; the leak vector is gone, and rotation is overhead. Third, the same MCP server consumed by tens of thousands of low-trust users; the auth bottleneck is the bigger problem there, and the solution is a gateway with per-user scopes, not shorter TTLs at the URL level.

For everything else (cloud-hosted MCP servers, contractor access, agents calling tools across organizational boundaries, anything Trend Micro could find on a port scan) the default should be a TTL of minutes and a refresh loop that nobody has to think about until something goes wrong.

Frequently asked questions

How is a short-lived MCP URL different from a regular JWT expiration?

A JWT exp claim is half the story. The other half is rotation of the refresh token that gets you a new JWT. Most MCP deployments today set an exp claim of weeks or months and never rotate the refresh token, so the practical security is the same as a static key. The TTL pattern fixes both halves: minutes on the access token, rotation on every refresh.

What happens to long-running agent jobs when the access token expires mid-call?

The right behavior is a silent refresh by the MCP client before the token expires, typically at 80% of TTL. If the client does not support refresh, the job fails halfway and has to be restarted. For batch agents that run for hours, a 60-minute TTL with proactive refresh is the common compromise. Anything longer than that is rare in practice.

Do I need a separate OAuth authorization server to run the TTL pattern?

The June 2025 MCP spec revision assumes yes. The MCP server is a resource server only; it validates tokens but does not mint them. In practice you can use any OAuth 2.1 provider (WorkOS, Auth0, Keycloak, or your own) as the authorization server. Running both roles in the same process is allowed by the spec but defeats the separation that makes the TTL pattern enforceable.
