Credential Broker & Token Vending

The Problem

The multi-agent platform had a credentials problem hiding in plain sight. Three long-lived GitHub Personal Access Tokens (PATs) were distributed across agent pods:

$GITHUB_PORTFOLIO_TOKEN: write access to the portfolio repository, injected via Vault/ESO into agent pods
$GITHUB_TOKEN: intended for read-only use against the infrastructure repo (code search and repository operations), though the PAT itself was not limited to that repo
A dedicated PAT for the mcp-github-search MCP server: its own token for GitHub API access, separate from the agent tokens

Each of these tokens had the same fundamental problems:

Blast radius. A compromised PAT grants its full scope of permissions to whoever holds it. If an agent pod were compromised, the attacker would get persistent GitHub access that outlives the incident. These PATs were created without an expiration date, so they last until manually revoked.

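No per-repo scoping. Classic GitHub PATs are scoped by permission type (e.g., repo, read:org), not by repository. A classic token that can push to the portfolio repo can also push to any other repo the user can write to. Fine-grained PATs can be restricted to selected repositories, but that scope is fixed when the token is created and cannot be narrowed per request. Least privilege per operation is structurally out of reach with either kind.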

Manual rotation. Rotating a PAT means generating a new one in GitHub’s UI, updating the value in Vault, and waiting for ESO to sync the new secret to pods. This is a manual, error-prone process that happens rarely because it’s annoying — which means tokens live far longer than they should.

Scattered responsibility. Three different tokens, three different scopes, three different places to remember to rotate. The mcp-github-search server had its own PAT entirely outside the Vault-managed lifecycle, configured as a plain environment variable in its Helm chart values.

The Solution

The MCP Gateway now includes a TokenBroker — a gateway-native token vending service that issues short-lived, scoped GitHub tokens on demand via a GitHub App backend. Agents request tokens for specific repositories with specific permissions, and the broker returns a token that expires in one hour.

Instead of static credentials baked into pod environments, agents make authenticated requests to:

POST /v1/tokens/github

The request specifies exactly what’s needed — which repositories and which permissions — and gets back a token scoped to precisely that. No more, no less.

Architecture

The token vending system has three components:

TokenBroker (Orchestrator)

The TokenBroker is the request handler for /v1/tokens/{service}. It receives token requests, validates them against the requesting agent’s configured capabilities, selects the appropriate backend, and returns the scoped token.

The broker doesn’t know how to generate GitHub tokens — it delegates to backends. This separation means adding support for other credential providers (GitLab, AWS STS, etc.) requires implementing a new backend, not modifying the broker.

TokenBackend Protocol

A simple interface that any credential backend must implement:

Accept a set of requested repositories and permissions
Validate that the request is fulfillable
Generate and return a scoped, time-limited token
Report the token’s TTL and actual granted permissions

The protocol is deliberately minimal. Backends handle their own authentication to the upstream provider and their own caching strategy.
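
To make the split between broker and backend concrete, here is a minimal sketch of what the protocol could look like in TypeScript. Every name in it (TokenRequest, IssuedToken, TokenBackend, TokenBroker, register, handle) is hypothetical and chosen for illustration, not taken from the gateway's source, and the per-agent capability check is left out to keep the example short.

```typescript
// Hypothetical shapes; the gateway's real type names are not shown in this article.
type Permissions = Record<string, string>;

interface TokenRequest {
  repositories: string[];
  permissions: Permissions;
}

interface IssuedToken {
  token: string;
  expiresAt: Date;         // callers derive the remaining TTL from this
  repositories: string[];  // what was actually granted
  permissions: Permissions;
}

interface TokenBackend {
  // Returns a reason when the request cannot be fulfilled, otherwise null.
  validate(request: TokenRequest): string | null;
  // Talks to the upstream provider (or the backend's own cache) and returns a token.
  issue(request: TokenRequest): Promise<IssuedToken>;
}

class TokenBroker {
  private readonly backends = new Map<string, TokenBackend>();

  register(service: string, backend: TokenBackend): void {
    this.backends.set(service, backend);
  }

  // Serves /v1/tokens/{service}: select the backend, validate, then delegate.
  async handle(service: string, request: TokenRequest): Promise<IssuedToken> {
    const backend = this.backends.get(service);
    if (!backend) {
      throw new Error(`no token backend registered for "${service}"`);
    }
    const problem = backend.validate(request);
    if (problem !== null) {
      throw new Error(`token request rejected: ${problem}`);
    }
    return backend.issue(request);
  }
}
```

Under this shape, supporting another provider would mean writing one more class that satisfies TokenBackend and registering it under a new service name; the broker itself would not change.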

GitHubAppBackend

The first (and currently only) backend. It uses a GitHub App installation to generate scoped installation tokens:

JWT Generation — The backend signs a JWT using the GitHub App’s private key (stored in Vault, injected via ESO). The JWT identifies the app and is valid for 10 minutes.
Installation Token Exchange — The JWT is exchanged via GitHub’s API for an installation access token scoped to the requested repositories and permissions.
Caching — Generated tokens are cached by their scope signature (sorted repos + permissions hash). Subsequent requests with the same scope return the cached token if it has sufficient remaining TTL (>5 minutes).
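
The first two steps can be sketched with nothing but Node's built-in crypto module. This is an illustration under assumptions, not the gateway's code: the function names createAppJwt and buildTokenExchange, the app ID "123456", and the installation ID 42 are all made up. The endpoint path and the iat/exp pattern follow GitHub's documented App authentication flow, and the sketch assumes the endpoint expects bare repository names, so the owner prefix is stripped.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

const base64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Step 1: sign a short-lived RS256 JWT that identifies the GitHub App.
function createAppJwt(appId: string, privateKeyPem: string, nowSeconds: number): string {
  const header = base64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const claims = base64url(JSON.stringify({
    iat: nowSeconds - 60,  // backdated to tolerate clock drift
    exp: nowSeconds + 600, // expires 10 minutes from now
    iss: appId,
  }));
  const signature = createSign("RSA-SHA256")
    .update(`${header}.${claims}`)
    .sign(privateKeyPem, "base64url");
  return `${header}.${claims}.${signature}`;
}

// Step 2: describe the exchange request. Sending it (for example with fetch) is left out.
function buildTokenExchange(
  installationId: number,
  jwt: string,
  repositories: string[],
  permissions: Record<string, string>,
) {
  return {
    method: "POST",
    url: `https://api.github.com/app/installations/${installationId}/access_tokens`,
    headers: {
      Authorization: `Bearer ${jwt}`,
      Accept: "application/vnd.github+json",
    },
    body: JSON.stringify({
      // Assumption: the endpoint takes bare repository names, so "owner/" is dropped.
      repositories: repositories.map((fullName) => fullName.split("/").pop() ?? fullName),
      permissions,
    }),
  };
}

// Demo with a throwaway key pair; the real key would come from Vault via ESO.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const pem = privateKey.export({ type: "pkcs8", format: "pem" }).toString();
const appJwt = createAppJwt("123456", pem, Math.floor(Date.now() / 1000));

// Check the signature with the matching public key, as GitHub would on its side.
const lastDot = appJwt.lastIndexOf(".");
const signatureValid = createVerify("RSA-SHA256")
  .update(appJwt.slice(0, lastDot))
  .verify(publicKey, appJwt.slice(lastDot + 1), "base64url");

const exchange = buildTokenExchange(
  42,
  appJwt,
  ["spencer2211/spencerfuller.dev"],
  { contents: "write" },
);
console.log(signatureValid, exchange.url, exchange.body);
```

The JWT is only ever used for the exchange call. What reaches the agent is the installation token in GitHub's response, never the JWT or the private key.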

How It Works

A typical token request flow:

1. Agent requests a token:

POST /v1/tokens/github
{
  "repositories": ["spencer2211/spencerfuller.dev"],
  "permissions": { "contents": "write" }
}

2. Broker validates capabilities: The broker checks the requesting agent’s configuration. Each MCP server definition in the gateway config can declare tokenCapabilities that limit which repos and permissions that server (or agent) can request:

tokenCapabilities:
  github:
    repositories: ["spencer2211/spencerfuller.dev"]
    permissions:
      contents: write
      pull_requests: write

If the request exceeds the configured capabilities — requesting a repo not in the allowlist, or a permission level higher than configured — the broker rejects the request entirely. There is no silent downgrade to a subset of permissions. This is a deliberate design choice: partial credential grants can lead to subtle bugs where an agent proceeds with insufficient permissions and fails halfway through an operation.
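
The all-or-nothing check described above could look roughly like the sketch below. The names (TokenScope, findViolations, authorize) are hypothetical, the read < write < admin ordering is an assumption about how levels are compared, and rejecting an empty repository list is an extra safeguard added for this example rather than something the gateway is known to do.

```typescript
type Permissions = Record<string, string>;

interface TokenScope {
  repositories: string[];
  permissions: Permissions;
}

// Assumed ordering of permission levels, lowest to highest.
const LEVEL_RANK = new Map<string, number>([
  ["read", 1],
  ["write", 2],
  ["admin", 3],
]);

function rankOf(level: string | undefined): number | undefined {
  return level === undefined ? undefined : LEVEL_RANK.get(level);
}

// Collects every way the request exceeds the configured capabilities.
function findViolations(request: TokenScope, capabilities: TokenScope): string[] {
  const violations: string[] = [];
  if (request.repositories.length === 0) {
    // Assumption for this sketch: an unscoped request is never acceptable.
    violations.push("at least one repository must be named");
  }
  for (const repo of request.repositories) {
    if (!capabilities.repositories.includes(repo)) {
      violations.push(`repository not in allowlist: ${repo}`);
    }
  }
  for (const [name, level] of Object.entries(request.permissions)) {
    const requested = rankOf(level);
    const ceiling = rankOf(capabilities.permissions[name]);
    if (requested === undefined) {
      violations.push(`unknown permission level: ${name}=${level}`);
    } else if (ceiling === undefined || requested > ceiling) {
      violations.push(`permission exceeds capability: ${name}=${level}`);
    }
  }
  return violations;
}

// Rejects the whole request on any violation; there is no partial grant.
function authorize(request: TokenScope, capabilities: TokenScope): void {
  const violations = findViolations(request, capabilities);
  if (violations.length > 0) {
    throw new Error(`token request rejected: ${violations.join("; ")}`);
  }
}

// The capabilities from the configuration example above.
const portfolioCapabilities: TokenScope = {
  repositories: ["spencer2211/spencerfuller.dev"],
  permissions: { contents: "write", pull_requests: "write" },
};

authorize(
  { repositories: ["spencer2211/spencerfuller.dev"], permissions: { contents: "write" } },
  portfolioCapabilities,
); // passes silently
```

Listing every violation at once, rather than stopping at the first, gives whoever fixes the configuration the full picture in a single error message.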

3. Backend generates the token: The GitHubAppBackend generates a JWT, calls GitHub’s installation token endpoint with the specific repos and permissions, and receives a scoped token.

4. Token returned to agent:

{
  "token": "ghs_xxxxxxxxxxxx",
  "expires_at": "2026-02-16T23:14:00Z",
  "permissions": { "contents": "write" },
  "repositories": ["spencer2211/spencerfuller.dev"]
}

A freshly issued token is valid for one hour; one served from the broker’s cache has less time left, but always more than 5 minutes. The agent uses it for its immediate operation and discards it. No storage, no persistence, no rotation concern.
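
On the agent side, handling the response might look like the following sketch. The helper names (parseVendedToken, gitRemoteUrl) are hypothetical. The one external fact relied on is that GitHub accepts an installation token over HTTPS as the password for the user x-access-token.

```typescript
// Response shape from the example above.
interface VendedToken {
  token: string;
  expires_at: string;
  permissions: Record<string, string>;
  repositories: string[];
}

// Parses the broker's JSON response and refuses a token that is already expired.
function parseVendedToken(body: string, now: Date): VendedToken {
  const parsed = JSON.parse(body) as VendedToken;
  const expiresAtMs = Date.parse(parsed.expires_at);
  if (Number.isNaN(expiresAtMs) || expiresAtMs <= now.getTime()) {
    throw new Error("token response is expired or has no valid expires_at");
  }
  return parsed;
}

// Builds an HTTPS remote for a one-off git operation with the vended token.
function gitRemoteUrl(vended: VendedToken, repository: string): string {
  if (!vended.repositories.includes(repository)) {
    throw new Error(`token does not cover ${repository}`);
  }
  return `https://x-access-token:${vended.token}@github.com/${repository}.git`;
}

const vended = parseVendedToken(
  JSON.stringify({
    token: "ghs_xxxxxxxxxxxx",
    expires_at: "2026-02-16T23:14:00Z",
    permissions: { contents: "write" },
    repositories: ["spencer2211/spencerfuller.dev"],
  }),
  new Date("2026-02-16T22:20:00Z"),
);

// The token lives only in this variable; nothing is written to disk or git config.
const remote = gitRemoteUrl(vended, "spencer2211/spencerfuller.dev");
console.log(remote.replace(vended.token, "***"));
```

Passing the remote URL directly to a single git push, instead of saving it as a named remote, keeps the token out of the repository's configuration.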

Security Model

Principle of Least Privilege

Every token is scoped to exactly the repositories and permissions requested. Even though the GitHub App installation may have access to multiple repositories, the installation token endpoint allows scoping down to a subset. The broker enforces this server-side — an agent cannot request broader access than its tokenCapabilities allow, regardless of what the underlying GitHub App installation permits.

Capability Enforcement

Token capabilities are defined in the MCP Gateway configuration, not by agents themselves. An agent cannot self-declare its own permissions. The configuration is managed through the GitOps pipeline (Flux + Helm), meaning capability changes require a git commit, a PR review, and a Flux reconciliation — the same change control process as any infrastructure modification.

Server-Side Repo Scoping

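The GitHub App installation may cover an organization or multiple repositories. The broker explicitly scopes each installation token to only the requested repositories. This is defense in depth: a vended token never carries the installation’s full repository access, so even if the capability check were somehow bypassed, the token would still be limited at the GitHub API level to the repositories named in the request and to what the App installation itself permits.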

Caching with TTL Awareness

Tokens are cached by their scope signature to avoid redundant API calls. The cache respects TTL — a cached token is only returned if it has more than 5 minutes of remaining validity. This prevents handing out tokens that are about to expire, which would cause operations to fail midway.
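
A minimal sketch of such a cache follows, assuming SHA-256 over a canonical JSON form as the scope signature. The article only says "sorted repos + permissions hash", so the exact hashing scheme and the names scopeSignature and TokenCache are assumptions.

```typescript
import { createHash } from "node:crypto";

type Permissions = Record<string, string>;

interface CachedToken {
  token: string;
  expiresAtMs: number; // expiry as epoch milliseconds
}

// A cached token is reused only with more than 5 minutes of validity left.
const MIN_REMAINING_MS = 5 * 60 * 1000;

// Same repos and permissions in any order produce the same key.
function scopeSignature(repositories: string[], permissions: Permissions): string {
  const canonical = JSON.stringify({
    repositories: [...repositories].sort(),
    permissions: Object.entries(permissions).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
  });
  return createHash("sha256").update(canonical).digest("hex");
}

class TokenCache {
  private readonly entries = new Map<string, CachedToken>();

  set(key: string, entry: CachedToken): void {
    this.entries.set(key, entry);
  }

  // Returns a token only if it is still comfortably valid; otherwise evicts it.
  get(key: string, nowMs: number): CachedToken | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (entry.expiresAtMs - nowMs > MIN_REMAINING_MS) {
      return entry;
    }
    this.entries.delete(key);
    return undefined;
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new TokenCache();
const key = scopeSignature(["spencer2211/spencerfuller.dev"], { contents: "write" });
const issuedAtMs = Date.parse("2026-02-16T22:14:00Z");
cache.set(key, { token: "ghs_xxxxxxxxxxxx", expiresAtMs: issuedAtMs + 60 * 60 * 1000 });

// 54 minutes after issue: 6 minutes remain, so the cached token is reused.
const reused = cache.get(key, issuedAtMs + 54 * 60 * 1000);
// 56 minutes after issue: 4 minutes remain, so the entry is evicted instead.
const stale = cache.get(key, issuedAtMs + 56 * 60 * 1000);
console.log(reused !== undefined, stale === undefined);
```

Passing the current time in as a parameter instead of calling Date.now() inside get keeps the eviction rule easy to test.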

No Persistent Credentials

The only long-lived GitHub credential in the token vending path is the GitHub App private key, stored in Vault and injected via ESO into the gateway pod. Agents never see it. Everything else (JWTs, installation tokens, cached tokens) is ephemeral and expires automatically.

Design Decisions

Why GitHub App Over PATs

GitHub Apps are the platform’s intended mechanism for programmatic access. They provide:

Per-request scoping — installation tokens can be limited to specific repos and permissions
Automatic expiration — tokens expire in 1 hour, no manual rotation needed
Audit trail — GitHub logs all API activity by the App, separate from user activity
No user account dependency — the App exists independently of any user’s account

Classic PATs are fundamentally user-scoped. They inherit the user’s access and cannot be narrowed per request: a classic PAT that can write to one repo can write to every repo the user can write to. This is architecturally incompatible with least-privilege credential management.

Why Gateway-Native vs Standalone Service

The token broker runs inside the MCP Gateway process rather than as a separate microservice. This decision reflects the deployment context:

Shared authentication — the gateway already authenticates agent requests. A standalone service would need its own auth layer or would need to trust the gateway’s forwarded identity.
Configuration co-location — token capabilities are part of the MCP server configuration. Keeping the broker in the gateway means one config file, one deployment, one reconciliation cycle.
Resource efficiency — on ARM64 SBCs with 16GB RAM per node, every additional pod costs memory. The broker adds ~10MB to the gateway’s footprint instead of requiring its own pod, service, and network policy.

Why Reject-on-Insufficient (No Silent Downgrade)

When an agent requests permissions that exceed its capabilities, the broker returns an error rather than silently granting a subset. This seems strict, but the alternative is worse:

1. An agent requests contents: write but only has the contents: read capability
2. A silent downgrade grants a read-only token
3. The agent proceeds and attempts a write operation
4. The write fails with a 403 from GitHub
5. The agent has to handle that failure anyway, only later and partway through its work

By rejecting upfront, the failure is immediate, clear, and actionable. The agent knows its configuration is wrong before it starts any work. This follows the fail-fast principle — surface errors at the earliest possible point.

What It Replaced

Before: Static PATs

Agent Pod
├── $GITHUB_PORTFOLIO_TOKEN  (PAT, write, all repos, never expires)
├── $GITHUB_TOKEN             (PAT, read, all repos, never expires)
└── mcp-github-search server
    └── own PAT               (read, all repos, never expires)

Three tokens, all long-lived, all broader than necessary, all requiring manual rotation.

After: Dynamic Token Vending

Agent Pod
└── POST /v1/tokens/github
    └── Returns: scoped token (1 repo, specific permissions, 1hr TTL)

Zero long-lived GitHub tokens on agent pods. The only persistent secret is the GitHub App private key in Vault, accessible only to the MCP Gateway.

Current State

The credential management strategy shipped in two phases, and both are now live.

Credential Flow Architecture

Phase 1: Gateway Credential Injection (Live)

The MCP Gateway already acts as an authenticated proxy for backend services. Credentials stored in Vault are injected into outbound requests by the gateway — agents never see the raw tokens:

Atlassian — Jira and Confluence API calls are authenticated by the gateway using OAuth credentials from Vault. Agents call atlassian_rest or MCP Atlassian tools; the gateway injects the Authorization header.
Home Assistant — REST API calls to HA are proxied through the gateway’s homeassistant_rest tool with a long-lived access token injected from Vault.
GitHub (MCP tools) — The gateway’s GitHub MCP server uses a Vault-managed token for read operations (code search, file retrieval, repository listing).

This eliminated the mcp-github-search server’s standalone PAT and centralized credential management in one place. Two PATs remained on agent pods — $GITHUB_PORTFOLIO_TOKEN (write access for portfolio pushes) and $GITHUB_TOKEN (read-only for infrastructure repo) — and were the targets for Phase 2.

Phase 2: Token Vending via GitHub App (Live)

The full token vending architecture described above — POST /v1/tokens/github, per-request scoping, 1-hour TTL, capability enforcement — is live and operational. The GitHub App has been created, its private key stored in Vault, and installations configured per repository.

Phase 2 has eliminated the remaining PATs from agent pods, completing the transition from static credentials to fully dynamic token issuance.

Delivered Impact

With both phases live:

Eliminated all long-lived PATs from agent pods: $GITHUB_PORTFOLIO_TOKEN and $GITHUB_TOKEN in Phase 2, after the mcp-github-search PAT was decommissioned in Phase 1
Per-request scoping: each token is limited to exactly the repositories and permissions needed for the current operation
No manual rotation: tokens expire after 1 hour with no intervention, so rotation as a separate task has disappeared
Simplified security model: from “manage N tokens with different scopes and rotation schedules” to “one GitHub App key in Vault, everything else is dynamic”

Technology Stack

Token Broker: Gateway-native (MCP Gateway, Node.js)
Backend: GitHub App (RS256 JWT + installation token API)
Key Management: HashiCorp Vault + External Secrets Operator
Caching: In-memory with TTL-aware eviction
Configuration: Helm values via GitOps (Flux)
Runtime: Kubernetes on ARM64 (Orange Pi 5)