Credential Broker & Token Vending
The Problem
The multi-agent platform had a credentials problem hiding in plain sight. Three long-lived GitHub Personal Access Tokens (PATs) were distributed across agent pods:
- $GITHUB_PORTFOLIO_TOKEN: write access to the portfolio repository, injected via Vault/ESO into agent pods
- $GITHUB_TOKEN: read-only access scoped to the infrastructure repo, used by agents for code search and repository operations
- A dedicated PAT for the mcp-github-search MCP server: its own token for GitHub API access, separate from the agent tokens
Each of these tokens had the same fundamental problems:
Blast radius. A compromised PAT grants its full scope of permissions to whoever holds it. If an agent pod were compromised, the attacker would get persistent GitHub access that outlives the incident. These PATs had no expiration date, so they would last until manually revoked.
No per-repo scoping. Classic GitHub PATs are scoped by permission type (e.g., repo, read:org), not by repository. A token that can push to the portfolio repo can also push to any other repo the user owns. With tokens like these, the principle of least privilege is structurally out of reach.
Manual rotation. Rotating a PAT means generating a new one in GitHub’s UI, updating the value in Vault, and waiting for ESO to sync the new secret to pods. This is a manual, error-prone process that happens rarely because it’s annoying — which means tokens live far longer than they should.
Scattered responsibility. Three different tokens, three different scopes, three different places to remember to rotate. The mcp-github-search server had its own PAT entirely outside the Vault-managed lifecycle, configured as a plain environment variable in its Helm chart values.
The Solution
The MCP Gateway now includes a TokenBroker, a gateway-native token vending service that issues short-lived, scoped GitHub tokens on demand via a GitHub App backend. Agents request tokens for specific repositories with specific permissions, and the broker returns a token that expires in one hour.
Instead of static credentials baked into pod environments, agents make authenticated requests to:
    POST /v1/tokens/github

The request specifies exactly what’s needed (which repositories and which permissions) and gets back a token scoped to precisely that. No more, no less.
Architecture
The token vending system has three components:
TokenBroker (Orchestrator)
The TokenBroker is the request handler for /v1/tokens/{service}. It receives token requests, validates them against the requesting agent’s configured capabilities, selects the appropriate backend, and returns the scoped token.
The broker doesn’t know how to generate GitHub tokens — it delegates to backends. This separation means adding support for other credential providers (GitLab, AWS STS, etc.) requires implementing a new backend, not modifying the broker.
TokenBackend Protocol
A simple interface that any credential backend must implement:
- Accept a set of requested repositories and permissions
- Validate that the request is fulfillable
- Generate and return a scoped, time-limited token
- Report the token’s TTL and actual granted permissions
The protocol is deliberately minimal. Backends handle their own authentication to the upstream provider and their own caching strategy.
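One way to picture the protocol and the broker's delegation is the TypeScript sketch below. The names TokenRequest, IssuedToken, validate, issue, register, and handle are illustrative assumptions, not the gateway's actual identifiers; the field names simply mirror the request and response JSON used on this page. The broker's capability check is left out to keep the sketch focused on backend selection.

```typescript
// Hypothetical shapes; only the JSON field names come from this page.
type PermissionLevel = "read" | "write" | "admin";

interface TokenRequest {
  repositories: string[];
  permissions: Record<string, PermissionLevel>;
}

interface IssuedToken {
  token: string;
  expiresAt: Date; // lets callers derive the remaining TTL
  repositories: string[];
  permissions: Record<string, PermissionLevel>; // what was actually granted
}

interface TokenBackend {
  // Throws if this backend cannot fulfil the request.
  validate(request: TokenRequest): void;
  // Generates a scoped, time-limited token.
  issue(request: TokenRequest): Promise<IssuedToken>;
}

class TokenBroker {
  private readonly backends = new Map<string, TokenBackend>();

  // Adding a provider means registering a backend, not changing the broker.
  register(service: string, backend: TokenBackend): void {
    this.backends.set(service, backend);
  }

  // Handles /v1/tokens/{service} by delegating to the matching backend.
  async handle(service: string, request: TokenRequest): Promise<IssuedToken> {
    const backend = this.backends.get(service);
    if (!backend) {
      throw new Error(`no token backend registered for "${service}"`);
    }
    backend.validate(request);
    return backend.issue(request);
  }
}
```

Under these assumptions, supporting another provider would amount to one more `register` call with a new `TokenBackend` implementation.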
GitHubAppBackend
The first (and currently only) backend. It uses a GitHub App installation to generate scoped installation tokens:
- JWT Generation — The backend signs a JWT using the GitHub App’s private key (stored in Vault, injected via ESO). The JWT identifies the app and is valid for 10 minutes.
- Installation Token Exchange — The JWT is exchanged via GitHub’s API for an installation access token scoped to the requested repositories and permissions.
- Caching — Generated tokens are cached by their scope signature (sorted repos + permissions hash). Subsequent requests with the same scope return the cached token if it has sufficient remaining TTL (>5 minutes).
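The JWT generation step can be sketched with nothing but Node's built-in crypto module. This is a minimal illustration, not the gateway's code: signAppJwt, the app ID "123456", and the throwaway key pair are all assumptions made for the example. The claims follow GitHub's documented pattern of backdating iat by a minute and setting exp ten minutes ahead.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Base64url-encode a UTF-8 string (JWT segments use the URL-safe alphabet, unpadded).
const encodeSegment = (text: string): string =>
  Buffer.from(text, "utf8").toString("base64url");

// Sign a GitHub App JWT with RS256. `appId` and `privateKeyPem` stand in for values
// the gateway would receive from Vault; `nowSeconds` is injected for testability.
function signAppJwt(appId: string, privateKeyPem: string, nowSeconds: number): string {
  const header = encodeSegment(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const claims = encodeSegment(
    JSON.stringify({
      iat: nowSeconds - 60, // backdated one minute to tolerate clock drift
      exp: nowSeconds + 600, // at most 10 minutes into the future
      iss: appId,
    }),
  );
  const signingInput = `${header}.${claims}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privateKeyPem);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Demonstration with a throwaway key pair (never a real App key).
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const demoPem = privateKey.export({ type: "pkcs8", format: "pem" }).toString();
const demoJwt = signAppJwt("123456", demoPem, Math.floor(Date.now() / 1000));

// Verify the signature with the matching public key.
const [headerPart, claimsPart, signaturePart] = demoJwt.split(".");
const signatureValid = createVerify("RSA-SHA256")
  .update(`${headerPart}.${claimsPart}`)
  .verify(publicKey, Buffer.from(signaturePart, "base64url"));
console.log("signature valid:", signatureValid);
```

A production implementation would more likely use a maintained JWT library; the hand-rolled version is shown only to make the three JWT segments visible.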
How It Works
A typical token request flow:
1. Agent requests a token:
    POST /v1/tokens/github
    {
      "repositories": ["spencer2211/spencerfuller.dev"],
      "permissions": { "contents": "write" }
    }

2. Broker validates capabilities:
The broker checks the requesting agent’s configuration. Each MCP server definition in the gateway config can declare tokenCapabilities that limit which repos and permissions that server (or agent) can request:
    tokenCapabilities:
      github:
        repositories: ["spencer2211/spencerfuller.dev"]
        permissions:
          contents: write
          pull_requests: write

If the request exceeds the configured capabilities (requesting a repo not in the allowlist, or a permission level higher than configured), the broker rejects the request entirely. There is no silent downgrade to a subset of permissions. This is a deliberate design choice: partial credential grants can lead to subtle bugs where an agent proceeds with insufficient permissions and fails halfway through an operation.
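The all-or-nothing check can be sketched as follows. The function names and the level ordering (read below write below admin, with a higher capability covering lower requests) are assumptions made for illustration; the page only states that a repo outside the allowlist or a level above the configured one causes rejection.

```typescript
type Level = "read" | "write" | "admin";

// Assumed ordering: a higher level implies the lower ones.
const RANK: Record<Level, number> = { read: 1, write: 2, admin: 3 };

interface GitHubScope {
  repositories: string[];
  permissions: Record<string, Level>;
}

// Returns every violation found; an empty array means the request is within bounds.
function findViolations(requested: GitHubScope, allowed: GitHubScope): string[] {
  const violations: string[] = [];
  for (const repo of requested.repositories) {
    if (!allowed.repositories.includes(repo)) {
      violations.push(`repository not in allowlist: ${repo}`);
    }
  }
  for (const [name, level] of Object.entries(requested.permissions)) {
    const ceiling = allowed.permissions[name];
    if (ceiling === undefined) {
      violations.push(`permission not configured: ${name}`);
    } else if (RANK[level] > RANK[ceiling]) {
      violations.push(`${name}: requested ${level}, capability is ${ceiling}`);
    }
  }
  return violations;
}

// Any violation rejects the whole request; nothing is downgraded.
function assertWithinCapabilities(requested: GitHubScope, allowed: GitHubScope): void {
  const violations = findViolations(requested, allowed);
  if (violations.length > 0) {
    throw new Error(`token request rejected: ${violations.join("; ")}`);
  }
}

// Capabilities mirroring the tokenCapabilities example on this page.
const capabilities: GitHubScope = {
  repositories: ["spencer2211/spencerfuller.dev"],
  permissions: { contents: "write", pull_requests: "write" },
};

// Within bounds, so this call returns without throwing.
assertWithinCapabilities(
  { repositories: ["spencer2211/spencerfuller.dev"], permissions: { contents: "write" } },
  capabilities,
);
```

Collecting all violations before throwing keeps the rejection message actionable: the agent's operator sees every mismatch at once instead of fixing them one by one.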
3. Backend generates the token: The GitHubAppBackend generates a JWT, calls GitHub’s installation token endpoint with the specific repos and permissions, and receives a scoped token.
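For orientation, the exchange call in step 3 might be assembled like this. The sketch only builds the request and sends nothing; buildExchangeRequest and installationId are hypothetical names. The endpoint path, headers, and body fields follow GitHub's REST documentation for creating an installation access token, which takes bare repository names and not owner/name pairs; verify the details against the current reference before relying on them.

```typescript
interface ExchangeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the installation token exchange request.
function buildExchangeRequest(
  installationId: number, // hypothetical: the App installation to mint a token for
  appJwt: string, // the short-lived JWT signed with the App's private key
  repositories: string[], // "owner/name" strings as used in broker requests
  permissions: Record<string, string>,
): ExchangeRequest {
  // The endpoint expects repository names only, so drop any "owner/" prefix.
  const names = repositories.map((full) => full.slice(full.indexOf("/") + 1));
  return {
    url: `https://api.github.com/app/installations/${installationId}/access_tokens`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${appJwt}`,
      "X-GitHub-Api-Version": "2022-11-28",
    },
    body: JSON.stringify({ repositories: names, permissions }),
  };
}

const exchange = buildExchangeRequest(
  42, // placeholder installation ID
  "header.claims.signature", // placeholder JWT
  ["spencer2211/spencerfuller.dev"],
  { contents: "write" },
);
console.log(exchange.url);
console.log(exchange.body);
```

The resulting object could be handed to `fetch(exchange.url, exchange)`; the response would carry the token and its expiry.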
4. Token returned to agent:
    {
      "token": "ghs_xxxxxxxxxxxx",
      "expires_at": "2026-02-16T23:14:00Z",
      "permissions": { "contents": "write" },
      "repositories": ["spencer2211/spencerfuller.dev"]
    }

The token is valid for approximately one hour. The agent uses it for its immediate operation and discards it. No storage, no persistence, no rotation concern.
Security Model
Principle of Least Privilege
Every token is scoped to exactly the repositories and permissions requested. Even though the GitHub App installation may have access to multiple repositories, the installation token endpoint allows scoping down to a subset. The broker enforces this server-side: an agent cannot request broader access than its tokenCapabilities allow, regardless of what the underlying GitHub App installation permits.
Capability Enforcement
Token capabilities are defined in the MCP Gateway configuration, not by agents themselves. An agent cannot self-declare its own permissions. The configuration is managed through the GitOps pipeline (Flux + Helm), meaning capability changes require a git commit, a PR review, and a Flux reconciliation: the same change control process as any infrastructure modification.
Server-Side Repo Scoping
The GitHub App installation may cover an organization or multiple repositories. The broker explicitly scopes each installation token to only the requested repositories. This is defense in depth: the capability check decides what an agent may ask for, and GitHub then enforces the repository list on the issued token, so a token never reaches beyond the repositories named when it was minted, even though the installation itself can.
Caching with TTL Awareness
Tokens are cached by their scope signature to avoid redundant API calls. The cache respects TTL: a cached token is only returned if it has more than 5 minutes of remaining validity. This prevents handing out tokens that are about to expire, which would cause operations to fail midway.
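A minimal sketch of such a cache follows, assuming a SHA-256 hash over the sorted repositories and permissions as the scope signature. The names scopeSignature and TokenCache, and the choice to pass the current time in as an argument, are illustrative and not taken from the gateway.

```typescript
import { createHash } from "node:crypto";

interface CachedToken {
  token: string;
  expiresAtMs: number; // absolute expiry, in epoch milliseconds
}

// A cached token must have more than this much validity left to be reused.
const MIN_REMAINING_MS = 5 * 60 * 1000;

// Order-independent signature: the same scope always hashes to the same key.
function scopeSignature(repositories: string[], permissions: Record<string, string>): string {
  const canonical = JSON.stringify({
    repositories: [...repositories].sort(),
    permissions: Object.entries(permissions).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
  });
  return createHash("sha256").update(canonical).digest("hex");
}

class TokenCache {
  private readonly entries = new Map<string, CachedToken>();

  set(signature: string, entry: CachedToken): void {
    this.entries.set(signature, entry);
  }

  // Returns the cached token only if it has more than 5 minutes left;
  // otherwise the stale entry is evicted and the caller mints a new token.
  get(signature: string, nowMs: number): CachedToken | undefined {
    const entry = this.entries.get(signature);
    if (!entry) return undefined;
    if (entry.expiresAtMs - nowMs > MIN_REMAINING_MS) return entry;
    this.entries.delete(signature);
    return undefined;
  }
}
```

Passing the clock in explicitly keeps the expiry logic easy to test; a real implementation could default it to `Date.now()`.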
No Persistent Credentials
The only long-lived secret in the token vending path is the GitHub App private key, stored in Vault and injected via ESO. This key never leaves the gateway pod. Everything else (JWTs, installation tokens, cached tokens) is ephemeral and expires automatically.
Design Decisions
Why GitHub App Over PATs
GitHub Apps are the platform’s intended mechanism for programmatic access. They provide:
- Per-request scoping — installation tokens can be limited to specific repos and permissions
- Automatic expiration — tokens expire in 1 hour, no manual rotation needed
- Audit trail — GitHub logs all API activity by the App, separate from user activity
- No user account dependency — the App exists independently of any user’s account
PATs are fundamentally user-scoped. They inherit the user’s permissions and cannot be narrowed per-request. A classic PAT that can write to one repo can write to all repos the user owns. This is architecturally incompatible with least-privilege credential management.
Why Gateway-Native vs Standalone Service
The token broker runs inside the MCP Gateway process rather than as a separate microservice. This decision reflects the deployment context:
- Shared authentication — the gateway already authenticates agent requests. A standalone service would need its own auth layer or would need to trust the gateway’s forwarded identity.
- Configuration co-location — token capabilities are part of the MCP server configuration. Keeping the broker in the gateway means one config file, one deployment, one reconciliation cycle.
- Resource efficiency — on ARM64 SBCs with 16GB RAM per node, every additional pod costs memory. The broker adds ~10MB to the gateway’s footprint instead of requiring its own pod, service, and network policy.
Why Reject-on-Insufficient (No Silent Downgrade)
When an agent requests permissions that exceed its capabilities, the broker returns an error rather than silently granting a subset. This seems strict, but the alternative is worse:
- An agent requests contents: write but only has contents: read capability
- A silent downgrade grants a read-only token
- The agent proceeds to attempt a write operation
- The write fails with a 403 from GitHub
- The agent has to handle this failure case anyway
By rejecting upfront, the failure is immediate, clear, and actionable. The agent knows its configuration is wrong before it starts any work. This follows the fail-fast principle — surface errors at the earliest possible point.
What It Replaced
Before: Static PATs
    Agent Pod
    ├── $GITHUB_PORTFOLIO_TOKEN (PAT, write, all repos, never expires)
    ├── $GITHUB_TOKEN (PAT, read, all repos, never expires)
    └── mcp-github-search server
        └── own PAT (read, all repos, never expires)

Three tokens, all long-lived, all broader than necessary, all requiring manual rotation.
After: Dynamic Token Vending
    Agent Pod
    └── POST /v1/tokens/github
        └── Returns: scoped token (1 repo, specific permissions, 1hr TTL)

Zero long-lived GitHub tokens on agent pods. The only persistent secret is the GitHub App private key in Vault, accessible only to the MCP Gateway.
Current State
The credential management strategy shipped in two phases, and both are live.
Phase 1: Gateway Credential Injection (Live)
The MCP Gateway already acts as an authenticated proxy for backend services. Credentials stored in Vault are injected into outbound requests by the gateway, so agents never see the raw tokens:
- Atlassian: Jira and Confluence API calls are authenticated by the gateway using OAuth credentials from Vault. Agents call atlassian_rest or MCP Atlassian tools; the gateway injects the Authorization header.
- Home Assistant: REST API calls to HA are proxied through the gateway’s homeassistant_rest tool with a long-lived access token injected from Vault.
- GitHub (MCP tools): The gateway’s GitHub MCP server uses a Vault-managed token for read operations (code search, file retrieval, repository listing).
This eliminated the mcp-github-search server’s standalone PAT and centralized credential management in one place. Two PATs remained on agent pods — $GITHUB_PORTFOLIO_TOKEN (write access for portfolio pushes) and $GITHUB_TOKEN (read-only for infrastructure repo) — and were the targets for Phase 2.
Phase 2: Token Vending via GitHub App (Live)
The full token vending architecture described above (POST /v1/tokens/github, per-request scoping, 1-hour TTL, capability enforcement) is live and operational. The GitHub App has been created, its private key stored in Vault, and installations configured per repository.
Phase 2 has eliminated the remaining PATs from agent pods, completing the transition from static credentials to fully dynamic token issuance.
Delivered Impact
With both phases live and operational:
- Eliminated all long-lived PATs from agent pods: $GITHUB_PORTFOLIO_TOKEN, $GITHUB_TOKEN, and the already-decommissioned mcp-github-search PAT
- Per-request scoping: each token limited to exactly the repositories and permissions needed for the current operation
- Automated rotation — tokens expire in 1 hour with no manual intervention. The concept of “rotation” has disappeared now that credentials are ephemeral.
- Simplified security model — from “manage N tokens with different scopes and rotation schedules” to “one GitHub App key in Vault, everything else is dynamic”
Technology Stack
- Token Broker: Gateway-native (MCP Gateway, Node.js)
- Backend: GitHub App (RS256 JWT + installation token API)
- Key Management: HashiCorp Vault + External Secrets Operator
- Caching: In-memory with TTL-aware eviction
- Configuration: Helm values via GitOps (Flux)
- Runtime: Kubernetes on ARM64 (Orange Pi 5)