
ADR-004: MCP over HTTP Transport vs stdio

Status: Accepted
Date: 2025-10-10
Author: Spencer Fuller

Context

The OpenClaw platform runs multiple AI agents that need access to shared tooling — web search, academic paper retrieval, financial data, graph memory, GitHub search, and more. The Model Context Protocol (MCP) defines how agents discover and invoke tools, but MCP’s default transport is stdio (standard input/output): one agent process spawns one tool server process, communicating over stdin/stdout pipes.

The cluster runs on 4 Orange Pi 5 nodes (ARM64, Kubernetes v1.28.2). Tool servers need to be deployed as Kubernetes services — independently scalable, health-checked, and accessible to any agent in the cluster.

Key requirements:

  • Multi-agent access — multiple agents (serving different principals) must call the same tool servers concurrently without spawning duplicate processes
  • Kubernetes-native deployment — tool servers should run as pods/services with standard K8s lifecycle management (readiness probes, rolling updates, resource limits)
  • Decoupled lifecycles — an agent restart shouldn’t kill its tool servers, and a tool server crash shouldn’t take down the agent
  • Multiplexing — multiple concurrent tool calls to the same server from different agents or sessions

Decision

Deploy the MCP Gateway as an HTTP-based proxy (http://mcp-gateway.mcp-tools.svc.cluster.local:8080) that exposes a stateless REST API for tool discovery and invocation. Backend MCP servers run as separate pods in the mcp-tools namespace, each communicating with the gateway over HTTP. Agents call tools via the gateway’s /v1/tools/call endpoint rather than spawning stdio-connected tool processes.
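
A minimal sketch of what an agent-side call through the gateway might look like. The URL and the /v1/tools/call path come from the decision above; the JSON payload shape is an assumption for illustration, since this ADR does not specify the request schema.

```python
"""Minimal sketch of an agent-side tool call through the gateway.

Assumptions: the JSON request/response shapes here are illustrative only;
the gateway URL and /v1/tools/call path come from the decision above.
"""
import requests

GATEWAY = "http://mcp-gateway.mcp-tools.svc.cluster.local:8080"


def call_tool(name: str, arguments: dict, timeout: float = 30.0) -> dict:
    """One HTTP round trip replaces spawning a stdio-connected tool process."""
    resp = requests.post(
        f"{GATEWAY}/v1/tools/call",
        json={"name": name, "arguments": arguments},  # assumed payload shape
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(call_tool("websearch", {"query": "MCP stdio vs HTTP transport"}))
```

Any agent pod in the cluster can make this call; no tool server process is spawned inside the agent.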

Rationale

  1. Multi-agent access to shared tools. With stdio transport, each agent spawns its own tool server process — if 3 agents need web search, that’s 3 separate websearch processes. With HTTP transport, a single websearch pod serves all agents concurrently. The gateway multiplexes requests from any agent to any backend server. This matters on resource-constrained SBCs where duplicate processes waste memory.

  2. Kubernetes-native lifecycle management. HTTP-based tool servers run as standard Kubernetes Deployments with Services, readiness probes, resource requests/limits, and horizontal scaling. The gateway itself runs in the mcp-tools namespace alongside its backends. This is how Kubernetes is designed to work — services talking to services over the cluster network. stdio processes inside agent pods bypass all of this.

  3. Decoupled failure domains. When an agent restarts (pod reschedule, OOM kill, upgrade), stdio-connected tool servers die with it — they’re child processes. With HTTP transport, tool servers are independent pods. An agent crash doesn’t affect tool availability for other agents. A tool server crash doesn’t take down the agent — it gets a 503 and retries or falls back.

  4. Concurrent request multiplexing. stdio is inherently serial — one request at a time per pipe. HTTP supports concurrent requests natively. An agent can fire off parallel tool calls (search arxiv while fetching a webpage while querying financial data) without waiting for sequential responses. The gateway handles routing and timeout management (see the fan-out sketch after this list).

  5. Observability and health checking. The gateway exposes a /health endpoint that reports per-backend health and latency. Kubernetes liveness/readiness probes monitor each tool server independently. With stdio, health checking means “is the process still running?” — binary and uninformative. (A /health polling sketch also follows the list.)
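
Point 4 above references a fan-out sketch: the example below fires several tool calls in parallel through the gateway. The httpx/asyncio usage and the argument shapes are assumptions for illustration; the tool names match servers listed later in this ADR.

```python
"""Sketch of fanning out concurrent tool calls through the gateway (point 4)."""
import asyncio

import httpx

GATEWAY = "http://mcp-gateway.mcp-tools.svc.cluster.local:8080"


async def call_tool(client: httpx.AsyncClient, name: str, arguments: dict) -> dict:
    resp = await client.post(
        f"{GATEWAY}/v1/tools/call",
        json={"name": name, "arguments": arguments},  # assumed payload shape
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()


async def main() -> None:
    async with httpx.AsyncClient() as client:
        # Three independent calls run concurrently; over a stdio pipe each
        # would wait its turn.
        results = await asyncio.gather(
            call_tool(client, "arxiv", {"query": "model context protocol"}),
            call_tool(client, "fetch", {"url": "https://example.com"}),
            call_tool(client, "foundry_finance", {"ticker": "NVDA"}),
            return_exceptions=True,  # one failing backend (e.g. a 503) doesn't sink the batch
        )
    for result in results:
        print(result)


asyncio.run(main())
```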
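
Point 5 mentions the gateway's /health endpoint; the sketch below simply polls it. The per-backend response shape noted in the comment is a guess, as the ADR only says the endpoint reports per-backend health and latency.

```python
"""Sketch of polling the gateway's /health endpoint (point 5)."""
import requests

GATEWAY = "http://mcp-gateway.mcp-tools.svc.cluster.local:8080"

resp = requests.get(f"{GATEWAY}/health", timeout=5.0)
resp.raise_for_status()
# Hypothetical shape: {"backends": {"websearch": {"healthy": true, "latency_ms": 12}, ...}}
print(resp.json())
```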

Alternatives considered

| Alternative | Why not |
| --- | --- |
| stdio (MCP default) | The spec’s default transport. Works well for single-agent, single-machine setups (e.g., Claude Desktop spawning a local tool). Falls apart in multi-agent Kubernetes environments: 1:1 coupling between agent and tool server, no multiplexing, no independent lifecycle management, no health checking. Every agent spawns its own copy of every tool server. |
| gRPC | Strong typing via protobuf, excellent streaming support, and built-in multiplexing. However, it’s heavy — requires .proto schema compilation, generated client/server code, and adds complexity that isn’t justified for tool calls that are essentially “call this function with these JSON arguments.” The MCP ecosystem is JSON-native; adding protobuf serialization creates a translation layer. |
| Custom REST API (no MCP) | Could design a bespoke REST API for tool invocation. But this means inventing tool discovery, argument schemas, error handling, and timeout semantics from scratch. MCP already defines all of this. Building a custom API just to avoid MCP’s transport limitation is classic NIH (Not Invented Here) — high maintenance burden for zero ecosystem benefit. |

Consequences

Positive:

  • 15+ tool servers (fetch, arxiv, websearch, github, graphiti, foundry_finance, etc.) run as shared cluster services — any agent can call any tool without spawning dedicated processes
  • Tool servers scale independently: high-demand tools (websearch) can get more replicas while low-demand tools (patents) run as singletons
  • Adding a new tool server means deploying a pod and registering it with the gateway — agents discover it automatically via /v1/tools (see the discovery sketch below)
  • Gateway provides centralized request logging, error code standardization, and timeout management across all tool servers
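
As noted above, agents pick up newly registered tool servers through the gateway's /v1/tools endpoint. A minimal discovery sketch follows; the {"tools": [...]} response shape is an assumption, not a documented schema.

```python
"""Sketch of tool discovery against the gateway's /v1/tools endpoint."""
import requests

GATEWAY = "http://mcp-gateway.mcp-tools.svc.cluster.local:8080"

resp = requests.get(f"{GATEWAY}/v1/tools", timeout=10.0)
resp.raise_for_status()
for tool in resp.json().get("tools", []):  # assumed response shape
    print(tool.get("name"), "-", tool.get("description", ""))
```
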
Negative:

  • Network hop adds latency. Every tool call traverses agent → gateway → backend → gateway → agent instead of agent → stdio pipe. On a local cluster network, this adds 1-5ms per call — negligible for tool calls that typically take 100ms-10s, but it’s measurable overhead
  • Gateway is a single point of failure. If the gateway pod crashes, all tool access stops until it restarts. Mitigated by running multiple replicas and Kubernetes restart policies, but it’s a centralization risk that stdio doesn’t have
  • Divergence from MCP spec’s default transport. Most MCP client libraries assume stdio. Using HTTP transport means the gateway must implement the MCP protocol translation layer, and agents use REST calls rather than standard MCP client SDKs. This is additional code to maintain