Stretch Goal · 30 min

Run the Server Over HTTP

Switch from stdio to Streamable HTTP. Experience both transports and answer "when do I pick which?" with conviction.

⏱ 30 minutes 🌐 HTTP transport 🔐 Auth pattern
Prerequisite

You've completed the main MCP build tutorial. You should have a working server.py in your compliance-mcp directory.

Why HTTP exists

Stdio is great for local, single-user, host-spawns-server scenarios. It's not great for:

  • Multi-user — many clients connecting to one shared server
  • Multi-host — one team's tools, many laptops
  • Centralized state — the server holds session data, caches, locks
  • Production observability — standard HTTP infra: load balancing, mTLS, request logs
  • Cross-language — clients in any language can speak HTTP

MCP's Streamable HTTP transport is a single endpoint that handles both regular requests (JSON-RPC over HTTP POST) and streaming (Server-Sent Events for long-running operations). It replaces the older SSE-based pattern with a simpler one-endpoint design.
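To make "JSON-RPC over HTTP POST" concrete, here is a minimal sketch of the envelope a client POSTs to that single endpoint. The field names come from JSON-RPC 2.0 and the MCP tools/call method; the helper function name is this sketch's own:

```python
import json

def tools_call_envelope(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

body = tools_call_envelope("lookup_sanctions_hit", {"name": "Vladimir Petrov"})
```

This is exactly the body you'll POST with curl in step 3 — same envelope, whether it travels over stdio pipes or HTTP.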

                   stdio                                  HTTP
Who runs it        Host spawns as child process           Long-lived service, separately managed
Auth               Process boundary                       Bring your own (JWT, OAuth, mTLS)
Multi-user         No — one process per user              Yes — N clients share one server
Observability      Process logs, manual                   Standard HTTP infra
Network surface    Zero                                   One port — must be secured
Fit for            Developer tools, single-user agents    Production compliance services

1 · Run as HTTP server · ~5 min

FastMCP supports both transports with a single line change. First, flip the constructor at the top of server.py to enable stateless HTTP — this lets each request stand on its own without an initialize handshake and a session ID. It's the simpler mode for pure tool servers and is what makes the single-curl tests in step 4 work:

server.py — top
# Was: mcp = FastMCP("compliance-toolkit")
mcp = FastMCP("compliance-toolkit", stateless_http=True)

Then add a transport switch at the bottom of the file so the same script runs as either stdio or HTTP:

server.py — bottom of file
if __name__ == "__main__":
    import sys
    transport = sys.argv[1] if len(sys.argv) > 1 else "stdio"
    if transport == "http":
        mcp.run(transport="streamable-http")
    else:
        mcp.run()  # stdio (default)
Stateful vs stateless HTTP

With the default (stateful) Streamable HTTP, every request after the first must include an Mcp-Session-Id header that the server returns from initialize. Skip the handshake and the server replies Bad Request: Missing session ID. Stateless mode (stateless_http=True) drops that requirement — each request is independent. When to use which: stateful is right when the server holds per-session state (auth tokens scoped to a session, in-flight subscriptions, long-running operations). Stateless is right for pure tool servers where every call is idempotent and self-describing — most compliance/scoring/screening tools fit this profile, and it's much easier to scale horizontally.

Now you can launch the same server in either mode:

shell (venv active)
# stdio mode (what Claude Desktop uses):
python server.py

# HTTP mode (long-lived service):
python server.py http

Run it in HTTP mode. By default FastMCP serves at http://localhost:8000/mcp. You should see startup logs in your terminal.

Configuring host and port

For production you'd configure these explicitly via environment variables or FastMCP options. For learning, the defaults are fine.
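A sketch of what "configure explicitly" could look like: resolve the bind address from environment variables, falling back to the tutorial defaults. The variable names `MCP_HOST`/`MCP_PORT` are this sketch's invention, and whether your installed FastMCP accepts `host`/`port` as constructor options depends on the SDK version — check its docs:

```python
import os

def server_bind_config() -> tuple[str, int]:
    """Resolve the bind address from the environment, falling back to
    the tutorial defaults (localhost, port 8000)."""
    host = os.environ.get("MCP_HOST", "127.0.0.1")
    port = int(os.environ.get("MCP_PORT", "8000"))
    return host, port

# host, port = server_bind_config()
# mcp = FastMCP("compliance-toolkit", stateless_http=True, host=host, port=port)
```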

2 · Test with the inspector · ~5 min

Open a second terminal (the first is running the server). Point the MCP inspector at your HTTP endpoint:

shell (second terminal)
npx @modelcontextprotocol/inspector

In the inspector UI:

  1. Set Transport to Streamable HTTP
  2. Set URL to http://localhost:8000/mcp
  3. Click Connect

You should see the same three tools, two resources, and (if you did stretch 04b) two prompts as before. Invoke summarize_alert with the standard payload — it works identically.

What just happened

The server is the same code. The protocol is the same JSON-RPC. The only difference is transport: instead of bytes flowing through pipes between a parent and child process, they're flowing through TCP between two separate processes — potentially on different machines. That's the whole conceptual difference between stdio and HTTP. Everything else is plumbing.

3 · Add JWT auth (the production-ready piece) · ~15 min

An HTTP MCP server with no auth is a wide-open door to your compliance tools. Production deployments need auth. Let's add a minimal JWT-bearer pattern.

Install PyJWT:

shell
pip install pyjwt

Create a separate file mint_token.py for issuing test tokens:

mint_token.py
"""Mint a JWT for testing the compliance-toolkit HTTP server.

In production this lives in a server-side mint endpoint with proper user auth,
short TTLs, and scoping. For learning, we'll mint a 1-hour token for a fake user.
"""

import jwt
from datetime import datetime, timedelta, timezone

SECRET = "dev-secret-do-not-use-in-production"  # in prod: env var, KMS, etc.

def mint(user_id: str, allowed_tools: list[str], ttl_minutes: int = 60) -> str:
    now = datetime.now(timezone.utc)
    payload = {
        "sub": user_id,
        "iat": int(now.timestamp()),
        "exp": int((now + timedelta(minutes=ttl_minutes)).timestamp()),
        "scope": {"tools": allowed_tools},
        "purpose": "compliance-toolkit-access",
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")


if __name__ == "__main__":
    # Issue a token that allows all tools
    token = mint(
        user_id="analyst.demo@example.com",
        allowed_tools=["lookup_sanctions_hit", "check_jurisdiction_risk", "summarize_alert"],
    )
    print(token)

Run it once to get a token:

shell
python mint_token.py
# eyJhbGciOiJIUzI1NiIs... (copy this)
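To sanity-check what you minted, remember that a JWT's payload is only base64url-encoded, not encrypted: anyone holding the token can read the claims, and the secret only guards the signature. A stdlib-only sketch for inspecting your own tokens:

```python
import base64
import json

def peek_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature.
    Fine for debugging your own test tokens; never use for auth decisions."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Running `peek_claims` on the token you just minted should show the `sub`, `exp`, and `scope` fields from mint_token.py.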

Now add an HTTP middleware-style check inside your tools. The cleanest pattern with FastMCP is a small auth decorator:

server.py — top, after imports
import jwt
from contextvars import ContextVar
from functools import wraps
from mcp.server.fastmcp import Context

SECRET = "dev-secret-do-not-use-in-production"

# Threaded through the request via ContextVar so tool signatures stay clean.
# FastMCP rejects tool parameters that start with '_', so we can't pass the
# auth identity as a function kwarg. Tools read it via current_auth_user().
_current_auth_user: ContextVar[str] = ContextVar("_current_auth_user", default="anonymous")


def current_auth_user() -> str:
    """Return the authenticated subject ('sub' claim) for the in-flight tool call."""
    return _current_auth_user.get()


def require_auth(tool_name: str):
    """Decorator: validate a Bearer JWT from the request headers and check scope."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, ctx: Context = None, **kwargs):
            # FastMCP makes the request available via ctx.request_context
            headers = {}
            if ctx and hasattr(ctx, "request_context"):
                req = ctx.request_context.request
                if req:
                    headers = dict(req.headers)

            auth = headers.get("authorization", "")
            if not auth.startswith("Bearer "):
                return {"error": "missing or malformed Authorization header", "isError": True}

            token = auth.removeprefix("Bearer ").strip()
            try:
                payload = jwt.decode(token, SECRET, algorithms=["HS256"])
            except jwt.ExpiredSignatureError:
                return {"error": "token expired", "isError": True}
            except jwt.InvalidTokenError as e:
                return {"error": f"invalid token: {e}", "isError": True}

            scope_tools = payload.get("scope", {}).get("tools", [])
            if tool_name not in scope_tools:
                return {
                    "error": f"token does not authorize tool '{tool_name}'",
                    "authorized": scope_tools,
                    "isError": True,
                }

            token_ref = _current_auth_user.set(payload["sub"])
            try:
                return fn(*args, **kwargs)
            finally:
                _current_auth_user.reset(token_ref)
        return wrapper
    return deco

Apply it to one tool to see the pattern. This is a two-step refactor of the lookup_sanctions_hit you already have: (1) rename the existing function — body unchanged — to a private helper _do_screening; (2) create a new public lookup_sanctions_hit that wraps the helper with @require_auth and stamps the caller's identity onto the response for the audit trail.

server.py — step 1: rename existing function to a private helper
# Was: @mcp.tool()
# Was: def lookup_sanctions_hit(name: str) -> dict:
def _do_screening(name: str) -> dict:
    """Existing screening logic — cache, OpenSanctions API call, fallback.
    No changes to the body; just renamed and the @mcp.tool() decorator removed.
    """
    cached = _cache_get(name)
    if cached is not None:
        return {**cached, "from_cache": True}
    # ... rest of your existing cache + API + fallback logic unchanged ...
    # (the body you wrote in the real-API stretch)
server.py — step 2: new wrapped tool
@mcp.tool()
@require_auth("lookup_sanctions_hit")
def lookup_sanctions_hit(name: str, ctx: Context = None) -> dict:
    """Screen a person or entity name against the live OpenSanctions database.

    Requires a Bearer JWT in the Authorization header with 'lookup_sanctions_hit'
    in its tool scope. The caller's identity is recorded on every response.
    """
    result = _do_screening(name)
    result["screened_by"] = current_auth_user()  # audit trail!
    return result
Heads-up: three traps to know

1. The tool function must declare ctx: Context = None. FastMCP introspects the inner function's signature (via functools.wraps) to decide which framework-injected parameters to pass. If ctx isn't in the signature, FastMCP won't inject one, the wrapper's headers dict stays empty, and every call fails with "missing or malformed Authorization header" — even when you sent a valid token. Context-typed parameters are auto-injected by the framework and excluded from the JSON schema the model sees, so the model can't pass anything for ctx.

2. Don't pass auth identity via a kwarg. FastMCP rejects tool parameters whose name starts with _ at registration time (InvalidSignature: Parameter _auth_user of lookup_sanctions_hit cannot start with '_'). It also rejects passing it as a regular kwarg like auth_user because then the model sees it as a tool argument it can supply, which lets it impersonate any user. The ContextVar pattern above keeps the tool's public signature clean while still threading identity through the request.

3. Decorator order matters. @mcp.tool() must sit above @require_auth(...), so FastMCP registers the auth-wrapped callable rather than the raw one.
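Trap 3 generalizes beyond FastMCP: Python applies stacked decorators bottom-up, so the topmost decorator receives whatever the ones below produced. A registry-style sketch (the names `register` and `guard` are illustrative stand-ins, not FastMCP APIs):

```python
registry = {}

def register(fn):
    """Stand-in for @mcp.tool(): records whatever callable it's handed."""
    registry[fn.__name__] = fn
    return fn

def guard(fn):
    """Stand-in for @require_auth(...): wraps the callable in a check."""
    def wrapper(*args, **kwargs):
        return ("checked", fn(*args, **kwargs))
    wrapper.__name__ = fn.__name__
    return wrapper

@register   # applied last, so it registers the guarded wrapper (correct order)
@guard
def screen(name: str):
    return f"screened {name}"
```

Flip the two decorators and `register` would capture the raw, unguarded function instead, silently bypassing auth.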

Test it with curl from the second terminal. The Accept header is required — MCP's Streamable HTTP transport rejects requests that don't advertise support for both JSON and SSE responses (Not Acceptable error otherwise):

shell
# Replace TOKEN with the JWT you minted
TOKEN="eyJhbGc..."

# Should succeed with that token's scope:
curl -X POST http://localhost:8000/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"lookup_sanctions_hit","arguments":{"name":"Vladimir Petrov"}}}'

# Should fail with no token (the tool result carries an auth error payload,
# since the decorator returns {"error": ..., "isError": True}):
curl -X POST http://localhost:8000/mcp \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"lookup_sanctions_hit","arguments":{"name":"Vladimir Petrov"}}}'
The pattern, regardless of FastMCP version specifics

The key idea: JWT validation happens before the tool runs. The token carries the user identity (which goes into the audit log) and the scope (which tools this token authorizes). A short TTL means a leaked token's blast radius is bounded. A rotation path on the mint side means revoking the long-lived credential invalidates derived tokens.

The SDK details for accessing request headers in FastMCP evolve quickly — if the above doesn't fit your installed version, check the official docs. The pattern (verify-before-execute, scope-per-token, user-in-audit) is universal.

4 · Now you can answer "stdio vs HTTP" with conviction · ~2 min

You've run both transports. The pros and cons are no longer abstract:

Pick stdio when

The user is on their laptop running Claude Desktop. There's no shared state. There's no need for separate auth — the OS user identity is the identity. Operations are local to the user (their files, their workspace, their session). Examples: developer tools, single-user productivity agents.

Pick HTTP when

Multiple users share the server. The server hits production systems (compliance databases, case management, vendor APIs). You need centralized observability, rate limiting, secrets management. You need to enforce auth, scope, and audit. Examples: every production compliance MCP server.

What you can now say in the interview

Sample answer — ~75 seconds

"I've run the same MCP server in both transports — stdio for local single-user, HTTP for production-ready. The decision is straightforward once you've done it. Stdio is right when the host can spawn the server as a child process and the OS user identity is the auth — developer tools, single-user agents. HTTP is right whenever you need multiple clients, centralized state, real observability, or non-OS-level auth — every production compliance use case. The piece that matters most is auth: HTTP needs a real story, and I wired up the JWT pattern — short-lived bearer token, scope-per-token specifying which tools it authorizes, user identity in every audit log entry, and a mint endpoint that holds the long-lived credential so it never leaves the server. The blast radius of a leaked HTTP MCP token is bounded by the scope and the TTL. That's the difference between 'we exposed a tool over HTTP' and 'we exposed a tool over HTTP in a way Compliance would approve.'"