How to Deploy an A2A Agent to Production
Everything you need to deploy an A2A agent for real traffic. Docker, gunicorn, Cloud Run, environment config, health checks, TLS, monitoring, and Agent Card URL management.
Building an A2A agent locally is the easy part. Getting it into production with proper infrastructure is where most teams stall. This covers the full path: containerization, process management, cloud deployment, HTTPS, health checks, monitoring, and the Agent Card URL problem.
Application server
Don't run uvicorn directly in production. Use gunicorn with uvicorn workers:
pip install gunicorn uvicorn[standard]
gunicorn agent:app \
-w 4 \
-k uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 120 \
--graceful-timeout 30 \
--access-logfile - \
--error-logfile -
- -w 4 -- 4 workers. Start with 2 * CPU_CORES + 1.
- -k uvicorn.workers.UvicornWorker -- async workers for SSE streaming.
- --timeout 120 -- long enough for LLM calls and tool execution.
- --graceful-timeout 30 -- time for in-flight requests to complete on shutdown.
Docker
FROM python:3.12-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "import httpx; httpx.get('http://localhost:8000/health').raise_for_status()"
CMD ["gunicorn", "agent:app", \
"-w", "4", \
"-k", "uvicorn.workers.UvicornWorker", \
"--bind", "0.0.0.0:8000", \
"--timeout", "120"]
docker build -t my-a2a-agent:latest .
docker run -p 8000:8000 -e GOOGLE_API_KEY="${GOOGLE_API_KEY}" my-a2a-agent:latest
Docker Compose
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    environment:
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      - AGENT_URL=http://localhost:8000
      - LOG_LEVEL=info
    restart: unless-stopped
Environment configuration
Never hardcode URLs or secrets:
# config.py
import os
class Config:
    AGENT_NAME = os.getenv("AGENT_NAME", "My Agent")
    AGENT_VERSION = os.getenv("AGENT_VERSION", "1.0.0")
    AGENT_URL = os.getenv("AGENT_URL", "http://localhost:8000")
    PORT = int(os.getenv("PORT", "8000"))
    WORKERS = int(os.getenv("WORKERS", "4"))
    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")
    LOG_LEVEL = os.getenv("LOG_LEVEL", "info")
Use AGENT_URL in your Agent Card:
agent_card = AgentCard(
    name=Config.AGENT_NAME,
    url=Config.AGENT_URL,  # Must match the public URL
    version=Config.AGENT_VERSION,
    # ...
)
Health checks
Add a dedicated health endpoint -- don't rely on the Agent Card endpoint alone:
from fastapi import FastAPI
from fastapi.responses import JSONResponse
@app.get("/health")
async def health():
checks = {"status": "healthy", "checks": {}}
try:
checks["checks"]["llm"] = "ok" # Verify LLM connectivity
except Exception as e:
checks["checks"]["llm"] = f"error: {e}"
checks["status"] = "degraded"
status_code = 200 if checks["status"] != "unhealthy" else 503
return JSONResponse(content=checks, status_code=status_code)
@app.get("/ready")
async def ready():
return {"ready": True}
Deploy to Google Cloud Run
Cloud Run handles HTTPS, scaling, and container orchestration:
gcloud auth configure-docker
docker build -t gcr.io/YOUR_PROJECT/my-a2a-agent:latest .
docker push gcr.io/YOUR_PROJECT/my-a2a-agent:latest
gcloud run deploy my-a2a-agent \
--image gcr.io/YOUR_PROJECT/my-a2a-agent:latest \
--platform managed \
--region us-central1 \
--port 8000 \
--memory 1Gi \
--cpu 2 \
--timeout 300 \
--concurrency 80 \
--min-instances 1 \
--max-instances 10 \
--set-env-vars "AGENT_URL=https://my-a2a-agent-HASH.run.app,LOG_LEVEL=info" \
--set-secrets "GOOGLE_API_KEY=google-api-key:latest" \
--allow-unauthenticated
- --timeout 300 -- 5 minutes for long-running tasks.
- --min-instances 1 -- avoid cold starts. Set to 0 if you can tolerate them.
- --allow-unauthenticated -- Agent Card must be public. Implement auth at the app level (OAuth2 guide).
Cloud Run supports SSE streaming natively. For tasks longer than a few minutes, use push notifications instead.
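A push notification consumer is just an HTTPS endpoint the agent POSTs task updates to. Here's a minimal receiver sketch -- the /task-updates path and the payload field names are illustrative assumptions, so check the push notification format your A2A SDK actually sends:
from fastapi import FastAPI, Request, Response

app = FastAPI()  # a standalone receiver service, or routes on your existing app

@app.post("/task-updates")  # hypothetical path you register as the push notification URL
async def task_updates(request: Request):
    payload = await request.json()
    # Field names below are assumptions, not fixed by the deployment above.
    task_id = payload.get("id")
    state = (payload.get("status") or {}).get("state")
    print(f"task {task_id} -> {state}")
    # Acknowledge quickly; do any heavy follow-up work asynchronously.
    return Response(status_code=204)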
TLS / HTTPS
If you're not on a managed platform, you need TLS. Agent Cards with http:// URLs signal an insecure deployment.
Caddy (auto Let's Encrypt)
# Caddyfile
my-agent.example.com {
reverse_proxy localhost:8000
header Strict-Transport-Security "max-age=31536000; includeSubDomains"
}
nginx
server {
listen 443 ssl http2;
server_name my-agent.example.com;
ssl_certificate /etc/letsencrypt/live/my-agent.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/my-agent.example.com/privkey.pem;
location / {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
# Critical for SSE streaming
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 300s;
}
}
The proxy_buffering off directive is critical. Without it, nginx buffers the response and clients won't see SSE events until the stream closes.
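If you can't change the nginx config, the X-Accel-Buffering response header disables buffering for a single proxied response. A minimal sketch for a FastAPI streaming endpoint -- the endpoint and event generator are illustrative, since your A2A server normally produces the SSE stream itself:
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()  # or reuse the app your A2A server already exposes

async def sse_events():
    # Illustrative generator; the real stream comes from your agent's task execution.
    yield 'data: {"status": "working"}\n\n'
    yield 'data: {"status": "completed"}\n\n'

@app.get("/stream-example")  # hypothetical endpoint, for demonstration only
async def stream_example():
    return StreamingResponse(
        sse_events(),
        media_type="text/event-stream",
        # X-Accel-Buffering: no tells nginx not to buffer this response
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )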
Agent Card URL management
The url field must match the public URL where clients send requests. This causes real problems across environments:
- Local dev: http://localhost:8000
- Docker: http://host.docker.internal:8000
- Staging: https://staging-agent.example.com
- Production: https://agent.example.com
Use environment variables. For agents behind a load balancer, url should be the external-facing URL, not the internal one.
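One cheap safeguard is validating the configured URL at startup so a stale card never ships. A sketch assuming the Config class from earlier plus an ENVIRONMENT variable you'd add yourself:
import os
from urllib.parse import urlparse

from config import Config  # the Config class defined above

def validate_agent_url(url: str) -> None:
    parsed = urlparse(url)
    environment = os.getenv("ENVIRONMENT", "dev")  # assumed variable, not part of Config above
    if not parsed.netloc:
        raise RuntimeError(f"AGENT_URL is not a valid absolute URL: {url}")
    if environment != "dev" and parsed.scheme != "https":
        raise RuntimeError(f"AGENT_URL must use https outside dev, got: {url}")

validate_agent_url(Config.AGENT_URL)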
Custom domain on Cloud Run
gcloud run domain-mappings create \
--service my-a2a-agent \
--domain agent.example.com \
--region us-central1
Then set AGENT_URL=https://agent.example.com.
Monitoring
Structured logging
import logging, json, sys
class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
        })
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("a2a-agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
What to monitor
- Task completion rate -- percentage of completed vs failed tasks
- Task duration -- p50, p95, p99 latency for message/send
- Active SSE connections -- streaming connection count and duration
- Error rate -- auth failures, invalid input, LLM errors
- Agent Card requests -- spikes indicate scanning or new integrations
- LLM latency -- track separately if your agent calls an LLM
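A lightweight way to expose these metrics is prometheus_client. A sketch with hypothetical metric names -- wire the counter and histogram into your actual message handling code:
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()  # or reuse your existing agent app

# Hypothetical metric names -- align the labels with your task states.
TASKS_TOTAL = Counter("a2a_tasks_total", "Tasks by final state", ["state"])
TASK_DURATION = Histogram("a2a_task_duration_seconds", "message/send handling time")

# Expose Prometheus metrics at /metrics alongside the agent routes.
app.mount("/metrics", make_asgi_app())

# Inside your message handling code, roughly:
# with TASK_DURATION.time():
#     result = await handle_message(params)      # your handler, not defined here
# TASKS_TOTAL.labels(state=result.state).inc()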
Production checklist
- Agent Card url matches production domain
- HTTPS with valid certificates
- Health check endpoint at /health
- Environment variables for all secrets and config
- Non-root user in Docker container
- Graceful shutdown handling
- Structured logging to stdout
- Request timeout configured (gunicorn, Cloud Run, nginx)
- SSE proxy buffering disabled behind reverse proxy
- Auth implemented for sensitive agents (OAuth2 guide)
- Rate limiting on Agent Card endpoint (see the sketch below)
- Errors return failed task state, not HTTP 500s
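For the rate limiting item, slowapi is one option for Starlette/FastAPI apps. A sketch assuming the card is served at /.well-known/agent.json (your SDK may register a different path) and a standalone app:
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()  # or the app your A2A server builds, if you control route registration
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

agent_card_dict = {"name": "My Agent", "url": "https://agent.example.com"}  # trimmed example card

@app.get("/.well-known/agent.json")
@limiter.limit("30/minute")  # generous enough for real clients, throttles scanners
async def agent_card(request: Request):
    return agent_card_dict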
Next steps
- A2A Protocol Tutorial -- if you haven't built an agent yet
- Secure A2A Agents with OAuth2 -- authentication and authorization
- Agent Card JSON Schema Reference -- get your card right before deploying
- Browse production-ready stacks and agents for reference architectures