MCIP


Transform MCIP from a local experiment into a production-ready system in under an hour. This guide covers configuration, optimization, monitoring, and deployment strategies that work.

The Journey from Development to Production

Getting MCIP running locally is straightforward—you've probably already done it. Taking it to production is a different adventure entirely.

Think of it like cooking: following a recipe at home is one thing, running a restaurant kitchen is another. You need consistency, reliability, and the ability to handle the dinner rush without breaking a sweat.

This guide walks you through that transformation. We'll cover everything from configuration to deployment, with real examples and honest advice about what actually matters. No fluff, no unnecessary complexity—just what you need to go live with confidence.


Part 1: Production Configuration

Understanding Configuration Layers

MCIP uses a layered configuration approach. Think of it like dressing for unpredictable weather—base layer, insulation, outer shell. Each layer serves a purpose:

  1. Base Configuration: Default settings that work everywhere
  2. Environment Variables: Runtime secrets and environment-specific values
  3. Configuration Files: Detailed settings for adapters and behavior
  4. Runtime Overrides: Dynamic adjustments without restarts
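
To make the layering concrete, here's a minimal sketch of a loader that merges the four layers in the order listed, with later layers winning. The file path and helper names are illustrative, not MCIP's actual loader:

// config/load.ts -- illustrative sketch of layered config, not MCIP's real loader
import { existsSync, readFileSync } from 'node:fs';

type Config = Record<string, unknown>;

// 1. Base configuration: defaults that work everywhere
const base: Config = { port: 8000, logLevel: 'info' };

// 2. Environment variables: secrets and environment-specific values
const env: Config = {};
if (process.env.PORT) env.port = Number(process.env.PORT);
if (process.env.LOG_LEVEL) env.logLevel = process.env.LOG_LEVEL;

// 3. Configuration files: detailed adapter and behavior settings
const file: Config = existsSync('config/production.json')
  ? JSON.parse(readFileSync('config/production.json', 'utf8'))
  : {};

// 4. Runtime overrides: dynamic adjustments without restarts
let runtime: Config = {};
export const override = (patch: Config): void => {
  runtime = { ...runtime, ...patch };
};

// Later spreads win, so runtime overrides beat everything else.
export const config = (): Config => ({ ...base, ...env, ...file, ...runtime });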

Essential Environment Variables

Here's your production environment template. Every variable earns its place:

# .env.production
# Core Application Settings
NODE_ENV=production
PORT=8000
LOG_LEVEL=info

# AI Services - The brain of semantic search
OPENAI_API_KEY=sk-your-production-key
OPENAI_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536    # Default for text-embedding-3-small
EMBEDDING_TIMEOUT_MS=3000

# Vector Database - Where meaning lives (Qdrant)
QDRANT_URL=http://qdrant:6333
QDRANT_TIMEOUT_MS=5000
QDRANT_COLLECTION=products

# Session Storage - Redis keeps conversations alive
REDIS_URL=redis://redis-primary:6379
REDIS_PASSWORD=your-secure-password
SESSION_TTL_HOURS=24

# Performance Tuning
SEARCH_TIMEOUT_MS=2500
MAX_CONCURRENT_SEARCHES=10
RESULT_CACHE_TTL_SECONDS=300
EMBEDDING_CACHE_TTL_SECONDS=3600

# Security
CORS_ORIGINS=https://app.example.com    # Comma-separated allowlist of your domains
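
One more habit worth adopting: validate at boot that every required variable is actually set, so a missing secret crashes the deploy instead of the first user request. A minimal sketch, using names from the template above:

// config/validate-env.ts -- fail fast at boot instead of at the first request
const REQUIRED = ['OPENAI_API_KEY', 'QDRANT_URL', 'REDIS_URL', 'REDIS_PASSWORD'];

export function validateEnv(): void {
  const missing = REQUIRED.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

validateEnv();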

Configuration Best Practices

Never commit secrets to version control. This sounds obvious, but it happens more than you'd think. Use a secrets manager like AWS Secrets Manager, HashiCorp Vault, or at minimum, environment-specific .env files that are gitignored.

Use different API keys per environment. Your production OpenAI key should be separate from development. This prevents accidental quota exhaustion and provides clearer billing.

Set reasonable timeouts. The defaults assume perfect conditions. In production, networks hiccup, services lag, and users get impatient.

Service          Development   Production   Why
Embedding API    5000ms        3000ms       Fail fast, don't block
Vector Search    10000ms       5000ms       Users won't wait longer
Total Search     5000ms        2500ms       Aggregate timeout
Redis            1000ms        500ms        Should be nearly instant
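
Enforcing those budgets in Node can be as simple as an abortable request. Here's a sketch of the embedding call using the built-in AbortSignal.timeout (Node 17.3+); the request shape follows OpenAI's embeddings API, but the function itself is illustrative:

// Enforce the 3000ms embedding budget from the table above.
async function embedWithTimeout(query: string): Promise<number[]> {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'text-embedding-3-small', input: query }),
    // Throws an AbortError if the API doesn't answer within the budget.
    signal: AbortSignal.timeout(Number(process.env.EMBEDDING_TIMEOUT_MS ?? 3000)),
  });
  if (!response.ok) throw new Error(`Embedding API returned ${response.status}`);
  const body = await response.json();
  return body.data[0].embedding;
}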

Part 2: Performance Optimization

Where Time Actually Goes

Before optimizing anything, understand where your search latency comes from. Here's a typical breakdown:

Total Search: 450ms
├── Query Processing: 15ms (3%)
├── Embedding Generation: 150ms (33%)
├── Vector Search: 250ms (56%)
├── Result Enrichment: 25ms (6%)
└── Response Formatting: 10ms (2%)

The insight? Embedding generation and vector search dominate. That's where optimization efforts pay off.

Embedding Cache: Your Biggest Win

Many searches are variations of common queries. "Gaming laptop," "laptop for gaming," "gaming notebook"—they're semantically similar. Caching embeddings for frequent queries dramatically reduces latency for repeat searches.

// config/cache.config.ts
export const cacheConfig = {
  embedding: {
    enabled: true,
    ttl: 3600,              // 1 hour
    maxSize: 10000,         // Store up to 10k unique query embeddings
    strategy: 'lru',        // Least Recently Used eviction
    keyNormalizer: (query: string) => {
      return query
        .toLowerCase()
        .trim()
        .replace(/\s+/g, ' ')
        .substring(0, 200);
    }
  },
  results: {
    enabled: true,
    ttl: 300,               // 5 minutes
    maxSize: 5000
  }
};

Expected impact: 40-60% latency reduction for cached queries, which often represent 30-50% of total traffic.
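
Here's a sketch of how that cache sits in front of the embedding call. It reuses the normalizer logic from the config above and the embedWithTimeout helper from the earlier timeout sketch; a Map in insertion order is enough for a small LRU, though a real deployment would want TTL-based expiry too:

// A Map preserves insertion order, which gives us a cheap LRU:
// on a hit, re-insert the key to mark it most recently used.
const embeddingCache = new Map<string, number[]>();
const MAX_SIZE = 10000;

const normalize = (query: string): string =>
  query.toLowerCase().trim().replace(/\s+/g, ' ').substring(0, 200);

async function cachedEmbedding(query: string): Promise<number[]> {
  const key = normalize(query);
  const hit = embeddingCache.get(key);
  if (hit) {
    embeddingCache.delete(key); // move to most-recently-used position
    embeddingCache.set(key, hit);
    return hit;
  }
  const embedding = await embedWithTimeout(query); // from the timeout sketch above
  if (embeddingCache.size >= MAX_SIZE) {
    const oldest = embeddingCache.keys().next().value; // least recently used
    if (oldest !== undefined) embeddingCache.delete(oldest);
  }
  embeddingCache.set(key, embedding);
  return embedding;
}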

Connection Pooling

Opening new connections for every request is expensive. Pool them:

// config/connections.config.ts
export const connectionConfig = {
  redis: {
    maxConnections: 50,
    minConnections: 10,
    acquireTimeout: 1000,
    idleTimeout: 30000
  },
  http: {
    maxSockets: 100,
    maxFreeSockets: 20,
    keepAlive: true,
    keepAliveMsecs: 30000
  }
};
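
The http block maps one-to-one onto Node's built-in Agent options. A sketch; note that your HTTP client has to be told to use the agent (axios accepts it as httpsAgent, for example), and the Redis settings depend on which Redis client you use:

// connections.ts -- one shared agent so sockets are reused across requests
import { Agent } from 'node:https';
import { connectionConfig } from './config/connections.config';

export const httpsAgent = new Agent({
  keepAlive: connectionConfig.http.keepAlive,
  keepAliveMsecs: connectionConfig.http.keepAliveMsecs,
  maxSockets: connectionConfig.http.maxSockets,
  maxFreeSockets: connectionConfig.http.maxFreeSockets,
});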

Memory Management

Node.js applications can be memory-hungry. In production, set explicit limits:

# docker-compose.production.yml
services:
  mcip:
    image: mcip:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    environment:
      - NODE_OPTIONS=--max-old-space-size=3584

Pro tip: Set max-old-space-size to about 90% of your container memory limit; the 3584 MB above is 87.5% of the 4 GB (4096 MB) limit. This gives Node.js room to breathe during garbage collection.


Part 3: Monitoring Setup

Why Monitoring Matters

Here's a truth about production systems: if you're not watching, you're guessing. Monitoring transforms "the system feels slow" into "embedding latency increased 40% after the last deployment."

Key Metrics to Track

Think of metrics in three categories:

Health Metrics (Is it working?):
  • Request success rate
  • Error rate by type
  • System uptime

Performance Metrics (How well?):
  • Response time percentiles (P50, P95, P99)
  • Throughput (requests per second)
  • Queue depths

Business Metrics (Is it valuable?):
  • Search relevance scores
  • Session duration

Prometheus Integration

MCIP exposes metrics at /metrics in Prometheus format:

// config/metrics.config.ts
export const metricsProviders = [
  {
    name: 'mcip_search_duration_seconds',
    help: 'Search request duration in seconds',
    labelNames: ['status', 'cache_hit'],
    buckets: [0.1, 0.25, 0.5, 0.75, 1, 2.5, 5]
  },
  {
    name: 'mcip_embedding_duration_seconds',
    help: 'Embedding generation duration',
    labelNames: ['model'],
    buckets: [0.05, 0.1, 0.15, 0.2, 0.3, 0.5]
  },
  {
    name: 'mcip_requests_total',
    help: 'Total number of requests',
    labelNames: ['method', 'status']
  }
];
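
Turning those definitions into live metrics is straightforward with the prom-client package. A sketch, assuming an Express app (MCIP's actual instrumentation may differ):

// metrics.ts -- register the metrics and expose them for Prometheus to scrape
import express from 'express';
import { Counter, Histogram, register } from 'prom-client';

const searchDuration = new Histogram({
  name: 'mcip_search_duration_seconds',
  help: 'Search request duration in seconds',
  labelNames: ['status', 'cache_hit'],
  buckets: [0.1, 0.25, 0.5, 0.75, 1, 2.5, 5],
});

const requestsTotal = new Counter({
  name: 'mcip_requests_total',
  help: 'Total number of requests',
  labelNames: ['method', 'status'],
});

const app = express();

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(Number(process.env.PORT ?? 8000));

// Inside a search handler you would record observations like this:
//   const end = searchDuration.startTimer();
//   ...perform the search...
//   end({ status: 'ok', cache_hit: 'false' });
//   requestsTotal.inc({ method: 'search', status: 'ok' });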

Alerting Rules

# alerts/mcip-alerts.yml
groups:
  - name: mcip-critical
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(mcip_requests_total{status="error"}[5m])) 
          / sum(rate(mcip_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MCIP error rate above 5%"
      
      - alert: HighSearchLatency
        expr: |
          histogram_quantile(0.95, rate(mcip_search_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Search P95 latency above 1 second"

Part 4: Deployment Strategies

Strategy 1: Single Server (Simple Start)

Perfect for getting started or low-traffic deployments:

services:
  mcip:
    image: mcip:${VERSION:-latest}
    ports:
      - "8000:8000"
    environment:
      - NODE_ENV=production
    env_file:
      - .env.production
    depends_on:
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
    restart: unless-stopped

volumes:
  redis-data:

Best for: Development teams, proof-of-concept, <100 requests/minute
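
The healthcheck above curls /health. A minimal handler that also verifies the Redis dependency might look like this; Express and ioredis here are assumptions, not necessarily what MCIP ships with:

// health.ts -- report 200 only when the process and its dependencies are up
import express from 'express';
import Redis from 'ioredis';

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

app.get('/health', async (_req, res) => {
  try {
    await redis.ping(); // dependency check, not just "the process is alive"
    res.status(200).json({ status: 'ok' });
  } catch {
    res.status(503).json({ status: 'degraded', redis: 'unreachable' });
  }
});

app.listen(Number(process.env.PORT ?? 8000));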

Strategy 2: Load-Balanced (Growth Mode)

Multiple MCIP instances behind a load balancer:

# docker-compose.balanced.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - mcip1
      - mcip2
      - mcip3

  mcip1:
    image: mcip:${VERSION:-latest}
    env_file: .env.production
    depends_on:
      - redis

  mcip2:
    image: mcip:${VERSION:-latest}
    env_file: .env.production
    depends_on:
      - redis

  mcip3:
    image: mcip:${VERSION:-latest}
    env_file: .env.production
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes
    volumes:
      - redis-data:/data

volumes:
  redis-data:

Best for: Production workloads, 100-1000 requests/minute
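
The compose file mounts ./nginx.conf without showing it. A minimal version that matches this topology might look like the following (TLS termination omitted for brevity):

# nginx.conf -- round-robin across the three MCIP instances
events {}

http {
  upstream mcip_backend {
    server mcip1:8000;
    server mcip2:8000;
    server mcip3:8000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://mcip_backend;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}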

Strategy 3: Kubernetes (Scale Mode)

For serious scale and operational maturity:

# k8s/deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcip
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcip
  template:
    metadata:
      labels:
        app: mcip
    spec:
      containers:
        - name: mcip
          image: mcip:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcip-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcip
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Best for: High-traffic production, 1000+ requests/minute
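
One thing the manifests above omit is a Service; without one, nothing can reach the pods. A minimal example:

# k8s/service.yml
apiVersion: v1
kind: Service
metadata:
  name: mcip
spec:
  selector:
    app: mcip
  ports:
    - port: 80
      targetPort: 8000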


Part 5: Going Live Checklist

Security

  • All secrets stored securely (not in code)
  • HTTPS/TLS enabled for all endpoints
  • CORS configured for your domains only
  • Rate limiting enabled
  • API authentication configured

Performance

  • Caching enabled and tested
  • Timeouts configured appropriately
  • Connection pooling set up
  • Memory limits defined
  • Load tested with expected traffic

Reliability

  • Health checks configured
  • Graceful shutdown implemented (see the sketch after this checklist)
  • Logging at appropriate levels
  • Monitoring and alerting active
  • Backup and recovery tested
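
Graceful shutdown is worth a quick illustration, since it's the item teams most often skip. A minimal sketch for a Node HTTP server; the 10-second deadline is an arbitrary choice:

// shutdown.ts -- drain in-flight requests on SIGTERM instead of dropping them
import { createServer } from 'node:http';

const server = createServer((_req, res) => res.end('ok'));
server.listen(8000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining connections');
  server.close(() => {
    // All in-flight requests have finished; release other resources here
    // (Redis connections, metric flushes) before exiting.
    process.exit(0);
  });
  // Hard deadline in case a connection never closes.
  setTimeout(() => process.exit(1), 10_000).unref();
});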

Operations

  • Deployment process documented
  • Rollback procedure tested
  • On-call rotation established
  • Runbooks for common issues created

Troubleshooting Production Issues

High Latency

Symptoms: P95 search times above 1 second

Common fixes:
  • Increase embedding cache TTL
  • Add more MCIP instances
  • Review and optimize slow queries
  • Check network latency to external services

Memory Growth

Symptoms: Container memory steadily increasing, eventual OOM

Common fixes:
  • Reduce session TTL
  • Lower cache max sizes
  • Implement cache eviction
  • Set explicit memory limits

Connection Errors

Symptoms: Intermittent failures to Redis, Qdrant, or OpenAI

Common fixes:
  • Increase connection pool size
  • Implement retry with backoff (see the sketch below)
  • Distribute load across instances
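
For the retry-with-backoff fix, a small generic helper covers all three services. The attempt count and base delay are illustrative, and qdrant.search in the usage comment is a placeholder, not a documented MCIP call:

// retry.ts -- retry a flaky call with exponential backoff and jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1) throw err; // out of retries, surface the error
      const delay = baseDelayMs * 2 ** i * (0.5 + Math.random()); // jitter
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// e.g. const results = await withRetry(() => qdrant.search(params));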


What's Next?

Congratulations! You've transformed MCIP from a local experiment into a production-ready system. Here's where to go from here:

  • Create Custom Adapters: Connect additional e-commerce platforms
  • Store Registration: Add your stores to the system
  • Build AI Agent Integration: Create intelligent shopping assistants

Remember: production systems are living things. They need attention, care, and occasional adjustment.