MCIP Architecture

Architecture Overview

MCIP uses a three-layer architecture with NestJS at its core, RAG for semantic intelligence, and real-time adapters for platform integration, delivering sub-500ms single-store search across heterogeneous e-commerce platforms.

System Context

MCIP operates as an intelligent middleware layer between AI agents and e-commerce platforms. Unlike traditional API gateways that simply route requests, MCIP adds semantic understanding, session management, and protocol translation.

External Dependencies

AI Services power our semantic understanding:

  • OpenAI API: Generates 512-dimensional embeddings with text-embedding-3-small, truncated from the model's native 1536 dimensions via the dimensions parameter. We chose this configuration for its balance of speed (~150ms) and accuracy in e-commerce contexts.
  • Pinecone: Stores and searches vectors with cosine similarity. Handles 1M+ vectors with sub-300ms query time at p95.

Infrastructure Services ensure reliability:

  • Redis: Manages sessions with 24-hour TTL, maintaining cart state and search history. Chosen for its atomic operations and proven stability at scale.
  • Docker: Containerizes everything for consistent deployment across environments.

Component Architecture

MCIP's internal architecture follows Domain-Driven Design principles with clear separation of concerns:

Layer Responsibilities

Presentation Layer handles protocol translation:

  • MCP Handler: Implements Model Context Protocol with Zod validation
  • REST API: Fallback for non-MCP clients
  • WebSocket: Real-time updates for cart changes
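To make the MCP Handler's validation step concrete, here is a minimal sketch of the search_product input contract. The field names and the hand-rolled check are illustrative only; the shipped handler validates with Zod schemas.

```typescript
// Illustrative input contract for the search_product tool.
// (Field names are assumptions; production validates with Zod.)
interface SearchParams {
    query: string;          // natural-language product query
    maxResults?: number;    // defaults to 10
    storeIds?: string[];    // optionally limit search to specific stores
}

function parseSearchParams(input: unknown): SearchParams {
    const obj = input as Record<string, unknown> | null;
    if (typeof obj?.query !== 'string' || obj.query.trim() === '') {
        throw new Error('query must be a non-empty string');
    }
    const maxResults =
        obj.maxResults === undefined ? 10 : Number(obj.maxResults);
    if (!Number.isInteger(maxResults) || maxResults < 1) {
        throw new Error('maxResults must be a positive integer');
    }
    return {
        query: obj.query,
        maxResults,
        storeIds: obj.storeIds as string[] | undefined,
    };
}
```

Rejecting malformed input at the protocol boundary keeps the layers below free of defensive checks.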

Application Layer orchestrates business logic:

  • Tool Service: Manages the five MCP tools (search, add_to_cart, view_cart, update_cart, clear_cart)
  • Search Orchestrator: Coordinates parallel searches with 2500ms timeout budget
  • Cart Manager: Handles complex cart operations with conflict resolution

Domain Layer contains core business logic:

  • Product Service: Normalizes products from different platforms
  • Session Service: Isolates machine customers with UUID-based sessions
  • Embedding Service: Manages the RAG pipeline for semantic understanding
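The Session Service's isolation model can be sketched in a few lines. The record shape below is an assumption for illustration; the real service persists sessions in Redis with the 24-hour TTL described earlier.

```typescript
import { randomUUID } from 'node:crypto';

// Sketch of UUID-based session isolation. Field names are illustrative;
// production stores these records in Redis with a 24h TTL.
interface Session {
    id: string;
    createdAt: number;
    cart: { productId: string; qty: number }[];
    searchHistory: string[];
}

const sessions = new Map<string, Session>();

function createSession(): Session {
    const session: Session = {
        id: randomUUID(),        // opaque, unguessable session key
        createdAt: Date.now(),
        cart: [],
        searchHistory: [],
    };
    sessions.set(session.id, session);
    return session;
}

function getSession(id: string): Session {
    const session = sessions.get(id);
    if (!session) throw new Error('unknown session'); // no cross-session access
    return session;
}
```

Because the only handle to a session is its UUID, one machine customer cannot enumerate or touch another's cart.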

Infrastructure Layer connects to external systems:

  • Platform Adapters: Translate MCIP requests to platform-specific APIs
  • External Clients: Handle authentication, retries, and rate limiting
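The adapter contract and retry behavior can be sketched as follows. The interface shape, attempt count, and backoff values are assumptions for illustration, not the shipped API.

```typescript
// Illustrative adapter contract: each platform module implements this,
// translating MCIP queries into platform-specific API calls.
interface PlatformAdapter {
    search(query: string): Promise<{ id: string; title: string; price: number }[]>;
}

// Generic retry helper with exponential backoff (100ms, 200ms, 400ms...).
// Attempt count and delays are illustrative defaults.
async function withRetry<T>(
    fn: () => Promise<T>,
    attempts = 3,
    baseDelayMs = 100,
): Promise<T> {
    let lastErr: unknown;
    for (let i = 0; i < attempts; i++) {
        try {
            return await fn();
        } catch (err) {
            lastErr = err;
            await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
        }
    }
    throw lastErr; // all attempts exhausted
}
```

Wrapping every external call this way keeps transient platform hiccups from surfacing as user-visible failures.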

Technology Stack

Core Framework: NestJS

We chose NestJS over Express or Fastify for several critical reasons:

  • Dependency Injection: Clean separation between layers
  • TypeScript First: Type safety across 40,000+ lines of code
  • Modular Architecture: Each adapter is a separate module
  • Built-in Testing: First-class testing utilities that helped us reach 87% coverage
  • Decorator Pattern: Perfect for MCP tool implementation

@MCPTool({
    name: 'search_product',
    description: 'Search products with semantic understanding',
    schema: SearchProductSchema
})
async searchProduct(params: SearchParams): Promise<ProductResult[]> {
    // Implementation leverages DI for all services
}

The RAG Pipeline

Our semantic search pipeline transforms queries through multiple stages:
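The stages can be sketched end to end. Here the embedding call and product fetch are stubbed in-memory so the ranking logic is visible; production uses text-embedding-3-small, Pinecone, and live store adapters.

```typescript
// Stage sketch of the RAG search pipeline (stubbed, in-memory).
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// 1. Embed the query (stub; production calls text-embedding-3-small)
const embed = (q: string): number[] => (q === 'red shoes' ? [1, 0] : [0, 1]);

// 2. Rank stored product vectors by cosine similarity (stub for Pinecone)
const index = [
    { productId: 'sku-1', vector: [0.9, 0.1] },
    { productId: 'sku-2', vector: [0.1, 0.9] },
];
function vectorSearch(queryVec: number[], topK: number): string[] {
    return [...index]
        .sort((x, y) =>
            cosineSimilarity(queryVec, y.vector) -
            cosineSimilarity(queryVec, x.vector))
        .slice(0, topK)
        .map(hit => hit.productId);
}

// 3. The matched IDs are then fetched real-time from the source stores
const hits = vectorSearch(embed('red shoes'), 1); // → ['sku-1']
```

Note that only IDs and vectors live in the index; product data is fetched live at step 3, which is the storage strategy described below.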

Why 512 dimensions?

We evaluated higher-dimensional alternatives, including text-embedding-3-large and the full 1536-dimensional output of text-embedding-3-small, but found:

  • Only 2% accuracy improvement for e-commerce queries
  • 3x slower embedding generation
  • 4x higher memory usage in Pinecone
  • 512-dim delivers 89% relevance at 150ms
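Requesting the reduced dimensionality is a single parameter on the embeddings call. The sketch below only builds the request body (no network call); the endpoint and field names follow OpenAI's public embeddings API.

```typescript
// Build the request body for OpenAI's /v1/embeddings endpoint.
// The `dimensions` parameter truncates text-embedding-3-small's native
// 1536-dim output down to 512.
function buildEmbeddingRequest(input: string) {
    return {
        model: 'text-embedding-3-small',
        input,
        dimensions: 512,
    };
}

// Usage with fetch (not executed here; apiKey is a placeholder):
// await fetch('https://api.openai.com/v1/embeddings', {
//     method: 'POST',
//     headers: {
//         Authorization: `Bearer ${apiKey}`,
//         'Content-Type': 'application/json',
//     },
//     body: JSON.stringify(buildEmbeddingRequest('waterproof hiking boots')),
// });
```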

Data Storage Strategy

No Product Database - This is crucial to understand:

  • We store embeddings (vectors) not products
  • Products are fetched real-time from source systems
  • Cache layer only stores session data and recent searches
  • This ensures data freshness and reduces infrastructure costs

Performance Characteristics

Real-World Metrics

Based on production monitoring across 10,000+ daily searches:

Operation                 P50       P95       P99       Target
Embedding Generation      145ms     189ms     212ms     <200ms
Vector Search             238ms     287ms     342ms     <300ms
Single Store Fetch        180ms     450ms     890ms     <1000ms
Total Search (1 store)    421ms     498ms     587ms     <500ms
Total Search (3 stores)   1,243ms   2,187ms   2,876ms   <3000ms
Cart Operations           12ms      34ms      67ms      <100ms

Optimization Strategies

Parallel Processing is our secret weapon:

// Helper: reject after ms so one slow store can't stall the batch
const timeout = (ms: number) =>
    new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error('store timeout')), ms));

// All stores searched simultaneously
const results = await Promise.allSettled(
    stores.map(store =>
        Promise.race([
            store.search(query),
            timeout(1500) // Per-store timeout, within the 2500ms budget
        ])
    )
);
// allSettled: failed or timed-out stores don't block the others

Intelligent Caching reduces redundant work:

  • Embedding cache: Common queries cached for 1 hour
  • Result cache: Identical searches cached for 5 minutes
  • Session cache: Cart state persisted for 24 hours
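The lookup-or-miss-with-expiry logic behind those tiers can be sketched as below. The real caches live in Redis (using its native EXPIRE); this in-memory Map is only an illustration of the behavior.

```typescript
// Minimal TTL cache sketch. Production uses Redis EXPIRE; this Map-based
// version just shows lazy expiry on read.
class TtlCache<V> {
    private store = new Map<string, { value: V; expiresAt: number }>();
    constructor(private ttlMs: number) {}

    get(key: string): V | undefined {
        const entry = this.store.get(key);
        if (!entry) return undefined;
        if (Date.now() >= entry.expiresAt) {  // expired: drop and miss
            this.store.delete(key);
            return undefined;
        }
        return entry.value;
    }

    set(key: string, value: V): void {
        this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    }
}

// One instance per tier, mirroring the TTLs listed above
const embeddingCache = new TtlCache<number[]>(60 * 60 * 1000); // 1 hour
const resultCache = new TtlCache<string[]>(5 * 60 * 1000);     // 5 minutes
```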

Graceful Degradation ensures reliability:

  • If embeddings fail: Fallback to keyword search
  • If store times out: Return partial results
  • If vector DB is slow: Use cached results
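The degradation order above amounts to a fallback chain: try each strategy in turn and return the first success. The function signature below is illustrative, not the shipped API.

```typescript
// Fallback-chain sketch mirroring the degradation order above.
// Signatures are illustrative assumptions.
async function searchWithFallback(
    query: string,
    semanticSearch: (q: string) => Promise<string[]>,
    keywordSearch: (q: string) => Promise<string[]>,
    cachedResults: (q: string) => string[] | undefined,
): Promise<string[]> {
    try {
        return await semanticSearch(query);  // normal RAG path
    } catch {
        // Embeddings or vector DB failed: degrade, don't fail the request
    }
    try {
        return await keywordSearch(query);   // plain keyword fallback
    } catch {
        // Keyword path failed too
    }
    return cachedResults(query) ?? [];       // last resort: cached results
}
```

The request only fails outright when every tier, including the cache, comes up empty.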

Scalability Architecture

Horizontal Scaling Pattern

MCIP scales horizontally with stateless application servers:

Scaling Metrics

Current Capacity (single instance):

  • 1,247 requests/second
  • 100 concurrent sessions
  • 10 stores searched in parallel

Scaled Capacity (3-node cluster):

  • 3,500+ requests/second
  • 1,000+ concurrent sessions
  • Network becomes bottleneck before CPU

Auto-scaling Triggers

We scale based on multiple metrics:

  1. CPU Usage > 70% for 2 minutes → Add instance
  2. Response Time P95 > 600ms → Add instance
  3. Queue Depth > 100 pending requests → Add instance
  4. Time-based → Pre-scale for predictable traffic

Architecture Decisions

Why Monolithic (For Now)

We started with a monolith instead of microservices because:

  1. Faster iteration - Single deployment, unified logging
  2. Lower complexity - No service discovery, no distributed tracing needed yet
  3. Cost effective - One service to run instead of 10
  4. Performance - No network hops between services

We're prepared to extract services when needed:

  • Search Service (when we hit 10K req/s)
  • Embedding Service (if we switch AI providers)
  • Cart Service (for persistent cart requirements)

Why Real-Time Over Cached

Every architecture decision has trade-offs. We chose real-time fetching over maintaining a product cache because:

Advantages:

  • Always accurate inventory
  • No sync delays
  • No storage costs
  • Simpler architecture

Trade-offs:

  • Higher latency per request
  • Dependency on store APIs
  • More complex error handling

For machine customers making purchase decisions, accuracy trumps speed.


Monitoring and Observability

Key Metrics We Track

Business Metrics:

  • Search relevance score (target: >85%)
  • Cart abandonment rate
  • Cross-store search percentage
  • Average products per search

Technical Metrics:

  • API latency (p50, p95, p99)
  • Embedding cache hit rate (target: >40%)
  • Store adapter success rate
  • Redis memory usage

Health Indicators:

GET /health
{
    "status": "healthy",
    "uptime": 425234,
    "memory": { "used": "1.2GB", "limit": "4GB" },
    "redis": "connected",
    "pinecone": "healthy",
    "adapters": {
        "vendure": "online",
        "shopify": "online",
        "woocommerce": "degraded"
    }
}

Security Considerations

While detailed security is covered elsewhere, the architecture implements defense in depth:

  1. API Gateway - Rate limiting, DDoS protection
  2. Application - Input validation, session isolation
  3. Infrastructure - Secrets management, TLS everywhere
  4. Data - No PII storage, encrypted sessions


Summary

MCIP's architecture balances simplicity with sophistication. We've built a system that's easy to deploy (single Docker container) yet powerful enough to handle the complexity of semantic search across heterogeneous e-commerce platforms.

The key insight: by focusing on protocol translation rather than data aggregation, we've created an architecture that scales with the growth of machine customers while maintaining the flexibility to evolve.