MCIP Architecture

Architecture Overview

MCIP uses a three-layer architecture with NestJS at its core, RAG for semantic intelligence, and platform adapters for e-commerce integration. Products are synchronized via manual endpoint triggers and stored in Qdrant for fast semantic search.

System Context

MCIP operates as an intelligent middleware layer between AI agents and e-commerce platforms. Unlike traditional API gateways that simply route requests, MCIP adds semantic understanding, session management, and protocol translation.

External Dependencies

AI Services power our semantic understanding:

  • OpenAI API: Generates 1536-dimensional embeddings using text-embedding-3-small. We chose this model for its optimal balance of speed and accuracy for e-commerce contexts.
  • Qdrant: Stores and searches vectors with cosine similarity. An open-source vector database that can be self-hosted, providing fast similarity search with payload filtering.
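
As an illustration of the embedding step, the OpenAI HTTP endpoint can be called directly as sketched below. The function names (`buildEmbeddingRequest`, `embed`) are hypothetical, not part of MCIP's codebase, and the sketch assumes an OPENAI_API_KEY environment variable.

```typescript
// Illustrative sketch of calling the OpenAI embeddings endpoint directly.
// Function names are hypothetical; assumes OPENAI_API_KEY is set.
const EMBEDDING_MODEL = 'text-embedding-3-small'; // 1536 dimensions by default

function buildEmbeddingRequest(input: string) {
  return { model: EMBEDDING_MODEL, input };
}

async function embed(text: string): Promise<number[]> {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildEmbeddingRequest(text)),
  });
  const json = await res.json();
  return json.data[0].embedding; // 1536-dimensional vector
}
```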

Infrastructure Services ensure reliability:

  • Docker: Containerizes everything for consistent deployment across environments.
  • BullMQ + Redis: Manages product ingestion queue for reliable processing.

Component Architecture

MCIP's internal architecture follows Domain-Driven Design principles with clear separation of concerns:

Layer Responsibilities

Presentation Layer handles protocol translation:

  • MCP Handler: Implements Model Context Protocol with Zod validation
  • REST API: HTTP endpoints for search, admin operations, and ingestion

Application Layer orchestrates business logic:

  • Tool Service: Manages the MCP tool (search_product)
  • Search Service: Coordinates vector search with filter extraction
  • Ingestion Service: Manages product sync from e-commerce platforms

Domain Layer contains core business logic:

  • Product Service: Fetches and normalizes products from different platforms
  • Embedding Service: Manages the RAG pipeline for semantic understanding
  • Feature Extraction: AI-powered extraction of filters from natural language queries

Infrastructure Layer connects to external systems:

  • Product Mappers: Transform raw store data to unified product schema
  • External Clients: Handle authentication, retries, and rate limiting
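
A product mapper of this kind might look like the sketch below. The raw Shopify-style input shape and the unified schema's field names are illustrative assumptions, not MCIP's actual types.

```typescript
// Hypothetical raw shape from a Shopify-like platform (illustration only).
interface RawShopifyProduct {
  id: number;
  title: string;
  vendor: string;
  variants: { price: string }[];
}

// Simplified unified schema; field names are assumptions, not MCIP's actual schema.
interface UnifiedProduct {
  id: string;
  name: string;
  brand: string;
  price: { amount: number; currency: string };
}

function mapShopifyProduct(raw: RawShopifyProduct, currency = 'USD'): UnifiedProduct {
  return {
    id: `shopify-${raw.id}`,
    name: raw.title.trim(),
    brand: raw.vendor,
    // Shopify-style APIs expose prices as strings; normalize to a number.
    price: { amount: parseFloat(raw.variants[0]?.price ?? '0'), currency },
  };
}
```

Keeping mappers pure functions like this makes each platform adapter trivial to unit-test in isolation.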

Technology Stack

Core Framework: NestJS

We chose NestJS over Express or Fastify for several critical reasons:

  • Dependency Injection: Clean separation between layers
  • TypeScript First: Type safety across the codebase
  • Modular Architecture: Each adapter is a separate module
  • Built-in Testing: First-class testing utilities make high coverage practical
  • Decorator Pattern: Perfect for MCP tool implementation

@MCPTool({
    name: 'search_product',
    description: 'Search products with semantic understanding',
    schema: SearchProductSchema
})
async searchProduct(params: SearchParams): Promise<ProductResult[]> {
    // Implementation leverages DI for all services
}

The RAG Pipeline

Our semantic search pipeline transforms queries through multiple stages: the incoming query is embedded, implicit filters are extracted from the natural language, and a hybrid vector search runs against Qdrant before ranked results are returned.

Why 1536 dimensions?

We use text-embedding-3-small default dimensions (1536) as it provides:

  • Excellent accuracy for e-commerce queries
  • Good balance between quality and computational cost
  • Native support in Qdrant without dimension reduction
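
Qdrant computes the cosine similarity internally; the sketch below just spells out the metric used to rank embedding vectors against each other.

```typescript
// Cosine similarity between two equal-length vectors, the metric used
// to rank product embeddings against the query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```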

Data Storage Strategy

Vector Database for Products:

  • Products are synchronized to Qdrant via manual admin endpoint
  • Each product is embedded and stored with payload metadata
  • Hybrid search combines vector similarity with payload filters
  • This enables both semantic understanding and precise filtering (price, brand, etc.)
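
A stored point might be assembled as in the sketch below. The payload fields are assumptions based on the filters mentioned above (price, brand), not MCIP's actual payload schema; note that nesting `price.amount` in the payload is what lets Qdrant filter on that key.

```typescript
// Shape of a point as stored in Qdrant: one vector plus a filterable payload.
// Field names are illustrative; MCIP's actual payload may differ.
interface ProductPoint {
  id: string;
  vector: number[];                              // 1536-dim embedding
  payload: {
    name: string;
    brand: string;
    price: { amount: number; currency: string }; // filterable as 'price.amount'
  };
}

function toPoint(
  id: string,
  embedding: number[],
  name: string,
  brand: string,
  amount: number,
  currency = 'USD'
): ProductPoint {
  return { id, vector: embedding, payload: { name, brand, price: { amount, currency } } };
}
```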

Product Synchronization

MCIP uses a manual synchronization model rather than real-time product fetching:

How Sync Works

  1. Admin Trigger: Call POST /admin/sync with the admin API key
  2. Fetch Products: System fetches all products from configured source
  3. Queue Processing: Products are queued via BullMQ for processing
  4. Mapping: Each product is mapped to unified schema via adapters
  5. Embedding: Products are embedded using OpenAI
  6. Storage: Vectors and payloads stored in Qdrant
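
The six steps above can be sketched as a pipeline of injected stages. The interfaces below are illustrative assumptions, not MCIP's actual service contracts; in MCIP, the per-product work (steps 3-6) runs through BullMQ workers rather than a plain loop, which adds retries and backpressure.

```typescript
// Illustrative ingestion pipeline: fetch -> map -> embed -> store.
// All stage signatures are assumptions made for this sketch.
interface IngestionDeps<Raw, Product> {
  fetchAll: () => Promise<Raw[]>;
  map: (raw: Raw) => Product;
  embed: (product: Product) => Promise<number[]>;
  store: (product: Product, vector: number[]) => Promise<void>;
}

async function runSync<Raw, Product>(deps: IngestionDeps<Raw, Product>): Promise<number> {
  const raws = await deps.fetchAll();          // step 2: fetch from source
  for (const raw of raws) {                    // steps 3-6, per product
    const product = deps.map(raw);             // step 4: unified schema
    const vector = await deps.embed(product);  // step 5: embedding
    await deps.store(product, vector);         // step 6: write to Qdrant
  }
  return raws.length;
}
```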

Why Manual Sync?

  • Simplicity: No complex real-time infrastructure needed
  • Control: You decide when to update the product catalog
  • Reliability: Batch processing is more reliable than real-time
  • Cost: Fewer API calls to embedding service

Performance Characteristics

Real-World Metrics

Based on testing with product catalogs:

Operation            | Typical        | Notes
---------------------|----------------|----------------------
Embedding Generation | ~150ms         | Per query, OpenAI API
Vector Search        | ~50-100ms      | Qdrant with filters
Feature Extraction   | ~200ms         | AI filter extraction
Total Search         | 300-500ms      | End-to-end
Product Ingestion    | ~500ms/product | Including embedding

Optimization Strategies

Hybrid Search combines semantic and exact matching:

// Qdrant hybrid search with filters ('products' is the collection name)
const results = await qdrant.search('products', {
  vector: queryEmbedding,
  filter: {
    must: [
      { key: 'price.amount', range: { lte: maxPrice } },
      { key: 'brand', match: { value: brandFilter } }
    ]
  },
  limit: 10
});

AI-Powered Filter Extraction:

  • Natural language queries are analyzed for implicit filters
  • "Nike shoes under $100" extracts: brand=Nike, priceMax=100
  • Reduces vector search scope for faster, more relevant results
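
In MCIP the extraction is done by an AI model; the regex sketch below only illustrates the output shape for the example query, with a hardcoded brand list as a deliberate simplification.

```typescript
// Simplified stand-in for AI filter extraction (illustration only).
// The real system uses an LLM; this just shows the resulting filter shape.
interface ExtractedFilters {
  brand?: string;
  priceMax?: number;
}

const KNOWN_BRANDS = ['Nike', 'Adidas', 'Puma']; // hypothetical brand list

function extractFilters(query: string): ExtractedFilters {
  const filters: ExtractedFilters = {};
  const brand = KNOWN_BRANDS.find((b) => query.toLowerCase().includes(b.toLowerCase()));
  if (brand) filters.brand = brand;
  const priceMatch = query.match(/under \$?(\d+(?:\.\d+)?)/i);
  if (priceMatch) filters.priceMax = parseFloat(priceMatch[1]);
  return filters;
}
```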

Scalability Architecture

Horizontal Scaling Pattern

MCIP scales horizontally with stateless application servers.

Scaling Considerations

  • Stateless Nodes: MCIP instances don't share state (except via Qdrant/Redis)
  • Qdrant Scaling: Can be clustered for larger catalogs
  • Queue Scaling: BullMQ supports multiple workers

Architecture Decisions

Why Monolithic (For Now)

We started with a monolith instead of microservices because:

  1. Faster iteration - Single deployment, unified logging
  2. Lower complexity - No service discovery needed
  3. Cost effective - One service to run
  4. Performance - No network hops between services

Why Qdrant Over Other Vector DBs

  • Self-hosted: Full control, no vendor lock-in
  • Payload Filtering: Native support for hybrid search
  • Open Source: Active community, well-documented
  • Performance: Excellent for catalog-sized datasets

Monitoring and Observability

Key Metrics We Track

Business Metrics:

  • Search relevance (semantic match quality)
  • Products indexed count
  • Query patterns and popular searches

Technical Metrics:

  • API latency (p50, p95, p99)
  • Qdrant query performance
  • Ingestion queue depth
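
The latency percentiles above can be computed with the usual nearest-rank method; the helper below is a generic sketch, not part of MCIP's monitoring code.

```typescript
// Nearest-rank percentile over recorded latency samples (in ms):
// the smallest sample with at least a fraction p of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```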

Health Indicators:

GET /health
{
    "status": "ok"
}

Security Considerations

The architecture implements defense in depth:

  1. Admin Endpoints - Protected by API key header
  2. Application - Input validation with Zod schemas
  3. Infrastructure - Secrets via environment variables
  4. Data - No PII storage in vector database

What's Next