System Context
MCIP operates as an intelligent middleware layer between AI agents and e-commerce platforms. Unlike traditional API gateways that simply route requests, MCIP adds semantic understanding, session management, and protocol translation.
External Dependencies
AI Services power our semantic understanding:
- OpenAI API: Generates 1536-dimensional embeddings using text-embedding-3-small. We chose this model for its balance of speed and accuracy in e-commerce contexts (see the sketch after this list).
- Qdrant: An open-source, self-hostable vector database that stores vectors and searches them with cosine similarity, with native payload filtering for fast, precise retrieval.
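A minimal sketch of the embedding step, assuming the official openai Node SDK (the client setup and function name are illustrative):

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Generate a 1536-dimensional embedding for a query or product description.
// text-embedding-3-small returns 1536 dimensions unless told otherwise.
async function embedQuery(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}
```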
Infrastructure Services ensure reliability:
- Docker: Containerizes everything for consistent deployment across environments.
- BullMQ + Redis: Manages product ingestion queue for reliable processing.
Component Architecture
MCIP's internal architecture follows Domain-Driven Design principles with a clear separation of concerns; a module-level sketch follows the layer responsibilities below.
Layer Responsibilities
Presentation Layer handles protocol translation:
- MCP Handler: Implements Model Context Protocol with Zod validation
- REST API: HTTP endpoints for search, admin operations, and ingestion
Application Layer orchestrates business logic:
- Tool Service: Manages the MCP tool (search_product)
- Search Service: Coordinates vector search with filter extraction
- Ingestion Service: Manages product sync from e-commerce platforms
Domain Layer contains core business logic:
- Product Service: Fetches and normalizes products from different platforms
- Embedding Service: Manages the RAG pipeline for semantic understanding
- Feature Extraction: AI-powered extraction of filters from natural language queries
Infrastructure Layer connects to external systems:
- Product Mappers: Transform raw store data to unified product schema
- External Clients: Handle authentication, retries, and rate limiting
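As a rough illustration of how these layers compose in NestJS (the placeholder module names below are hypothetical, not MCIP's actual ones):

```typescript
import { Module } from "@nestjs/common";

// Hypothetical placeholder modules standing in for the real layers.
@Module({})
class PresentationModule {}
@Module({})
class ApplicationModule {}
@Module({})
class DomainModule {}
@Module({})
class InfrastructureModule {}

@Module({
  // Each layer is wired in as its own module; dependencies point inward,
  // from presentation toward infrastructure.
  imports: [PresentationModule, ApplicationModule, DomainModule, InfrastructureModule],
})
export class AppModule {}
```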
Technology Stack
Core Framework: NestJS
We chose NestJS over raw Express or Fastify for several reasons:
- Dependency Injection: Clean separation between layers
- TypeScript First: Type safety across the codebase
- Modular Architecture: Each adapter is a separate module
- Built-in Testing: First-class testing utilities make high coverage practical
- Decorator Pattern: Perfect for MCP tool implementation (see the schema sketch below)
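For instance, the search_product input can be described and validated with Zod before it reaches the search service. A minimal sketch, where the exact fields are illustrative assumptions:

```typescript
import { z } from "zod";

// Illustrative input schema for the search_product tool.
const SearchProductInput = z.object({
  query: z.string().min(1).describe("Natural-language product query"),
  limit: z.number().int().positive().max(50).default(10),
});

type SearchProductArgs = z.infer<typeof SearchProductInput>;

// Reject malformed agent input before it reaches the search service.
function parseToolArgs(raw: unknown): SearchProductArgs {
  return SearchProductInput.parse(raw); // throws ZodError on invalid input
}
```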
The RAG Pipeline
Our semantic search pipeline transforms each query in stages: AI filter extraction, embedding generation, and a filtered vector search against Qdrant.
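A condensed sketch of that flow, assuming the @qdrant/js-client-rest client; extractFilters and embedQuery stand in for the services described elsewhere in this document:

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

// Stubs for the stages described above; real implementations call OpenAI.
declare function extractFilters(query: string): Promise<any>;
declare function embedQuery(query: string): Promise<number[]>;

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Query flow: extract structured filters, embed the text, then run a
// payload-filtered vector search over the product collection.
async function searchProducts(query: string) {
  const filter = await extractFilters(query); // e.g. brand / price constraints
  const vector = await embedQuery(query);     // 1536-dim embedding
  return qdrant.search("products", { vector, filter, limit: 10 });
}
```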
Why 1536 dimensions?
We use text-embedding-3-small's default dimensionality (1536) because it provides:
- Excellent accuracy for e-commerce queries
- Good balance between quality and computational cost
- Native support in Qdrant without dimension reduction
Data Storage Strategy
Vector Database for Products:
- Products are synchronized to Qdrant via a manual admin endpoint
- Each product is embedded and stored with payload metadata (see the sketch after this list)
- Hybrid search combines vector similarity with payload filters
- This enables both semantic understanding and precise filtering (price, brand, etc.)
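A sketch of what a stored point looks like, using the @qdrant/js-client-rest client (the payload fields and collection name are illustrative):

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Store one product: the vector carries semantics, the payload carries
// the exact-match fields used for hybrid filtering (price, brand, etc.).
async function indexProduct(id: number, vector: number[]) {
  await qdrant.upsert("products", {
    wait: true, // block until the point is persisted
    points: [
      {
        id,
        vector, // 1536-dim embedding from text-embedding-3-small
        payload: {
          title: "Air Zoom Pegasus",
          brand: "Nike",
          price: 89.99,
          category: "shoes",
        },
      },
    ],
  });
}
```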
Product Synchronization
MCIP uses a manual synchronization model rather than real-time product fetching:
How Sync Works
1. Admin Trigger: Call POST /admin/sync with the admin API key
2. Fetch Products: The system fetches all products from the configured source
3. Queue Processing: Products are queued via BullMQ for processing (see the worker sketch after these steps)
4. Mapping: Each product is mapped to the unified schema via adapters
5. Embedding: Products are embedded using OpenAI
6. Storage: Vectors and payloads are stored in Qdrant
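A condensed sketch of the queue wiring with BullMQ (the queue name, job shape, and retry policy are illustrative assumptions):

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // Redis

// Producer side: the sync endpoint enqueues one job per product.
const ingestionQueue = new Queue("product-ingestion", { connection });

async function enqueueProduct(product: { id: string; title: string }) {
  await ingestionQueue.add("ingest", product, {
    attempts: 3, // retry transient embedding/storage failures
    backoff: { type: "exponential", delay: 1000 },
  });
}

// Consumer side: workers map, embed, and store each product.
new Worker(
  "product-ingestion",
  async (job) => {
    // map -> embed -> upsert into Qdrant (see the sketches above)
    console.log(`ingesting product ${job.data.id}`);
  },
  { connection, concurrency: 5 },
);
```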
Why Manual Sync?
- Simplicity: No complex real-time infrastructure needed
- Control: You decide when to update the product catalog
- Reliability: Batch processing is more reliable than real-time
- Cost: Fewer API calls to embedding service
Real-World Metrics
Based on testing with product catalogs:
| Operation | Typical Latency | Notes |
|---|---|---|
| Embedding Generation | ~150ms | Per query, OpenAI API |
| Vector Search | ~50-100ms | Qdrant with filters |
| Feature Extraction | ~200ms | AI filter extraction |
| Total Search | 300-500ms | End-to-end |
| Product Ingestion | ~500ms/product | Including embedding |
Optimization Strategies
Hybrid Search combines semantic and exact matching. The key step is AI-Powered Filter Extraction (sketched after this list):
- Natural language queries are analyzed for implicit filters
- "Nike shoes under $100" extracts: brand=Nike, priceMax=100
- Reduces vector search scope for faster, more relevant results
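A sketch of the extraction step using OpenAI's JSON-mode chat completions; the model choice, prompt, and field names are assumptions, not MCIP's exact implementation:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Ask the model to pull structured filters out of a free-text query.
async function extractFilters(query: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'Extract e-commerce filters from the query as JSON with optional keys: "brand" (string), "priceMin" (number), "priceMax" (number), "category" (string).',
      },
      { role: "user", content: query },
    ],
  });
  // "Nike shoes under $100" -> { "brand": "Nike", "priceMax": 100 }
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}
```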
Scalability Architecture
Horizontal Scaling Pattern
MCIP scales horizontally with stateless application servers.
Scaling Considerations
- Stateless Nodes: MCIP instances don't share state (except via Qdrant/Redis)
- Qdrant Scaling: Can be clustered for larger catalogs
- Queue Scaling: BullMQ supports multiple workers
Architecture Decisions
Why Monolithic (For Now)
We started with a monolith instead of microservices because:
- Faster iteration - Single deployment, unified logging
- Lower complexity - No service discovery needed
- Cost effective - One service to run
- Performance - No network hops between services
Why Qdrant Over Other Vector DBs
- Self-hosted: Full control, no vendor lock-in
- Payload Filtering: Native support for hybrid search
- Open Source: Active community, well-documented
- Performance: Excellent for catalog-sized datasets
Monitoring and Observability
Key Metrics We Track
Business Metrics:
- Search relevance (semantic match quality)
- Products indexed count
- Query patterns and popular searches
Technical Metrics:
- API latency (p50, p95, p99)
- Qdrant query performance
- Ingestion queue depth
Health Indicators (probed by a lightweight check, sketched below):
- Qdrant connectivity
- Redis and ingestion queue availability
- OpenAI API reachability
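A minimal probe, assuming the @qdrant/js-client-rest and ioredis clients (endpoints are illustrative):

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";
import Redis from "ioredis";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });
const redis = new Redis({ host: "localhost", port: 6379 });

// Report per-dependency status for a /health endpoint.
async function checkHealth() {
  const [qdrantOk, redisOk] = await Promise.all([
    qdrant.getCollections().then(() => true).catch(() => false),
    redis.ping().then((r) => r === "PONG").catch(() => false),
  ]);
  return { qdrant: qdrantOk, redis: redisOk };
}
```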
Security Considerations
The architecture implements defense in depth:
- Admin Endpoints - Protected by an API key header (guard sketched after this list)
- Application - Input validation with Zod schemas
- Infrastructure - Secrets via environment variables
- Data - No PII storage in vector database
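As an example, the admin API key check can live in a small NestJS guard; the header name and environment variable here are illustrative:

```typescript
import {
  CanActivate,
  ExecutionContext,
  Injectable,
  UnauthorizedException,
} from "@nestjs/common";

// Rejects admin requests whose x-admin-api-key header doesn't match
// the key provided via environment variable.
@Injectable()
export class AdminApiKeyGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest();
    const provided = request.headers["x-admin-api-key"];
    if (!provided || provided !== process.env.ADMIN_API_KEY) {
      throw new UnauthorizedException("Invalid admin API key");
    }
    return true;
  }
}
```

Applied to the sync controller with @UseGuards(AdminApiKeyGuard), unauthenticated calls fail before any handler logic runs.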
What's Next
- Three-Layer Design - Deep dive into each layer
- Protocol-First Approach - Why protocol matters
- Adapters - Building product mappers
- MCP Tools - Available AI agent tools