MCIP uses a three-layer architecture with NestJS at its core, RAG for semantic intelligence, and real-time adapters for platform integration, delivering sub-500ms single-store search (at P95) across any e-commerce platform.
MCIP operates as an intelligent middleware layer between AI agents and e-commerce platforms. Unlike traditional API gateways that simply route requests, MCIP adds semantic understanding, session management, and protocol translation.
AI Services power our semantic understanding:
text-embedding-3-small. We chose this model for its optimal balance of speed (150ms) and accuracy for e-commerce contexts.
Infrastructure Services ensure reliability:
MCIP's internal architecture follows Domain-Driven Design principles with clear separation of concerns:
Presentation Layer handles protocol translation:
Application Layer orchestrates business logic:
Domain Layer contains core business logic:
Infrastructure Layer connects to external systems:
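As a rough sketch of how this layering could look in TypeScript, the example below has the domain layer define a port, the infrastructure layer implement it, and the application layer orchestrate across adapters. All names (`Product`, `StoreAdapter`, `InMemoryStoreAdapter`, `SearchService`) are hypothetical and not taken from the MCIP codebase; the in-memory adapter stands in for a real platform client.

```typescript
// Domain layer: core types and the port that infrastructure must implement.
// Every name in this sketch is hypothetical.
interface Product {
  id: string;
  title: string;
  priceCents: number;
  store: string;
}

interface StoreAdapter {
  readonly storeName: string;
  search(query: string): Promise<Product[]>;
}

// Infrastructure layer: one adapter per platform. This in-memory version
// stands in for a real e-commerce platform client.
class InMemoryStoreAdapter implements StoreAdapter {
  readonly storeName: string;
  private readonly catalog: Product[];

  constructor(storeName: string, catalog: Product[]) {
    this.storeName = storeName;
    this.catalog = catalog;
  }

  async search(query: string): Promise<Product[]> {
    const needle = query.toLowerCase();
    return this.catalog.filter((p) => p.title.toLowerCase().includes(needle));
  }
}

// Application layer: orchestrates adapters without knowing platform details.
class SearchService {
  private readonly adapters: StoreAdapter[];

  constructor(adapters: StoreAdapter[]) {
    this.adapters = adapters;
  }

  async search(query: string): Promise<Product[]> {
    const perStore = await Promise.all(this.adapters.map((a) => a.search(query)));
    // Flatten the per-store lists and sort by price so callers get one merged list.
    return perStore
      .reduce<Product[]>((all, items) => all.concat(items), [])
      .sort((a, b) => a.priceCents - b.priceCents);
  }
}
```

Because `SearchService` depends only on the `StoreAdapter` interface, a new platform can be supported by adding an adapter without touching application or domain code.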
We chose NestJS over Express or Fastify for several critical reasons:

```typescript
@MCPTool({
  name: 'search_product',
  description: 'Search products with semantic understanding',
  schema: SearchProductSchema
})
async searchProduct(params: SearchParams): Promise<ProductResult[]> {
  // Implementation leverages DI for all services
}
```

Our semantic search pipeline transforms queries through multiple stages:
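One way to picture such a staged pipeline is as a chain of async functions, each consuming the previous stage's output. The sketch below assumes three stages that mirror the operations measured in the performance table (embedding generation, vector search, store fetch); the `Stage` type, the `runPipeline` helper, and the intermediate shapes are hypothetical, not MCIP's actual code.

```typescript
// A stage is an async transformation from one representation to the next.
type Stage<In, Out> = (input: In) => Promise<Out>;

// Hypothetical shapes for the intermediate values.
type Embedding = number[];
interface VectorMatch { productId: string; score: number; }
interface LiveProduct { productId: string; title: string; inStock: boolean; }

interface PipelineStages {
  embed: Stage<string, Embedding>;
  vectorSearch: Stage<Embedding, VectorMatch[]>;
  fetchLive: Stage<VectorMatch[], LiveProduct[]>;
}

// Runs the stages in order and records how long each one took.
async function runPipeline(
  query: string,
  stages: PipelineStages,
): Promise<{ products: LiveProduct[]; timingsMs: Record<string, number> }> {
  const timingsMs: Record<string, number> = {};

  async function timed<I, O>(name: string, stage: Stage<I, O>, input: I): Promise<O> {
    const start = Date.now();
    const output = await stage(input);
    timingsMs[name] = Date.now() - start;
    return output;
  }

  const embedding = await timed("embed", stages.embed, query.trim());
  const matches = await timed("vectorSearch", stages.vectorSearch, embedding);
  const products = await timed("fetchLive", stages.fetchLive, matches);
  return { products, timingsMs };
}
```

Passing the stages in as functions keeps each one replaceable with a stub in tests, and the per-stage timings give the kind of breakdown needed to see which stage dominates latency.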
Why 512 dimensions?
We tested 1536-dimensional embeddings (the default output size of text-embedding-3-small; text-embedding-3-large defaults to 3072) but found:
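For intuition about what a smaller embedding means in practice, the sketch below shortens a vector to a target dimension, rescales it to unit length, and compares vectors by cosine similarity. This is a generic illustration rather than MCIP's code: the function names are hypothetical, the storage estimate assumes 4-byte floats, and in a real deployment the shorter vector would more likely be requested from the embedding API than truncated by hand.

```typescript
// Truncate an embedding to `dims` components and rescale it to unit length.
function shortenEmbedding(vector: number[], dims: number): number[] {
  if (dims <= 0 || dims > vector.length) {
    throw new RangeError(`dims must be between 1 and ${vector.length}`);
  }
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) {
    throw new Error("cannot normalize a zero vector");
  }
  return head.map((x) => x / norm);
}

// Cosine similarity of two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Raw storage per vector, assuming 4-byte (float32) components.
function bytesPerVector(dims: number): number {
  return dims * 4;
}
```

Under the float32 assumption, a 512-dimension vector occupies 2,048 bytes versus 6,144 bytes at 1,536 dimensions, and every similarity comparison touches a third as many components.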
No Product Database - This is crucial to understand:
Based on production monitoring across 10,000+ daily searches:
| Operation | P50 | P95 | P99 | Target |
|---|---|---|---|---|
| Embedding Generation | 145ms | 189ms | 212ms | <200ms |
| Vector Search | 238ms | 287ms | 342ms | <300ms |
| Single Store Fetch | 180ms | 450ms | 890ms | <1000ms |
| Total Search (1 store) | 421ms | 498ms | 587ms | <500ms |
| Total Search (3 stores) | 1,243ms | 2,187ms | 2,876ms | <3000ms |
| Cart Operations | 12ms | 34ms | 67ms | <100ms |
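
For readers less familiar with the P50/P95/P99 columns, the following sketch computes percentiles from raw latency samples using the nearest-rank method and checks them against a target. The function names are made up for illustration, and the monitoring stack behind the table may compute percentiles differently. In the table, every P95 value is within its target while some P99 values exceed it, so the sketch judges the target against P95; that reading is an assumption.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in (0, 100]");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for whole-number inputs.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  meetsTarget: boolean;
}

// Summarize latency samples; the target is checked against P95 (an assumption).
function summarize(samplesMs: number[], targetMs: number): LatencySummary {
  const p95 = percentile(samplesMs, 95);
  return {
    p50: percentile(samplesMs, 50),
    p95,
    p99: percentile(samplesMs, 99),
    meetsTarget: p95 < targetMs,
  };
}
```

Tail percentiles matter here because a slow store or a cold cache affects only a minority of requests, which an average would hide.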
Parallel Processing is our secret weapon:
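
```typescript
// Rejects after `ms` milliseconds so a slow store cannot stall the search
const timeout = (ms: number) =>
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
  );

// All stores searched simultaneously
const results = await Promise.allSettled(
  stores.map(store =>
    Promise.race([
      store.search(query),
      timeout(1500) // Per-store timeout
    ])
  )
);
// Failed stores don't block others
```

Intelligent Caching reduces redundant work: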
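The general idea can be illustrated with a small time-to-live cache keyed by a normalized query string, so that trivially different queries share one entry. Everything here is a hypothetical sketch: the `TtlCache` class, the normalization rule, and the injected clock are assumptions, and a production deployment would more plausibly keep entries in a shared store such as Redis than in process memory.

```typescript
// Minimal in-memory TTL cache. The clock is injectable so expiry can be tested.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // Lazily evict expired entries.
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Collapse case and whitespace so "Red  Shoes" and "red shoes" share a key.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Return the cached value if present; otherwise compute it and store it.
async function cached<V>(
  cache: TtlCache<V>,
  query: string,
  compute: (normalizedQuery: string) => Promise<V>,
): Promise<V> {
  const key = normalizeQuery(query);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await compute(key);
  cache.set(key, value);
  return value;
}
```

A cache like this suits values that stay valid for a while, such as query embeddings; live price and stock data would need a much shorter TTL or no caching at all.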
Graceful Degradation ensures reliability:
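One common form of graceful degradation is returning partial results when some stores fail or time out, together with a note of which stores were skipped. The sketch below is an assumption about how that could look, building on the `Promise.allSettled` pattern shown earlier; the `Store` interface, the `searchWithDegradation` function, and the result shape are hypothetical.

```typescript
interface Item { id: string; title: string; }
interface Store { name: string; search(query: string): Promise<Item[]>; }

interface DegradedResult {
  items: Item[];
  failedStores: string[];
  degraded: boolean;
}

// Reject if `work` does not settle within `ms`; always clear the timer.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Query every store in parallel and keep whatever succeeded.
async function searchWithDegradation(
  stores: Store[],
  query: string,
  perStoreTimeoutMs: number = 1500,
): Promise<DegradedResult> {
  const settled = await Promise.allSettled(
    stores.map((store) => withTimeout(store.search(query), perStoreTimeoutMs)),
  );

  const items: Item[] = [];
  const failedStores: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      items.push(...outcome.value);
    } else {
      failedStores.push(stores[i].name);
    }
  });

  return { items, failedStores, degraded: failedStores.length > 0 };
}
```

Surfacing `failedStores` instead of silently dropping them lets the calling agent decide whether a partial answer is good enough or whether to retry.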
MCIP scales horizontally with stateless application servers:
Current Capacity (single instance):
Scaled Capacity (3-node cluster):
We scale based on multiple metrics:
We started with a monolith instead of microservices because:
We're prepared to extract services when needed:
Every architecture decision has trade-offs. We chose real-time fetching over maintaining a product cache because:
Advantages:
Trade-offs:
For machine customers making purchase decisions, accuracy trumps speed.
Business Metrics:
Technical Metrics:
Health Indicators:
GET /health

```json
{
  "status": "healthy",
  "uptime": 425234,
  "memory": { "used": "1.2GB", "limit": "4GB" },
  "redis": "connected",
  "pinecone": "healthy",
  "adapters": {
    "vendure": "online",
    "shopify": "online",
    "woocommerce": "degraded"
  }
}
```

While detailed security is covered elsewhere, the architecture implements defense in depth:
MCIP's architecture balances simplicity with sophistication. We've built a system that's easy to deploy (single Docker container) yet powerful enough to handle the complexity of semantic search across heterogeneous e-commerce platforms.
The key insight: by focusing on protocol translation rather than data aggregation, we've created an architecture that scales with the growth of machine customers while maintaining the flexibility to evolve.