
Architecture Overview

MCIP — the Machine Customer Interaction Protocol — is a universal standard that enables AI agents to participate in commerce as first-class customers. It defines how machine customers search, browse, add to cart, check out, and track orders across any e-commerce platform through a single protocol interface. Built on a three-layer NestJS architecture with MCP (Model Context Protocol) at its core, the system currently implements intelligent product discovery as its first module — powered by LangGraph agentic workflows and Qdrant hybrid vector search.

The Machine Customer Era

A machine customer is an AI agent that acts on behalf of a human to discover, evaluate, and purchase products. Think of it as giving your AI assistant a credit card and sending it shopping. Today, AI agents can answer questions and write code — but when it comes to commerce, they hit a wall. Every e-commerce platform has different APIs, different schemas, different authentication, and different capabilities.

MCIP solves this by defining a universal protocol for machine customers. Just as HTTP standardized how browsers request web pages, MCIP standardizes how AI agents interact with stores. The protocol covers the entire commerce lifecycle:

  • Discover — Search and browse products with semantic understanding
  • Evaluate — Compare products across stores with normalized schemas
  • Transact — Manage carts, checkout, and payment (planned)
  • Track — Monitor orders and fulfillment (planned)

Product discovery is the first implemented module — because you can’t buy what you can’t find. The architecture described below supports the full lifecycle, with each new module plugging into the same three-layer design and MCP protocol interface.

System Context

MCIP operates as the protocol layer between machine customers and commerce platforms. Unlike traditional API gateways that simply route requests, MCIP provides a standardized interface (via MCP — Model Context Protocol), adds semantic intelligence through LangGraph agentic workflows, and translates between the protocol’s unified schema and each platform’s native API.

In the system context, MCIP sits at the center of the machine customer ecosystem: AI agents communicate through the MCP protocol, MCIP handles the intelligence and translation, and e-commerce platforms serve products through their native APIs. Currently, the product discovery module is active — the same architecture supports cart, checkout, and order modules as the protocol grows.

External Dependencies

AI Services power our semantic understanding:

  • OpenAI Embeddings: Generates 1536-dimensional embeddings using text-embedding-3-small for vector search. Chosen for its optimal balance of speed and accuracy for e-commerce contexts.
  • OpenAI GPT-4o-mini: Powers the agentic search pipeline — extracts filters (brand, category, price) from natural language queries and verifies search result relevance.
  • LangGraph + LangChain: Orchestrates agentic workflows as state machines. LangGraph enables parallel filter extraction, conditional routing (brand validation), and multi-step reasoning.
  • Qdrant: Open-source vector database for hybrid search — combining vector similarity with exact payload filtering (brand, category, price range). Supports facet search for brand validation.

Infrastructure Services ensure reliability:

  • Docker: Containerizes everything for consistent deployment across environments.
  • BullMQ + Redis: Manages product ingestion queue for reliable async processing with retry logic and job persistence.

Three-Layer Component Architecture

MCIP's architecture follows a clean three-layer design implemented as NestJS modules. Each layer has clear responsibilities and communicates through well-defined interfaces.

Layer 1: Presentation & Protocol

Handles protocol translation and request routing:

  • Search Controller: HTTP endpoints for simple vector search (GET /search)
  • Hard Filtering Controller: HTTP endpoints for agentic search (GET /hard-filtering/search) — triggers the full LangGraph workflow
  • MCP Tools: Implements Model Context Protocol using @rekog/mcp-nest with Zod validation for type-safe tool definitions (currently: search_product)
  • Admin Controller: Protected endpoints for product sync and index management

Layer 2: Application Services

Orchestrates business logic and coordinates domain services:

  • Search Service: Coordinates simple vector search — embedding generation, Qdrant similarity search, result ranking
  • Hard Filtering Service: Orchestrates the full LangGraph agentic workflow — parallel filter extraction, brand validation, hybrid search, and LLM verification
  • Ingestion Service: Manages product sync from e-commerce platforms, queuing jobs via BullMQ
  • Admin Service: System management operations (sync triggers, index rebuilds)

Layer 3: Domain & Infrastructure

Contains core business logic and external system integrations:

  • Product Repository (Qdrant): Stores product vectors and metadata. Supports vector similarity search, hybrid filtering by payload fields, and facet queries for brand validation
  • Vectorization Service (OpenAI): Generates 1536-dimensional embeddings for products and queries using text-embedding-3-small
  • Product Mapper (Adapters): Transforms raw store data to unified product schema — VendureMapper for Vendure GraphQL, CustomAiMapper as AI-powered fallback for any data format
  • BullMQ Processor: Async queue worker that processes product ingestion jobs — mapping, embedding, and storing each product
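To make the adapter idea concrete, here is a minimal sketch of the kind of transformation VendureMapper performs. The field names on both sides are illustrative assumptions, not the project's actual schema.

```typescript
// Illustrative sketch of the adapter pattern behind VendureMapper; all field
// names here are assumptions, not the project's actual unified schema.
interface UnifiedProduct {
  id: string;
  name: string;
  brand?: string;
  price: { amount: number; currency: string };
}

// Shape of a hypothetical raw Vendure product record.
interface RawVendureProduct {
  id: string;
  productName: string;
  facetValues: { code: string; name: string }[];
  priceWithTax: number; // minor units (cents)
  currencyCode: string;
}

function mapVendureProduct(raw: RawVendureProduct): UnifiedProduct {
  // Vendure models brand as a facet value; pick the one under the "brand" facet.
  const brand = raw.facetValues.find((f) => f.code === 'brand')?.name;
  return {
    id: raw.id,
    name: raw.productName,
    brand,
    price: { amount: raw.priceWithTax / 100, currency: raw.currencyCode },
  };
}
```

The point of the pattern is that everything downstream (embedding, storage, search) only ever sees UnifiedProduct, so new platforms need only a new mapper.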

Current Module: Product Discovery

Product discovery is MCIP’s first protocol module — and arguably the most important. Before a machine customer can add items to a cart or check out, it needs to find the right products. This module transforms how AI agents search by combining semantic understanding with intelligent filtering.

MCIP provides two complementary search approaches within the product discovery module, each optimized for different query types:

Simple Vector Search

Endpoint: GET /search?q={query}&take={limit}&skip={offset}

Best for: Straightforward queries where speed matters most.

Direct vector similarity search in Qdrant. No LLM calls — pure embedding generation followed by cosine similarity matching. Fast, low-latency, and effective for clear queries.

curl "http://localhost:8080/search?q=laptop"

Agentic Hard-Filtered Search

Endpoint: GET /hard-filtering/search?q={query}&take={limit}&skip={offset}

Best for: Complex natural language queries with implicit filters (brand, price, category).

Full LangGraph agentic workflow with 4 stages — parallel filter extraction, brand validation, hybrid search, and LLM verification. Intelligent, precise, and designed for queries like "Nike shoes under $100 but not running shoes."

curl "http://localhost:8080/hard-filtering/search?q=nike+shoes+under+100"

Choosing Between Modes

Aspect | Simple Vector Search | Agentic Hard-Filtered Search
Latency | ~300ms | ~500ms
LLM Calls | 0 (embedding only) | 4+ (extraction, verification)
Filter Extraction | None | Automatic (brand, price, category)
Brand Validation | No | Yes (Qdrant facet search)
Result Verification | No | Yes (LLM semantic check)
Best For | Simple, direct queries | Complex natural language queries

Agentic Search: The LangGraph Pipeline

The agentic search flow is the product discovery module’s core intelligence — a 4-stage LangGraph state machine that transforms natural language queries into precisely filtered, verified product results.

Stage 1 — Parallel Filter Extraction

Three LLM calls run in parallel using LangGraph's parallel execution:

  • Category extraction: Identifies product categories ("shoes", "laptops")
  • Brand extraction: Identifies intended brands ("Nike", "Apple")
  • Price extraction: Identifies price constraints ("under $100", "between 500 and 1000")

All three use Zod schemas for type-safe structured output parsing from GPT-4o-mini.

Stage 2 — Brand Validation

Queries Qdrant's facet search to get all available brands in the product catalog. If the user asked for a brand that doesn't exist in the store, MCIP returns empty results immediately — no point searching for products that aren't there.
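Once the facet values are in hand, the validation step reduces to a small check. A minimal sketch, with the helper name and case-insensitive matching rule as assumptions:

```typescript
// Sketch of the brand-validation check run against Qdrant's facet results;
// the helper name and the case-insensitive rule are assumptions.
function validateBrand(requested: string | null, available: string[]): string | null {
  if (!requested) return null; // no brand filter requested: nothing to validate
  const hit = available.find((b) => b.toLowerCase() === requested.toLowerCase());
  return hit ?? null; // null for a requested brand means "not in catalog": return empty results early
}
```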

Stage 3 — Hybrid Search

Combines two search strategies in Qdrant simultaneously:

  • Vector similarity: Cosine distance on 1536-dimensional embeddings for semantic matching
  • Payload filtering: Exact match on brand, category, and price range for precision

This hybrid approach gives you the best of both worlds — semantic understanding with precise filtering.

Stage 4 — LLM Verification

Passes the search results through GPT-4o-mini for a final semantic check. The LLM verifies that results actually match the user's intent, filtering out false positives. Returns the top 5 verified products.
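The post-verification step can be sketched as a pure filter: the LLM returns the ids it judged relevant, and the pipeline keeps those candidates, capped at five. All names below are illustrative:

```typescript
// Sketch of the Stage 4 post-filter: keep only candidates the LLM verified,
// in the LLM's order, capped at 5 (type and function names are assumptions).
interface Candidate {
  id: string;
  name: string;
  score: number;
}

function applyVerification(
  candidates: Candidate[],
  verifiedIds: string[],
  limit = 5,
): Candidate[] {
  const byId = new Map(candidates.map((c) => [c.id, c] as const));
  return verifiedIds
    .map((id) => byId.get(id))
    // Drop ids the LLM hallucinated that were never in the candidate set.
    .filter((c): c is Candidate => c !== undefined)
    .slice(0, limit);
}
```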


Technology Stack

Current Stack (Verified from Source)

Category | Technology | Version | Purpose
Framework | NestJS | 11.0.1 | Application framework with DI
Language | TypeScript | 5.7+ | Type safety across the codebase
Agentic Workflows | LangGraph | 1.0.15 | State machine workflows with parallel execution
AI Orchestration | LangChain | 1.1.15 | LLM abstractions and structured output
LLM Provider | OpenAI GPT-4o-mini | | Filter extraction and result verification
Embeddings | OpenAI | 6.9.1 | text-embedding-3-small (1536 dimensions)
Vector DB | Qdrant | 1.16.0 | Hybrid search + payload filtering + facets
MCP Protocol | @modelcontextprotocol/sdk | 1.25.2 | AI agent communication standard
MCP Integration | @rekog/mcp-nest | 1.8.4 | NestJS MCP binding with decorators
Queue | BullMQ | 5.63.2 | Async job processing with retry logic
Cache/Queue Backend | Redis | Alpine | Queue backend and session storage
Validation | Zod | 3.25.76 | Schema validation + LLM structured output parsing

Core Framework: NestJS 11

We chose NestJS over Express or Fastify for several critical reasons:

  • Dependency Injection: Clean separation between layers — services, controllers, and repositories are loosely coupled via Symbol tokens
  • TypeScript First: Type safety across the entire codebase
  • Modular Architecture: Each concern is a separate NestJS module (search, ingestion, admin, vectorization, repository)
  • Decorator Pattern: Perfect for MCP tool definitions with @rekog/mcp-nest

// MCP tool definition using @rekog/mcp-nest
import { Injectable } from '@nestjs/common';
import { Tool } from '@rekog/mcp-nest';
import { z } from 'zod';

const SearchProductSchema = z.object({
  query: z.string().describe('Natural language search query'),
  take: z.number().optional().default(10),
  skip: z.number().optional().default(0),
});

@Injectable()
export class SearchTools {
  // SearchService is the Layer 2 application service (import omitted here)
  constructor(private readonly searchService: SearchService) {}

  @Tool({
    name: 'search_product',
    description: 'Search products with semantic understanding',
    parameters: SearchProductSchema,
  })
  async searchProduct(params: z.infer<typeof SearchProductSchema>) {
    return this.searchService.search(params);
  }
}

The RAG Pipeline

Our semantic search pipeline transforms queries through multiple stages, from embedding generation through hybrid retrieval to verification.

Why 1536 dimensions? We use text-embedding-3-small at full precision (1536 dimensions) for excellent accuracy in e-commerce queries with native Qdrant support — no dimension reduction needed.

LangGraph Integration Benefits:

  • Parallel Execution: Three filter extraction calls run simultaneously, reducing latency
  • Conditional Routing: Brand validation can short-circuit the pipeline if a brand isn't available
  • State Management: Each workflow step has access to accumulated state
  • Observability: Built-in tracing for debugging agentic behavior
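To make the control flow concrete, here is a dependency-free sketch of the same shape: parallel fan-out, state merging, and a conditional short-circuit. The real pipeline uses LangGraph's StateGraph; every name in this sketch is illustrative.

```typescript
// Dependency-free sketch of the 4-stage control flow LangGraph provides;
// the state fields and node names are assumptions, not the real graph.
interface SearchState {
  query: string;
  brand?: string;
  maxPrice?: number;
  brandValid: boolean;
  results: string[];
}

// Each "node" reads the accumulated state and returns a partial update,
// mirroring LangGraph's reducer-style state channels.
type Node = (s: SearchState) => Promise<Partial<SearchState>>;

async function runPipeline(
  state: SearchState,
  nodes: { extractors: Node[]; validateBrand: Node; hybridSearch: Node; verify: Node },
): Promise<SearchState> {
  // Stage 1: parallel fan-out, merging each extractor's partial update.
  const updates = await Promise.all(nodes.extractors.map((n) => n(state)));
  state = Object.assign({}, state, ...updates);
  // Stage 2 + conditional edge: short-circuit when the brand is not in the catalog.
  state = { ...state, ...(await nodes.validateBrand(state)) };
  if (!state.brandValid) return { ...state, results: [] };
  // Stages 3 and 4 run sequentially on the accumulated state.
  state = { ...state, ...(await nodes.hybridSearch(state)) };
  state = { ...state, ...(await nodes.verify(state)) };
  return state;
}
```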

Product Synchronization

MCIP uses a manual synchronization model for product ingestion via BullMQ.

How Sync Works

  1. Admin Trigger: Call POST /admin/sync with admin API key
  2. Fetch Products: System fetches all products from configured source (SOURCE_URL)
  3. Queue Processing: Products are queued via BullMQ for async processing with retry logic
  4. Mapping: Each product is mapped to the unified UnifiedProduct schema via adapters (VendureMapper or CustomAiMapper)
  5. Embedding: Products are embedded using OpenAI text-embedding-3-small (1536 dimensions)
  6. Storage: Vectors and payloads stored in Qdrant for hybrid search
# Trigger sync from configured source
curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: your-secret-key"
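Internally, the queue step might look like the following sketch. The job name, payload shape, and retry options are assumptions in the style of BullMQ's JobsOptions, not values read from the source.

```typescript
// Sketch of how ingestion jobs might be enqueued; the queue name, job name,
// and retry options are assumptions, not the project's actual configuration.
interface IngestJob {
  productId: string;
}

// BullMQ-style job options with retry logic (see bullmq's JobsOptions).
const jobOptions = {
  attempts: 3,
  backoff: { type: 'exponential' as const, delay: 5_000 },
  removeOnComplete: true,
};

// Split a full catalog fetch into one job per product.
function toJobs(productIds: string[]): { name: string; data: IngestJob; opts: typeof jobOptions }[] {
  return productIds.map((productId) => ({
    name: 'ingest-product',
    data: { productId },
    opts: jobOptions,
  }));
}

// With bullmq this would be roughly:
//   const queue = new Queue('product-ingestion', { connection: { host: 'redis' } });
//   await queue.addBulk(toJobs(ids));
```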

Why Manual Sync?

  • Simplicity: No complex real-time infrastructure needed
  • Control: You decide when to update the product catalog
  • Reliability: Batch processing with BullMQ retry logic is more robust than real-time
  • Cost: Fewer API calls to embedding service


Performance Benchmarks

Measured Metrics (100 Concurrent Users)

Operation | P50 | P95 | P99 | Notes
Embedding Generation | 145ms | 189ms | 212ms | Per query, OpenAI API
Vector Search (Qdrant) | 238ms | 287ms | 342ms | With payload filtering
Feature Extraction | ~200ms | | | LangGraph parallel extraction
Total Search (Simple) | ~300ms | | | Vector search mode
Total Search (Agentic) | 421ms | 498ms | 587ms | Full LangGraph pipeline
Throughput | 1,247 requests/second | | | Under load
Product Ingestion | ~500ms/product | | | Including embedding generation

Optimization Strategies

Hybrid Search combines semantic and exact matching in Qdrant:

// Qdrant hybrid search with AI-extracted filters
const results = await qdrant.search('products', {
  vector: queryEmbedding,
  filter: {
    must: [
      { key: 'price.amount', range: { lte: extractedFilters.maxPrice } },
      { key: 'brand', match: { value: extractedFilters.brand } }
    ]
  },
  limit: 10,
  with_payload: true,
});

Parallel Filter Extraction via LangGraph:

  • Three GPT-4o-mini calls run simultaneously (brand, category, price)
  • Reduces agentic search latency vs. sequential extraction
  • Combined with Qdrant facet search for real-time brand validation

Scalability Architecture

Horizontal Scaling Pattern

MCIP scales horizontally with stateless application servers.

Scaling Considerations

  • Stateless Nodes: MCIP instances share no state — all persistence is via Qdrant and Redis
  • Qdrant Scaling: Can be clustered for larger catalogs (100K+ products)
  • Queue Scaling: BullMQ supports multiple concurrent workers
  • Concurrent Sessions: 1,000+ (memory-bound, horizontally scalable)

Architectural Decisions Record

Key decisions that shaped the current architecture:

Decision | Rationale | Trade-offs | Status
NestJS Framework | Enterprise TypeScript framework, DI container, modular architecture | Learning curve, heavier than Express | Implemented
LangGraph Workflows | Declarative agent workflows, parallel execution, conditional routing | Additional complexity vs simple chains | Implemented
Qdrant Vector DB | Open-source, high performance, payload filtering, facet search | Self-hosted infrastructure required | Implemented
1536-dim Vectors | Full precision from text-embedding-3-small, no truncation | Higher storage vs reduced dimensions | Implemented
BullMQ + Redis | Robust job queue, retry logic, job persistence, proven at scale | Additional infrastructure dependency | Implemented
MCP Protocol | Standardized AI agent communication, growing ecosystem | Currently Claude-focused, evolving spec | Implemented
Adapter Pattern | Pluggable platform integration, unified schema | Mapping complexity for each platform | Implemented
Zod Schemas | Type-safe validation, LLM structured output, runtime safety | Schema duplication with TypeScript interfaces | Implemented
Monolithic Architecture | Faster iteration, single deployment, lower complexity, no network hops | Must refactor for microservices at scale | Implemented

Monitoring and Observability

Key Metrics We Track

Business Metrics:

  • Search relevance scores (semantic match quality)
  • Products indexed count
  • Query patterns and popular searches
  • Filtering status distribution (AI_FILTERED vs VECTOR_ONLY)

Technical Metrics:

  • API latency (P50, P95, P99)
  • Qdrant query performance
  • LangGraph workflow execution time
  • BullMQ ingestion queue depth
  • Feature extraction accuracy

Health Indicators:

curl http://localhost:8080/health
# Response: {"status":"ok"}

Security Considerations

The architecture implements defense in depth across four layers:

  1. External Layer — Rate limiting, DDoS protection (reverse proxy)
  2. Application Layer — Input validation with Zod schemas, session validation
  3. Service Layer — API key management via environment variables, admin endpoint protection (x-admin-api-key)
  4. Data Layer — No PII storage in vector database, TLS for all external connections

Container Security: Non-root user, multi-stage Docker builds, health checks.
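A minimal sketch of the admin-key check behind the x-admin-api-key header, using a constant-time comparison. The function name and placement are assumptions; in NestJS this would typically live in a CanActivate guard bound to the admin controller.

```typescript
// Sketch of the admin-key check for protected endpoints; the function name is
// an assumption. Uses Node's constant-time compare to avoid timing leaks.
import { timingSafeEqual } from 'node:crypto';

function isAdminRequest(
  headers: Record<string, string | undefined>,
  expectedKey: string,
): boolean {
  const presented = headers['x-admin-api-key'];
  if (typeof presented !== 'string' || presented.length === 0) return false;
  const a = Buffer.from(presented);
  const b = Buffer.from(expectedKey);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```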


Evolution Roadmap

MCIP's three-layer architecture is designed to grow from intelligent product search to full commerce lifecycle support.

Extension Points

MCIP's architecture provides well-defined extension points for growing the protocol beyond product discovery:

Extension Point | Current | Future Capability | Implementation Path
Platform Adapters | Vendure only | Any e-commerce API | Implement IProductService interface
Search Methods | RAG + Agentic | Hybrid search strategies | Pluggable search strategies
AI Models | OpenAI only | Multiple providers | Abstract embedding service
Cart Storage | Redis only | Multiple backends | Storage adapter pattern
Protocol | MCP only | GraphQL, REST APIs | Protocol adapters
Commerce Modules | Product Discovery | Cart, Checkout, Orders | NestJS module per capability

The IProductService interface defines the contract for future platform adapters:

// Future platform adapter contract
interface IProductService {
    searchProducts(query: string, options?: SearchOptions): Promise<Product[]>;
    getProduct(id: string): Promise<Product>;
    getProductBySlug(slug: string): Promise<Product>;
}

This interface will become the standard way to connect new e-commerce platforms (Shopify, WooCommerce, custom APIs) as MCIP evolves into a true multi-store protocol.
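As an illustration of the contract, here is a throwaway in-memory implementation; a real adapter (for instance a hypothetical ShopifyAdapter) would call the platform's API instead of filtering an array. The Product and SearchOptions shapes are assumptions restated here so the example is self-contained.

```typescript
// Self-contained illustration of the IProductService contract; the Product
// and SearchOptions shapes are assumptions, and the in-memory class is a stand-in
// for a real platform adapter.
interface Product {
  id: string;
  slug: string;
  name: string;
}
interface SearchOptions {
  take?: number;
}

interface IProductService {
  searchProducts(query: string, options?: SearchOptions): Promise<Product[]>;
  getProduct(id: string): Promise<Product>;
  getProductBySlug(slug: string): Promise<Product>;
}

class InMemoryProductService implements IProductService {
  constructor(private readonly products: Product[]) {}

  async searchProducts(query: string, options?: SearchOptions): Promise<Product[]> {
    const q = query.toLowerCase();
    return this.products
      .filter((p) => p.name.toLowerCase().includes(q))
      .slice(0, options?.take ?? 10);
  }

  async getProduct(id: string): Promise<Product> {
    const p = this.products.find((x) => x.id === id);
    if (!p) throw new Error(`product ${id} not found`);
    return p;
  }

  async getProductBySlug(slug: string): Promise<Product> {
    const p = this.products.find((x) => x.slug === slug);
    if (!p) throw new Error(`product ${slug} not found`);
    return p;
  }
}
```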

Summary

MCIP's architecture balances simplicity with sophistication to serve as a universal commerce protocol. Product discovery is the first implemented module — powered by LangGraph agentic workflows and Qdrant hybrid search — with the three-layer design ready to accommodate cart, checkout, and order tracking as the protocol evolves.

The key architectural insights:

  • LangGraph powers a 4-stage agentic pipeline for intelligent query understanding with parallel execution
  • Qdrant provides hybrid search combining vector similarity with exact payload filtering
  • Two search modes serve different needs — simple vector search for speed, agentic search for precision
  • MCP Protocol standardizes AI agent communication for any commerce operation
  • BullMQ enables scalable async product ingestion with retry logic
  • Adapter pattern makes any e-commerce platform connectable through a unified interface

By focusing on semantic understanding and clean product mapping as the first step, we've created an architecture that's maintainable, extensible, and ready to grow into a complete commerce protocol.