MCIP Server

Core Server Concepts

MCIP's server powers a universal commerce protocol for AI agents — orchestrating agentic intelligence through LangGraph workflows, semantic search via Qdrant, session management, and platform coordination – all in under 500ms. Built on NestJS 11 with LangGraph state machines for autonomous query understanding, BullMQ async processing, and a clean three-layer architecture. Product discovery is the first implemented module, with cart management, checkout, and order tracking designed as future extensions.

Current Technology Stack

Component             | Technology          | Version                      | Purpose
----------------------|---------------------|------------------------------|----------------------------------------------------------
Framework             | NestJS              | 11.0.1                       | Application foundation
MCP Integration       | @rekog/mcp-nest     | 1.8.4                        | Model Context Protocol server
Agentic Orchestration | LangGraph           | @langchain/langgraph 1.0.15  | State machine workflows for autonomous query processing
LLM Framework         | LangChain           | @langchain/openai 1.2.2      | Structured output, embeddings, LLM abstractions
LLM Provider          | OpenAI GPT-4o-mini  | Latest                       | Filter extraction, brand validation, result verification
Queue System          | BullMQ              | 5.x                          | Async product ingestion
Vector Database       | Qdrant              | Latest                       | Semantic search + payload filtering + facet search
Cache/Sessions        | Redis               | Alpine                       | Session state, queue backend
Embeddings            | OpenAI              | text-embedding-3-small       | 1536-dimension vectors
Validation            | Zod                 | 3.x                          | Runtime type checking, LLM structured output parsing

The Server Is Your Orchestra Conductor

Imagine you're at a symphony. The conductor doesn't play any instrument, but without them, you'd have chaos. Some musicians would play too fast, others too slow. The violins would drown out the flutes. It would be a mess.

MCIP's server is that conductor. It powers the Machine Customer Interaction Protocol — a universal commerce protocol where AI agents can search, discover, and eventually transact across any e-commerce platform. Product discovery is the first movement in this symphony, and it's already a showstopper.

When an AI agent asks for "gaming laptops under $1500," the server orchestrates a complex performance through its LangGraph agentic workflow: extracting filters in parallel via GPT-4o-mini, validating brands against the actual catalog, generating embeddings, performing hybrid vector search in Qdrant, and verifying results through LLM — all while keeping perfect time. Each service plays its part, and the server ensures they harmonize into a single, beautiful response delivered in under 500ms.

This isn't just about making things work. It's about building a protocol server that's designed to grow — from product discovery today to cart management, checkout flows, and order tracking tomorrow. Let's dive into how we achieve this orchestration.


Server Architecture: The Big Picture

The Three-Layer Architecture in Action

Our server architecture follows a three-layer structure (Presentation & Protocol → Application Services → Domain & Infrastructure) that processes every request with precision:

Layer 1: Presentation & Protocol (Reception)

When a request arrives, it's like a guest arriving at a hotel. The controllers and MCP handler (powered by @rekog/mcp-nest) greet them, validate their credentials using Zod schemas, and direct them to the right service. This layer includes the Search Controller, Hard Filtering Controller (for agentic search), MCP Tools, and Admin Controller. Bad requests are politely but firmly turned away at the door.

Layer 2: Application Services (Orchestration)

This is where the magic happens. MCIP provides two search modes, each with its own service:

  • Simple Vector Search (SearchService): Direct vector similarity search in Qdrant. Fast, low-latency for straightforward queries. No LLM calls — pure embedding + vector search.
  • Agentic Hard-Filtered Search (HardFilteringService): The full LangGraph 4-stage pipeline — our core differentiator. This state machine workflow autonomously processes complex natural language queries through four stages:
      1. Parallel Filter Extraction — three GPT-4o-mini calls run simultaneously to extract categories, brands, and price constraints
      2. Brand Validation — Qdrant facet search verifies extracted brands exist in the catalog
      3. Hybrid Search — combines vector similarity with exact payload filtering on brand, category, and price
      4. LLM Verification — GPT-4o-mini semantically verifies results match the user's intent

Additionally, the Ingestion Service handles product sync via BullMQ, and the Admin Service manages system operations.

Layer 3: Domain & Infrastructure (Execution)

Results from Qdrant are normalized, scored by relevance, and formatted. The Product Repository (Qdrant), Vectorization Service (OpenAI), Product Mapper (adapters), and BullMQ Processor all live here. The response is crafted and delivered back to the AI agent – typically in 300-500ms.

Current Architecture: Focused Monolith

You might wonder: "Why not microservices?" Great question! We deliberately chose a monolithic architecture for MCIP's current phase, and here's why:

Speed of Development: With a monolith, we can iterate rapidly. New features go from idea to production in days, not weeks. When you're pioneering a new protocol, this agility is priceless.

Simplified Deployment: One container, one deployment, one thing to monitor. Docker Compose brings up MCIP, Qdrant, and Redis together. You really can be up and running in 5 minutes.

Performance Benefits: No network hops between services means lower latency. When you're targeting sub-500ms responses, every millisecond counts. Internal method calls are always faster than HTTP requests.

Easy Scaling: Modern monoliths scale horizontally just fine. Spin up more instances behind a load balancer, and you're handling more traffic. Simple.

That said, our architecture is designed for future decomposition. Service boundaries are clear, dependencies are injected via Symbol tokens, and when the time comes to break things apart, we can do so surgically.


NestJS 11: The Framework That Gets Out of Your Way

Why NestJS?

Choosing a framework is like choosing a car. You want something reliable and powerful, but not so complex that you need a PhD to drive it. NestJS hits that sweet spot.

NestJS brings enterprise-grade patterns to Node.js without enterprise-grade complexity. It's TypeScript-first, which means we catch errors at compile time, not in production. It has decorators that make code readable – you can understand what a class does just by looking at its decorators. And it includes everything we need out of the box: dependency injection, middleware, guards, pipes, interceptors.

Decorators: Making Code Self-Documenting

In MCIP, decorators tell the story of what each component does:

@Injectable()           // "I'm a service you can inject"
@Controller('search')   // "I handle /search routes"  
@Get(':id')            // "I respond to GET requests"
@Processor('product-ingestion')  // "I process queue jobs"

These aren't just annotations – they're executable documentation. A new developer can understand the entire request flow just by reading decorators.

Modules: Organized Like a Library

NestJS modules organize code like sections in a library:

src/modules/
├── admin/              # Admin endpoints (/admin/sync, /admin/recreate-indexes)
├── hard-filtering/     # LangGraph agentic search (4-stage pipeline)
├── ingestion/          # Product import pipeline + BullMQ processor
├── repository/         # Qdrant data access + hybrid search
├── search/             # Simple vector search + feature extraction
└── vectorization/      # OpenAI embedding generation

Each module is self-contained but can share services with others through exports. Want to add a new search algorithm? Just focus on the search module.


Dependency Injection: The Art of Loose Coupling

Services That Don't Know Each Other

Here's a beautiful thing about MCIP's architecture: services don't know about each other's implementations. The search service doesn't know how embeddings are generated. The ingestion processor doesn't know how vectors are stored.

MCIP uses Symbol-based injection tokens for maximum flexibility:

// constants/tokens.ts
export const PRODUCT_REPOSITORY = Symbol('PRODUCT_REPOSITORY');
export const VECTORIZATION_SERVICE = Symbol('VECTORIZATION_SERVICE');
export const PRODUCT_MAPPER = Symbol('PRODUCT_MAPPER');

// Usage in a service (decorators come from @nestjs/common)
import { Inject, Injectable } from '@nestjs/common';

@Injectable()
export class SearchService {
  constructor(
    @Inject(PRODUCT_REPOSITORY) 
    private readonly repository: ProductRepository,
    @Inject(VECTORIZATION_SERVICE)
    private readonly vectorization: VectorizationService
  ) {}
}

This loose coupling means we can swap implementations without breaking anything. Want to switch from OpenAI to a different embedding provider? Change one provider binding, everything else keeps working.
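To make the swap concrete, here is a minimal, self-contained sketch of two providers behind one injection token. It is illustrative only: the class names `OpenAIVectorization` and `FakeVectorization` are hypothetical, not MCIP's actual source.

```typescript
// Sketch: two interchangeable embedding providers behind one contract.
export const VECTORIZATION_SERVICE = Symbol('VECTORIZATION_SERVICE');

interface Vectorization {
  embed(text: string): Promise<number[]>;
}

class OpenAIVectorization implements Vectorization {
  async embed(text: string): Promise<number[]> {
    // A real implementation would call text-embedding-3-small; stubbed here.
    return new Array(1536).fill(0);
  }
}

class FakeVectorization implements Vectorization {
  async embed(text: string): Promise<number[]> {
    // Deterministic stub, handy in unit tests.
    return [text.length];
  }
}

// In a NestJS module, this binding would be the only line that changes:
// { provide: VECTORIZATION_SERVICE, useClass: OpenAIVectorization }
const providerBinding = { provide: VECTORIZATION_SERVICE, useClass: FakeVectorization };

// A consumer resolved against the token never sees the concrete class.
const vectorization: Vectorization = new providerBinding.useClass();
```

Consumers depend only on the `Vectorization` contract, so the binding can point at either class without touching any call site.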

The Provider Pattern

Every service in MCIP is a provider – something that provides functionality to others:

  • Services: Business logic like SearchService, IngestionService
  • Repositories: Data access like QdrantProductRepository
  • Mappers: Data transformation like VendureMapper, CustomAiMapper
  • Processors: Queue handlers like IngestionProcessor

The beauty is that consumers don't care what type of provider they're using. They just declare their needs via injection tokens, and NestJS handles the wiring.


Service Layer: Where Business Logic Lives

Clean Separation of Concerns

The service layer is where MCIP's intelligence resides. Controllers handle HTTP concerns. Repositories handle data access. But services? Services handle business logic – the rules, algorithms, and orchestration that make MCIP special.

Service Orchestration: Two Search Paths

MCIP provides two distinct search modes, each optimized for different query types:

Path 1 — Simple Vector Search (via SearchService):

  1. SearchController receives the query via /search?q=...
  2. FeatureExtractionService uses OpenAI structured output (Zod schemas) to parse natural language into filters (brand, price range, exclusions)
  3. VectorizationService generates a 1536-dimensional embedding via OpenAI
  4. QdrantProductRepository performs hybrid search combining vector similarity with metadata filters
  5. SearchService formats results with relevance scores

This path is fast and efficient — no multi-step LLM reasoning. Best for straightforward queries.
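The five steps above can be sketched as one pure function over injected interfaces. This is a sketch, not SearchService's actual source; the interface and field names (`Embedder`, `Repo`, `hybridSearch`) are assumptions.

```typescript
// Illustrative shape of the simple-search flow (steps 1-5 above).
interface ExtractedFilters { brand?: string; maxPrice?: number }
interface Embedder { embed(query: string): number[] }
interface Repo {
  hybridSearch(vector: number[], filters: ExtractedFilters): { id: string; score: number }[];
}

function simpleSearch(
  query: string,
  extract: (q: string) => ExtractedFilters, // stands in for FeatureExtractionService
  embedder: Embedder,                       // stands in for VectorizationService
  repo: Repo,                               // stands in for QdrantProductRepository
) {
  const filters = extract(query);                  // 2. parse query into filters
  const vector = embedder.embed(query);            // 3. generate the embedding
  const hits = repo.hybridSearch(vector, filters); // 4. vector + metadata search
  return hits.map((h) => ({ ...h, relevance: h.score })); // 5. format with scores
}

// Stubbed wiring, just to show the shape of the flow:
const results = simpleSearch(
  'gaming laptop under $1500',
  () => ({ maxPrice: 1500 }),
  { embed: () => [0.1, 0.2] },
  { hybridSearch: () => [{ id: 'p1', score: 0.92 }] },
);
```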

Path 2 — Agentic Hard-Filtered Search (via HardFilteringService):

  1. HardFilteringController receives the query via /hard-filtering/search?q=...
  2. HardFilteringService initializes a LangGraph state machine workflow
  3. Stage 1 — Parallel Filter Extraction: Three GPT-4o-mini calls run simultaneously via LangGraph parallel nodes to extract categories, brands, and price constraints (Zod-validated structured output)
  4. Stage 2 — Brand Validation: Qdrant facet search (getFacetValues("brand")) verifies extracted brands exist in the actual catalog. If the brand isn't found → empty results returned immediately
  5. Stage 3 — Hybrid Search: Query embedding (1536-dim) + payload filtering (brand, category, price) in Qdrant
  6. Stage 4 — LLM Verification: GPT-4o-mini semantically verifies that results match the user's original intent
  7. HardFilteringService returns top 5 verified products with metadata

This path is MCIP's core differentiator — autonomous multi-step reasoning that understands intent, not just keywords.
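The Stage 2 short-circuit can be illustrated as a pure function. This is a sketch of the gate's logic, not the actual HardFilteringService code; the state shape is an assumption.

```typescript
// Sketch of the Stage-2 brand-validation gate described above.
interface PipelineState {
  brands: string[];        // from Stage 1 parallel extraction
  catalogBrands: string[]; // from a Qdrant facet search on "brand"
}

function validateBrands(state: PipelineState): { valid: string[]; shortCircuit: boolean } {
  const catalog = new Set(state.catalogBrands.map((b) => b.toLowerCase()));
  const valid = state.brands.filter((b) => catalog.has(b.toLowerCase()));
  // If the user named brands and none exist in the catalog, the workflow
  // returns empty results immediately instead of running Stages 3-4.
  return { valid, shortCircuit: state.brands.length > 0 && valid.length === 0 };
}
```

A query for a brand the catalog has never seen ends the pipeline early, saving both the hybrid search and the verification LLM call.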

Each service has one job and does it well. But together, they create something greater than the sum of their parts.

The Ingestion Pipeline

Product ingestion uses BullMQ for reliable async processing:

import { Processor, WorkerHost } from '@nestjs/bullmq';
import { Job } from 'bullmq';

@Processor('product-ingestion')
export class IngestionProcessor extends WorkerHost {
  // With @nestjs/bullmq, the WorkerHost handles every job on the queue;
  // job.name ('process-product') distinguishes job types if more are added
  async process(job: Job<RawProduct>) {
    // 1. Map raw data to UnifiedProduct via PRODUCT_MAPPER
    const product = await this.mapper.map(job.data);

    // 2. Generate embedding via VECTORIZATION_SERVICE
    const vector = await this.vectorization.embedProduct(product);

    // 3. Store in Qdrant via PRODUCT_REPOSITORY
    await this.repository.save(product, vector);
  }
}

This queue-based approach means product imports don't block API responses, and failed jobs can retry automatically.

Error Boundaries

Services also act as error boundaries. When something goes wrong in the EmbeddingService, it doesn't crash the entire search. Instead, errors are caught, logged, and appropriate HTTP responses are returned. This resilience is built into the service layer design.
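A minimal sketch of that error-boundary idea follows. It is illustrative only: MCIP's real handlers map errors to HTTP responses, while this helper simply degrades a failing call to a logged fallback.

```typescript
// Sketch: convert a failing dependency call into a logged fallback
// instead of letting it crash the whole request.
function withFallback<T>(op: () => T, fallback: T, log: (msg: string) => void): T {
  try {
    return op();
  } catch (err) {
    log(`operation failed: ${(err as Error).message}`); // logged, not rethrown
    return fallback; // e.g. an empty result set instead of a failed search
  }
}

// A failing vector-store call degrades to empty results:
const safeResults = withFallback<string[]>(
  () => { throw new Error('Qdrant unreachable'); },
  [],
  console.error,
);
```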


Infrastructure Components: The Supporting Cast

Qdrant: The Vector Memory

Qdrant is MCIP's semantic memory. Products are stored as 1536-dimensional vectors alongside their metadata. The collection configuration:

// QdrantProductRepository.onModuleInit()
await this.client.createCollection('products', {
  vectors: { size: 1536, distance: 'Cosine' }
});

// Payload indexes for hybrid search (the JS client takes an options object)
await this.client.createPayloadIndex('products', { field_name: 'price.amount', field_schema: 'float' });
await this.client.createPayloadIndex('products', { field_name: 'brand', field_schema: 'keyword' });
await this.client.createPayloadIndex('products', { field_name: 'category', field_schema: 'keyword' });

Qdrant enables both semantic similarity search (finding conceptually similar products) and filtered search (price ranges, brand exclusions).
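To show what such a combined query looks like, here is an illustrative helper that assembles a Qdrant-style filter object from extracted constraints. The payload keys mirror the indexes above; the helper itself is an assumption, not MCIP's repository code.

```typescript
// Qdrant filters are plain JSON: "must" clauses with match / range conditions.
type Condition =
  | { key: string; match: { value: string } }
  | { key: string; range: { lte?: number; gte?: number } };

function buildHybridFilter(opts: { brand?: string; category?: string; maxPrice?: number }) {
  const must: Condition[] = [];
  if (opts.brand) must.push({ key: 'brand', match: { value: opts.brand } });
  if (opts.category) must.push({ key: 'category', match: { value: opts.category } });
  if (opts.maxPrice !== undefined) must.push({ key: 'price.amount', range: { lte: opts.maxPrice } });
  return { must };
}

// Passed alongside the query vector, e.g.:
// client.search('products', { vector, limit: 5, filter: buildHybridFilter({ ... }) });
const filter = buildHybridFilter({ brand: 'Dell', maxPrice: 1500 });
```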

Redis: The Memory Palace

Redis serves two critical roles in MCIP:

  1. BullMQ Backend: Queue storage for product ingestion jobs
  2. Session Cache: User sessions with 24-hour TTL (planned)

Why Redis? Because it's blazingly fast (sub-millisecond reads), reliable (battle-tested in production), and simple (key-value at heart).

BullMQ: Reliable Async Processing

Product ingestion runs through BullMQ queues:

// Ingestion Service
async queueProducts(products: RawProduct[]) {
  for (const product of products) {
    await this.queue.add('process-product', product, {
      removeOnComplete: true,
      attempts: 3
    });
  }
}

This provides:

  • Retry logic: Failed jobs retry up to 3 times
  • Concurrency control: Process multiple products in parallel
  • Monitoring: Track queue depth and job status
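For context on retry timing: BullMQ's built-in exponential backoff delays attempt n by delay * 2^(n-1). The `backoff` option in the comment below is not in MCIP's queue config above; it is a sketch of how one could be added.

```typescript
// BullMQ exponential backoff schedule: delay * 2^(attemptsMade - 1).
// Enabled per job via, e.g.:
//   { attempts: 3, backoff: { type: 'exponential', delay: 1000 } }
function exponentialBackoff(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}

// With a 1s base delay, retries wait 1s, 2s, then 4s.
const schedule = [1, 2, 3].map((n) => exponentialBackoff(n, 1000));
```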

Health Monitoring: The Pulse of the System

Every component reports its health via /health:

@Get('health')
getHealth() {
  return { status: 'ok' };
}

Docker Compose includes health checks for all services:

  • MCIP: HTTP check on /health
  • Redis: redis-cli ping
  • Qdrant: TCP check on port 6333

Services wait for dependencies to be healthy before starting.


Configuration: Flexibility Without Complexity

Environment-Driven

MCIP's configuration philosophy is simple: everything important should be configurable without recompiling:

# Required
OPENAI_API_KEY=sk-...

# Data Source
SOURCE_URL=https://demo.vendure.io/shop-api
SOURCE_STRATEGY=VENDURE
GRAPHQL_QUERY={products{items{...}}}

# Infrastructure (Docker defaults)
REDIS_HOST=redis
QDRANT_URL=http://qdrant:6333

# Security
ADMIN_API_KEY=your-secret-key

This means you can run MCIP in development with minimal resources, then scale to production without code changes.

Sensible Defaults

But here's the thing – you shouldn't need to configure everything. MCIP comes with sensible defaults that work for most use cases:

  • Port: 8080
  • Redis: redis:6379 (Docker service name)
  • Qdrant: http://qdrant:6333 (Docker service name)
  • Store Provider: VENDURE

Only override what you need to change. This philosophy keeps initial setup simple (remember the 5-minute promise) while allowing infinite customization.
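As a sketch of that philosophy (illustrative only, not MCIP's actual config module), a resolver can read overrides from the environment and fall back to the defaults listed above:

```typescript
// Defaults-with-override resolution: env vars win, defaults fill the gaps.
function resolveConfig(env: Record<string, string | undefined>) {
  return {
    port: Number(env.PORT ?? 8080),
    redisHost: env.REDIS_HOST ?? 'redis',
    qdrantUrl: env.QDRANT_URL ?? 'http://qdrant:6333',
    sourceStrategy: env.SOURCE_STRATEGY ?? 'VENDURE',
  };
}

// Override only what you need; everything else keeps its default.
const config = resolveConfig({ PORT: '3000' });
```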


Performance: Every Millisecond Counts

Parallel Processing

The secret to MCIP's speed? We parallelize everything possible:

  • Embedding generation (~150ms) and query parsing happen concurrently
  • BullMQ processes multiple products simultaneously during ingestion
  • Qdrant performs vector and filter search in one optimized operation

Smart Timeouts

Not all operations are equal. Search operations target sub-500ms total. But we're pragmatic – if results are ready, we return them. Better fast partial results than slow perfect results.

Connection Pooling

Every external connection uses pooling:

  • Qdrant REST client maintains persistent connections
  • Redis connections are pooled by BullMQ
  • OpenAI client reuses HTTPS connections

Creating connections is expensive. Reusing them is free. This optimization alone saves hundreds of milliseconds per request.


What Makes Our Server Special

It's not just a search engine — it's a Machine Customer Interaction Protocol server. That distinction matters.

It's not the fastest server (though it's pretty fast at 300-500ms).

It's not the most scalable (though it scales well horizontally).

It's not the most elegant (though we think it's beautiful).

What makes MCIP's server special is its agentic intelligence combined with protocol-first design. The LangGraph 4-stage pipeline doesn't just match keywords — it understands intent, validates against real catalog data, and verifies results through LLM reasoning. And it does this through a universal protocol that any AI agent can speak.

Product discovery is just the beginning. The three-layer architecture is designed so that cart management, checkout flows, and order tracking can be added as new modules — each following the same pattern of controllers, services, and infrastructure. Every architectural decision serves one goal: making commerce accessible to AI agents through one universal protocol.


Current Implementation Summary

Aspect               | Current State                                                                                      | Future Vision
---------------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------
Protocol             | Machine Customer Interaction Protocol (MCP via @rekog/mcp-nest 1.8.4)                              | Extended tool set for full commerce lifecycle
Platforms            | Single (Vendure via GraphQL)                                                                       | Multi-platform parallel (Shopify, WooCommerce, custom)
Search Modes         | Two: Simple Vector Search + Agentic Hard-Filtered Search (LangGraph)                               | Cross-store aggregation with unified ranking
Agentic Intelligence | LangGraph 4-stage pipeline (filter extraction, brand validation, hybrid search, LLM verification)  | Multi-turn conversation context, personalization
Ingestion            | BullMQ async queue with retry logic                                                                | Real-time sync + webhook triggers
Sessions             | Basic (planned: Redis 24hr TTL)                                                                    | Full cart persistence + user preferences
Commerce Modules     | Product discovery (first module)                                                                   | Cart management, checkout, order tracking