
Search Orchestration

MCIP provides two complementary search modes: a fast simple vector search for straightforward queries and an agentic LangGraph workflow for complex natural language queries. The agentic pipeline runs a 4-stage state machine — parallel filter extraction, brand validation via facet search, hybrid vector search in Qdrant, and LLM verification — delivering precise, filtered results in under 500ms.

Search in the MCIP Protocol

Product discovery is the first operational module of the Machine Customer Interaction Protocol. MCIP's long-term vision is a universal commerce protocol covering the full lifecycle — search, cart, checkout, and order tracking. Search orchestration is where it all starts: when an AI agent asks "find me Nike shoes under $100," MCIP transforms that intent into precise, filtered product results from any connected e-commerce platform.

MCIP doesn't just match keywords — it understands meaning. The search system combines semantic vector embeddings with structured filter extraction to find products that truly match what users want, even when they use different words.


Two Search Modes

MCIP provides two complementary search endpoints, each optimized for different scenarios:

Endpoint: GET /search?q={query}&take={limit}&skip={offset}

Service: SearchService

Best for: Straightforward queries where speed matters most

Simple vector search goes directly from query to results with no LLM calls in the search path:

Query: "gaming laptop"

  [Embedding Generation]

  1536-dimensional vector (~150ms)

  [Qdrant Vector Search]

  Cosine similarity ranking (~250ms)

  Ranked results with relevance scores

This mode is fast and low-latency — pure embedding generation plus vector similarity search. No LLM overhead, no filter extraction. It returns results with filteringStatus: "VECTOR_ONLY".
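Conceptually, the second step is plain cosine similarity over stored product vectors. A minimal in-memory sketch, with the embedding call and the Qdrant index replaced by literal arrays (names here are illustrative, not MCIP's actual code):

```typescript
// Toy stand-in for the vector search step: score every indexed product
// against the query vector by cosine similarity, then rank and truncate.
// In MCIP the vectors are 1536-dimensional and the index lives in Qdrant.

type Indexed = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function vectorSearch(queryVector: number[], index: Indexed[], take = 10) {
  return index
    .map((p) => ({ id: p.id, score: cosine(queryVector, p.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, take);
}
```

The real index uses approximate nearest-neighbor search (HNSW) instead of this exhaustive scan, but the scoring and ranking semantics are the same.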

Endpoint: GET /hard-filtering/search?q={query}&take={limit}&skip={offset}

Service: HardFilteringService

Best for: Complex natural language queries with implicit filters

This is MCIP's core differentiator — a 4-stage LangGraph state machine that autonomously understands queries, extracts filters, validates brands, performs hybrid search, and verifies results:

Query: "Nike shoes under $100 but not running"

  [Stage 1: Parallel Filter Extraction]
    ├── Extract categories → "Shoes" (excl. "Running")
    ├── Extract brands → "Nike"
    └── Extract price → max $100

  [Stage 2: Brand Validation]
    Facet search → Does "Nike" exist in catalog?

  [Stage 3: Hybrid Search]
    Vector similarity + payload filters in Qdrant

  [Stage 4: LLM Verification]
    GPT-4o-mini verifies results match intent

  Top 5 verified results

This mode returns results with filteringStatus: "AI_FILTERED" and includes appliedFilters in the response metadata.

Choosing the Right Mode

| Scenario | Recommended Mode | Why |
|---|---|---|
| Simple keyword queries | Simple Vector Search | Faster, no LLM overhead |
| Queries with price constraints | Agentic Search | Extracts price filters automatically |
| Brand-specific queries | Agentic Search | Validates brand against catalog |
| Exclusion queries ("not X") | Agentic Search | Understands negation |
| High-throughput applications | Simple Vector Search | Lower latency per request |
| AI agent interactions | Agentic Search | Richer, more accurate results |

The Agentic LangGraph Workflow

The agentic search is built on LangGraph — a state machine framework for agentic LLM workflows with conditional edges and parallel execution. Each search request flows through four stages:

Stage 1: Parallel Filter Extraction

LangGraph runs three LLM calls in parallel using GPT-4o-mini, each extracting a different filter dimension:

// LangGraph parallel execution — all three run simultaneously:

// Branch 1: Category extraction
// "Nike shoes under $100 but not running"
// → categories: ["Shoes"], excludeCategories: ["Running"]

// Branch 2: Brand extraction  
// → brands: ["Nike"]

// Branch 3: Price extraction
// → priceRange: { max: 100, currency: "USD" }

// All three use Zod schemas for type-safe structured output parsing

Parallel execution means filter extraction takes roughly the same time as a single LLM call (~50-80ms), not three times longer.
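The fan-out can be sketched with Promise.all. In MCIP the three branches are LangGraph nodes calling GPT-4o-mini with Zod-validated structured output; here they are mocked as toy regex extractors so the sketch runs standalone, which is the part that matters: total latency is the slowest branch, not the sum of the three.

```typescript
// Sketch of Stage 1's parallel fan-out. The extractor bodies below are
// illustrative stand-ins for LLM calls, not MCIP's actual prompts.

type Extracted = {
  categories: string[];
  excludeCategories: string[];
  brands: string[];
  priceRange: { min?: number; max?: number };
};

async function extractFilters(query: string): Promise<Extracted> {
  const [cats, brands, priceRange] = await Promise.all([
    extractCategories(query), // branch 1: categories
    extractBrands(query),     // branch 2: brands
    extractPrice(query),      // branch 3: price constraints
  ]);
  return { ...cats, brands, priceRange };
}

// Toy stand-ins so the sketch runs without an LLM:
async function extractCategories(q: string) {
  const not = /not (\w+)/.exec(q);
  return {
    categories: q.toLowerCase().includes("shoes") ? ["Shoes"] : [],
    excludeCategories: not ? [capitalize(not[1])] : [],
  };
}
async function extractBrands(q: string): Promise<string[]> {
  return q.toLowerCase().includes("nike") ? ["Nike"] : [];
}
async function extractPrice(q: string): Promise<{ min?: number; max?: number }> {
  const m = /under \$?(\d+)/.exec(q);
  return m ? { max: Number(m[1]) } : {};
}
function capitalize(s: string) { return s[0].toUpperCase() + s.slice(1); }
```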

What gets extracted:

| Filter Type | Example Query | Extracted |
|---|---|---|
| Brand inclusion | "nike shoes" | brands: ["Nike"] |
| Brand exclusion | "laptops not apple" | excludeBrands: ["Apple"] |
| Price max | "under $100" | priceRange.max: 100 |
| Price min | "over 500" | priceRange.min: 500 |
| Price range | "between 200 and 500" | priceRange: {min: 200, max: 500} |
| Category | "gaming laptops" | categories: ["Gaming"] |
| Category exclusion | "shoes not running" | excludeCategories: ["Running"] |

Stage 2: Brand Validation

Before searching, MCIP validates extracted brands against the actual store inventory. This prevents wasted searches and provides instant feedback:

// Query Qdrant facet search for available brands
const availableBrands = await qdrant.getFacetValues("brand");
// Returns: ["Nike", "Adidas", "Puma", "New Balance", ...]

// Validate extracted brand
if (!availableBrands.includes("Nike")) {
  // Brand not in catalog → return empty results immediately
  return { items: [], meta: { reason: "brand_unavailable" } };
}

This is a critical optimization — if a user asks for a brand the store doesn't carry, MCIP tells them immediately instead of returning irrelevant results.


Stage 3: Hybrid Search

With validated filters in hand, MCIP performs a hybrid search combining vector similarity with exact payload filtering:

Qdrant applies payload filters before vector similarity ranking — this is more efficient than post-filtering because we never score products that can't be returned.
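A sketch of how Stage 1's output might translate into Qdrant's must/must_not filter shape. The payload keys (brand, category, price.amount) follow the indexes mentioned in the performance section below; the builder function itself is illustrative, not MCIP's actual code:

```typescript
// Build a Qdrant payload filter from the extracted filters. The
// must / must_not / match / range shapes follow Qdrant's filtering API.

type Filters = {
  brands?: string[];
  excludeBrands?: string[];
  excludeCategories?: string[];
  priceRange?: { min?: number; max?: number };
};

function buildQdrantFilter(f: Filters) {
  const must: object[] = [];
  const must_not: object[] = [];

  if (f.brands?.length) {
    // match.any: product's brand must be one of the requested brands
    must.push({ key: "brand", match: { any: f.brands } });
  }
  if (f.priceRange?.min != null || f.priceRange?.max != null) {
    must.push({
      key: "price.amount",
      range: { gte: f.priceRange.min, lte: f.priceRange.max },
    });
  }
  for (const b of f.excludeBrands ?? []) {
    must_not.push({ key: "brand", match: { value: b } });
  }
  for (const c of f.excludeCategories ?? []) {
    must_not.push({ key: "category", match: { value: c } });
  }
  return { must, must_not };
}
```

The resulting object is what a hybrid query passes alongside the query vector, so Qdrant can pre-filter before scoring.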

Stage 4: LLM Verification

The final stage passes search results through GPT-4o-mini for semantic verification:

// GPT-4o-mini verifies each result against original query intent
// "Do these results actually match 'Nike shoes under $100'?"

// Filters out false positives:
// - Products semantically similar but wrong category
// - Edge cases the payload filters missed
// - Results that technically match but aren't relevant

// Returns: Top 5 verified products with metadata

This extra verification step significantly improves result quality. The LLM catches edge cases that pure vector similarity and metadata filters miss.
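The verification step reduces to: format the candidates into a prompt, ask the model which ones match, keep the survivors. A sketch with the LLM call mocked as a plain predicate (the prompt wording and function names here are illustrative):

```typescript
// Stand-in for Stage 4: the real pipeline sends the query plus candidate
// titles to GPT-4o-mini and parses a structured verdict; `verify` below
// is a mock predicate playing the model's role.

type Candidate = { title: string; score: number };

function buildVerificationPrompt(query: string, candidates: Candidate[]): string {
  const list = candidates.map((c, i) => `${i + 1}. ${c.title}`).join("\n");
  return `Query: "${query}"\nWhich of these results match the query's intent?\n${list}`;
}

function verifyResults(
  query: string,
  candidates: Candidate[],
  verify: (query: string, c: Candidate) => boolean,
  topN = 5,
): Candidate[] {
  // Drop false positives the vector and payload filters let through
  return candidates.filter((c) => verify(query, c)).slice(0, topN);
}
```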


The RAG Foundation: From Words to Understanding

What is RAG?

RAG (Retrieval-Augmented Generation) is the technique that makes MCIP understand meaning, not just match keywords. Instead of looking for products containing the word "comfortable," RAG finds products that are conceptually similar — including "ergonomic," "cushioned," "soft," and "cozy."

Both search modes build on the same RAG foundation: OpenAI's text-embedding-3-small model converts text into 1536-dimensional vectors that capture semantic meaning.

How Embeddings Work

An embedding is a list of 1536 numbers that captures the meaning of a piece of text. "Shoes" and "footwear" generate similar vectors because they mean similar things, even though they share no characters. Products are embedded the same way, from a combined text blob:

// Product embedding strategy
// Products are converted to a searchable text blob before embedding:
const textToEmbed = `
  Title: ${product.title}
  Description: ${product.description}
  Keywords: ${product.keywords.join(", ")}
  Attributes: ${product.attributes.map(a => `${a.name}: ${a.value}`).join(", ")}
`;

// This combined text captures all searchable dimensions of a product

Hybrid Search: Why Both Vectors and Filters

Pure vector search is powerful but imprecise. Searching for "laptop under 500" might return a $1000 MacBook because it's semantically similar to laptops. Hybrid search combines the best of both:

  • Vector search: Understanding intent and meaning ("gaming laptop" → finds products about gaming computers)
  • Payload filters: Hard constraints that must be satisfied (price ≤ 500, brand = "Dell")

Qdrant handles this natively — pre-filtering eliminates non-matching products, then vector scoring ranks what remains.

Score Interpretation

| Score | Meaning | Example |
|---|---|---|
| 0.90+ | Excellent match | Query and product are nearly identical concepts |
| 0.75–0.90 | Good match | Strong semantic relationship |
| 0.60–0.75 | Moderate match | Related but not precise |
| <0.60 | Weak match | Tangentially related at best |
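As a small helper, the bands above can be encoded directly. The thresholds and labels mirror the table; they are interpretation guidance, not part of the API response:

```typescript
// Map a cosine similarity score to the document's interpretation bands.
function interpretScore(score: number): string {
  if (score >= 0.9) return "excellent";
  if (score >= 0.75) return "good";
  if (score >= 0.6) return "moderate";
  return "weak";
}
```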

Performance Benchmarks

Measured Performance (100 Concurrent Users)

| Metric | P50 | P95 | P99 |
|---|---|---|---|
| Embedding Generation | 145ms | 189ms | 212ms |
| Vector Search | 238ms | 287ms | 342ms |
| Total Response Time | 421ms | 498ms | 587ms |
  • Throughput: 1,247 requests/second
  • Relevance Accuracy: 85–90%
  • Concurrent Sessions: Unlimited (memory-bound, horizontally scalable)

Why Each Stage is Fast

Embedding Generation (~150ms P50)

We use text-embedding-3-small — excellent semantic understanding with fast inference. The 1536 dimensions capture meaning without the overhead of larger models.

Qdrant Vector Search (~238ms P50)

Qdrant uses HNSW (Hierarchical Navigable Small World) graphs for approximate nearest neighbor search. With payload indexes on price.amount (float), brand (keyword), and category (keyword), filter conditions are nearly instant.

LangGraph Parallel Execution

Stage 1 of the agentic workflow runs three LLM calls simultaneously via LangGraph. This means filter extraction adds roughly one LLM call's latency, not three.


Filtering Status Values

Every search response includes a filteringStatus field indicating how the query was processed:

| Status | Mode | Description |
|---|---|---|
| AI_FILTERED | Agentic Search | LangGraph workflow extracted and applied filters successfully |
| VECTOR_ONLY | Simple Search | Pure vector similarity search, no filter extraction |
| RAG_ONLY | Agentic Search | Filter extraction attempted but fell back to pure vector search |
| FALLBACK | Either | Degraded mode (e.g., embedding API failure, keyword fallback) |
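Clients can model the field as a union type. The string values come from the table above; the type and helper names are illustrative:

```typescript
// The four documented filteringStatus values, as a TypeScript union.
type FilteringStatus = "AI_FILTERED" | "VECTOR_ONLY" | "RAG_ONLY" | "FALLBACK";

function filtersWereApplied(status: FilteringStatus): boolean {
  // Only the full agentic path guarantees appliedFilters in the metadata
  return status === "AI_FILTERED";
}
```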

Graceful Degradation

MCIP doesn't fail silently, but it doesn't crash on errors either. Each error type has a specific fallback strategy:

| Error Type | Handling Strategy | User Impact | Recovery |
|---|---|---|---|
| Embedding API failure | Fallback to keyword search | Degraded relevance | Automatic |
| Filter extraction failure | Fall back to pure vector search (RAG_ONLY) | No filters applied | Automatic |
| Qdrant timeout | Return cached results | Possibly stale data | Automatic |
| Brand not found | Return empty with reason | Immediate feedback | User action |
| Rate limiting | Queue and retry | Delayed response | Automatic with backoff |
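The first row's fallback chain can be sketched as a wrapper: try the semantic path, degrade to keyword search on failure, and tag the response so callers can tell which path ran. A minimal sketch under those assumptions, not MCIP's actual handler:

```typescript
// Fallback wrapper for an embedding/vector failure, per the table above.
type SearchFn = (q: string) => Promise<string[]>;

async function searchWithFallback(
  q: string,
  semantic: SearchFn,
  keyword: SearchFn,
): Promise<{ items: string[]; filteringStatus: "VECTOR_ONLY" | "FALLBACK" }> {
  try {
    return { items: await semantic(q), filteringStatus: "VECTOR_ONLY" };
  } catch {
    // Embedding API failure → keyword fallback, degraded relevance
    return { items: await keyword(q), filteringStatus: "FALLBACK" };
  }
}
```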

Common Search Error Codes

| Error Code | Meaning | Solution |
|---|---|---|
| 2001 | Embedding generation failed | Check OpenAI API key and connectivity |
| 2002 | Qdrant connection lost | Verify Qdrant is running on port 6333 |
| 2003 | No results found | Try broader search terms |
| 2004 | Filter extraction failed | Query proceeds without filters (RAG_ONLY) |

Core Dependencies

| Component | Package | Version | Purpose |
|---|---|---|---|
| LangGraph | @langchain/langgraph | ^1.0.15 | Agentic workflow state machines |
| LangChain Core | @langchain/core | ^1.1.15 | LLM abstractions and structured output |
| LangChain OpenAI | @langchain/openai | ^1.2.2 | OpenAI integration |
| OpenAI SDK | openai | ^6.9.1 | Embeddings (text-embedding-3-small) |
| Vector Database | @qdrant/js-client-rest | ^1.16.0 | Vector search + payload filtering |
| Schema Validation | zod | ^3.25.76 | Structured output parsing |
| MCP Protocol | @rekog/mcp-nest | ^1.8.4 | AI agent tool registration |

LLM Models Used

| Model | Purpose | Where |
|---|---|---|
| text-embedding-3-small | Generate 1536-dim semantic vectors | Both search modes |
| GPT-4o-mini | Filter extraction (Stage 1) | Agentic search |
| GPT-4o-mini | LLM verification (Stage 4) | Agentic search |

Configuration

# Required for search
OPENAI_API_KEY=sk-your-key
QDRANT_URL=http://qdrant:6333

Try It Yourself

# Direct vector similarity — fast, no LLM overhead
curl "http://localhost:8080/search?q=laptop"

# With pagination
curl "http://localhost:8080/search?q=headphones&take=20&skip=10"

# Full LangGraph workflow with automatic filter extraction
curl "http://localhost:8080/hard-filtering/search?q=nike+shoes+under+100"

# Exclusion filters
curl "http://localhost:8080/hard-filtering/search?q=gaming+laptop+except+asus"

# Price range
curl "http://localhost:8080/hard-filtering/search?q=phones+between+300+and+600"

Inspect the Response

Look at meta.filteringStatus and meta.appliedFilters to see how the query was processed:

{
  "meta": {
    "count": 5,
    "take": 10,
    "skip": 0,
    "q": "nike shoes under 100",
    "filteringStatus": "AI_FILTERED",
    "appliedFilters": {
      "brand": ["Nike"],
      "priceRange": { "min": null, "max": 100, "currency": "USD" }
    }
  },
  "items": [
    {
      "externalId": "prod_123",
      "title": "Nike Air Max 90",
      "price": { "amount": 89.99, "currency": "USD" },
      "brand": "Nike",
      "score": 0.892
    }
  ]
}

Planned Enhancement: Search orchestration is designed to support parallel searches across multiple e-commerce platforms as part of MCIP's evolution toward a full Machine Customer protocol.

Future capabilities will extend the current pipeline:

Parallel Multi-Store Execution

Query arrives

[Fork] Search multiple platforms simultaneously
    ├── Shopify adapter
    ├── WooCommerce adapter
    └── Custom APIs

[Join] Aggregate results within time budget

Merged, deduplicated, ranked results
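Since this is a planned feature, no implementation exists yet, but the fork/join could be sketched with Promise.allSettled plus per-platform timeouts, so one slow store cannot block the merged response. All names below are illustrative:

```typescript
// Hypothetical multi-store fork/join with a per-platform time budget.
type Adapter = { name: string; search: (q: string) => Promise<string[]> };

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)),
  ]);
}

async function multiStoreSearch(q: string, adapters: Adapter[], budgetMs = 1000) {
  // Fork: query every platform at once; a timeout counts as a failure
  const settled = await Promise.allSettled(
    adapters.map((a) => withTimeout(a.search(q), budgetMs)),
  );
  // Join: keep partial results from the platforms that answered in time,
  // then dedupe identical items across stores
  const merged = settled
    .filter((s): s is PromiseFulfilledResult<string[]> => s.status === "fulfilled")
    .flatMap((s) => s.value);
  return [...new Set(merged)];
}
```

Circuit breakers and unified cross-store ranking would layer on top of this join step.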

Planned Features

  • Time budget management: 2500ms total, per-platform timeouts (1000–1500ms)
  • Partial results: Show results from fast platforms while waiting for slow ones
  • Circuit breakers: Automatically disable failing platforms
  • Cross-store deduplication: Merge identical products from different sources
  • Unified ranking: Score and rank results across all sources

These features build on the current RAG pipeline — the semantic understanding stays the same, we just add more data sources through the adapter pattern.