MCIP Server

Search Orchestration

MCIP orchestrates parallel searches across multiple platforms using RAG for semantic understanding, manages a 2.5-3 second time budget intelligently, and aggregates results with sophisticated ranking – all while handling failures gracefully.

Imagine you're planning a dinner party and need ingredients from multiple stores. You could visit each store sequentially – drive to the farmer's market, then the butcher, then the wine shop. That would take hours. Or, you could send trusted friends to each store simultaneously with specific shopping lists and a meeting time. Everyone shops in parallel, you get everything faster, and if someone can't find an item, the dinner still happens.

That's exactly how MCIP's search orchestration works. When an AI agent searches for "ergonomic office chair under $500," we don't query platforms one by one. We dispatch parallel searches to every connected platform simultaneously, each with its own time budget, and aggregate results intelligently. If one platform is slow or fails, we proceed with what we have. The show must go on.

But here's where it gets sophisticated: we're not just running parallel searches. We're orchestrating a complex pipeline that includes semantic understanding through RAG, intelligent time management, dynamic fallback strategies, and sophisticated result merging. It's like conducting an orchestra where each musician can play at different tempos, some might not show up, and you still need to deliver a beautiful symphony.


The Orchestration Flow

This orchestration flow shows how a single search request transforms into multiple parallel operations, each contributing to the final result. Notice how the RAG pipeline and platform searches run simultaneously, not sequentially. This parallelism is the secret to our sub-3-second response times.


Parallel Execution: The Speed Secret

The Fork-Join Pattern

MCIP uses what computer scientists call a "fork-join" pattern, but let's think of it as a relay race with a twist. Instead of runners going one after another, all runners start simultaneously from the same point, run their own races, and we collect medals from whoever finishes within the time limit.

When a search request arrives, we immediately "fork" into multiple parallel paths. The RAG pipeline starts generating embeddings while platform adapters prepare their queries. The vector database begins its similarity search while we parse filters and constraints. Everything happens at once, like a well-rehearsed flash mob where everyone knows their role.

The "join" happens when we collect results. But here's the clever part: we don't wait for everyone. If three platforms return results in 800ms but the fourth is still processing at 1500ms, we might proceed with three. It's better to show good results quickly than perfect results slowly.
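In Python's asyncio, this fork-join-with-a-deadline pattern can be sketched in a few lines. The platform names and delays below are simulated stand-ins, not real adapters:

```python
import asyncio

async def search_platform(name: str, delay: float) -> dict:
    """Stand-in for a platform adapter; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return {"platform": name, "results": [f"{name}-item"]}

async def orchestrate(budget: float) -> list[dict]:
    # Fork: dispatch every platform search simultaneously.
    tasks = [
        asyncio.create_task(search_platform("shopify", 0.01)),
        asyncio.create_task(search_platform("vendure", 0.02)),
        asyncio.create_task(search_platform("slow-api", 5.0)),
    ]
    # Join: collect whatever finished inside the time budget.
    done, pending = await asyncio.wait(tasks, timeout=budget)
    for task in pending:
        task.cancel()  # abandon stragglers; the show must go on
    return [t.result() for t in done]

results = asyncio.run(orchestrate(budget=0.5))
# Only the two fast platforms make the cut; "slow-api" is dropped.
```

Note that swapping `asyncio.wait` for `asyncio.gather` would force a wait for the slowest platform, which is exactly what the budget forbids.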

The Parallel Promise

Each platform adapter runs in its own isolated execution context. If Shopify crashes, Vendure keeps running. If WooCommerce times out, the custom API still returns results. This isolation means one platform's problems don't cascade to others. It's like having multiple backup plans running simultaneously – at least one usually works.

But parallelism isn't free. It requires careful resource management. We maintain connection pools to avoid overwhelming external services. We limit concurrent requests to respect rate limits. We monitor memory usage to prevent resource exhaustion. The orchestrator isn't just starting parallel tasks – it's managing a complex resource allocation problem in real-time.


Time Budget Management: Every Millisecond Counts

The 2.5-Second Promise

This timeline reveals how we allocate our precious 2500 milliseconds. Notice that Platform C is marked critical (red) because it exceeds its budget. In this scenario, we'd proceed without waiting for Platform C to complete.

Dynamic Time Allocation

Not all searches are equal. A simple query like "iPhone 15" might complete in 500ms across all platforms. A complex query with multiple filters might need the full 2500ms. Our orchestrator dynamically adjusts time budgets based on query complexity and platform performance history.

Think of it like a chef preparing multiple dishes. Simple dishes (basic queries) get less time. Complex dishes (filtered searches with semantic understanding) get more time. But dinner is served at a fixed time (2500ms), regardless. The chef adjusts preparation strategies based on complexity, not the serving time.

We track platform response times historically and adjust expectations accordingly. If Shopify typically responds in 400ms, we might set its timeout to 600ms. If a custom API usually takes 1200ms, it gets 1500ms. These adaptive timeouts maximize the chance of inclusion while maintaining our overall time promise.
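One plausible sketch of such an adaptive timeout takes a 95th-percentile latency with a safety margin; the margin and clamp values here are illustrative, not MCIP's actual constants:

```python
from statistics import quantiles

def adaptive_timeout(history_ms: list[float], margin: float = 1.5,
                     floor_ms: float = 200.0,
                     ceiling_ms: float = 2000.0) -> float:
    """Timeout = p95 of observed latencies times a safety margin, clamped."""
    p95 = quantiles(history_ms, n=20)[-1]  # last cut point ~ 95th percentile
    return min(max(p95 * margin, floor_ms), ceiling_ms)

# A platform that typically answers in ~400 ms gets a timeout near 630 ms.
shopify_history = [380, 388, 390, 393, 395, 397, 398, 399, 400, 400,
                   401, 402, 404, 405, 407, 409, 410, 412, 415, 420]
print(adaptive_timeout(shopify_history))
```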

The Timeout Decision Tree

This decision tree shows how we make real-time decisions about waiting for slow platforms. The "Critical Platform?" decision point is key – we wait longer for platforms that typically have unique inventory or better prices.


The RAG Pipeline: From Words to Understanding

Semantic Transformation

The RAG (Retrieval-Augmented Generation) pipeline is where natural language becomes mathematical understanding. When someone searches for "cozy reading chair," those words transform into a 512-dimensional vector that captures the essence of comfort, furniture, and reading activity.

Think of it like translating a poem. A literal translation might preserve words but lose meaning. A good translation captures the spirit, emotion, and intent. That's what our RAG pipeline does – it translates human intent into mathematical language that computers can search with.

The pipeline starts with text normalization. We clean up typos, expand abbreviations, and standardize terms. "Comfy chair 4 reading" becomes "comfortable chair for reading." This normalization improves embedding quality without changing user intent.
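A minimal version of that normalization step might look like this; the abbreviation table is a tiny illustrative sample of what a production system would carry:

```python
import re

# Illustrative expansion table; a real system's would be far larger.
ABBREVIATIONS = {
    "comfy": "comfortable",
    "4": "for",
    "w/": "with",
    "ergo": "ergonomic",
}

def normalize_query(text: str) -> str:
    """Lowercase, collapse whitespace, and expand known abbreviations."""
    tokens = re.split(r"\s+", text.strip().lower())
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)

print(normalize_query("Comfy chair 4 reading"))
# → comfortable chair for reading
```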

Embedding Generation

Next, we generate embeddings using OpenAI's text-embedding-3-small model. These embeddings are like fingerprints for meaning. Similar concepts have similar fingerprints. "Cozy reading chair" and "comfortable armchair for books" might use different words but generate nearly identical embeddings because they mean essentially the same thing.

The embedding process takes about 150ms – not instant, but fast enough. We've chosen 512 dimensions as our sweet spot between accuracy and speed. More dimensions mean better precision but slower searches. Fewer dimensions mean faster searches but missed nuances. 512 dimensions capture furniture semantics beautifully while keeping vector searches under 250ms.

Vector Search Intelligence

Once we have embeddings, we search our vector database for similar products. But this isn't just finding exact matches – it's finding conceptual neighbors. A search for "gaming throne" might return gaming chairs, ergonomic seats, and even some recliners, because the vector space understands they're all related to comfortable seating for extended sessions.

The vector search returns products with similarity scores. A score of 0.95 means nearly identical concept. A score of 0.70 means related but different. We use these scores in our final ranking, boosting products that truly match intent over those that just contain keywords.
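Those similarity scores are typically cosine similarities between the query vector and each product vector. Here is the idea with toy 3-dimensional vectors standing in for the real 512-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: nearby concepts point in nearly the same direction.
gaming_chair = [0.90, 0.10, 0.20]
gaming_throne = [0.88, 0.12, 0.25]
blender = [0.05, 0.90, 0.10]

assert cosine_similarity(gaming_chair, gaming_throne) > 0.95  # near-identical
assert cosine_similarity(gaming_chair, blender) < 0.5         # unrelated
```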


Fallback Mechanisms: Grace Under Failure

The Cascade Strategy

Failures are inevitable in distributed systems. APIs go down. Networks fail. Services get overwhelmed. MCIP's fallback mechanisms ensure that these failures degrade service gracefully rather than catastrophically.

Our primary fallback is result degradation. If RAG fails, we fall back to keyword search. If vector search times out, we use cached embeddings. If all platforms fail, we return cached results with a freshness warning. Each fallback is less ideal than the primary path but infinitely better than returning an error.
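The cascade reads naturally as an ordered list of strategies. In this sketch the three search functions are hypothetical stand-ins, with the semantic path deliberately broken to show the degradation:

```python
def search_with_fallbacks(query: str) -> dict:
    """Try each strategy in order of quality; degrade rather than fail."""
    strategies = [
        ("semantic", semantic_search),  # RAG + vector search (best)
        ("keyword", keyword_search),    # plain text match
        ("cached", cached_results),     # stale but safe
    ]
    for name, strategy in strategies:
        try:
            return {"source": name, "results": strategy(query)}
        except Exception:
            continue  # fall through to the next, cheaper strategy
    return {"source": "none", "results": [], "warning": "all paths failed"}

# Hypothetical stand-ins; the RAG path is simulated as down.
def semantic_search(q): raise TimeoutError("vector DB unreachable")
def keyword_search(q): return [f"keyword match for {q!r}"]
def cached_results(q): return []

print(search_with_fallbacks("reading chair")["source"])
# → keyword
```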

Think of it like a restaurant with a broken stove. They can't cook hot meals, but they can serve salads, sandwiches, and desserts. Not ideal, but customers still eat. We apply the same philosophy: partial results are better than no results.

Circuit Breakers

We implement circuit breakers for each platform connection. If a platform fails repeatedly, we stop trying temporarily. This prevents cascade failures and gives struggling services time to recover. It's like not calling a friend who's sick – give them time to get better rather than bothering them repeatedly.

Circuit breakers have three states: closed (working normally), open (failing, not attempting), and half-open (testing recovery). When a platform fails three times in a minute, the circuit opens for 30 seconds. After 30 seconds, we try one request. If it succeeds, the circuit closes. If it fails, it stays open longer.
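A minimal three-state breaker using the thresholds from the text (three failures opens the circuit for 30 seconds) could look like this; the injectable clock exists only to make the sketch testable:

```python
import time

class CircuitBreaker:
    """Minimal sketch: closed -> open after N failures -> half-open after reset."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"  # one trial request is allowed
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures or self.state == "half-open":
            self.opened_at = self.clock()  # (re)open the circuit

breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
print(breaker.state)  # → open
```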

Cached Fallbacks

Every successful search updates our cache. Product data, search results, even partial responses get cached with short TTLs. When primary searches fail, we check these caches. The data might be 5 minutes old, but that's usually fine for product searches.

The cache isn't just a backup – it's an accelerator. Repeated searches for popular items return instantly from cache. Common queries like "iPhone" or "laptop" often hit cache, reducing load on external platforms and improving response times.
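A toy TTL cache illustrates the mechanism; a real deployment would more likely sit on Redis or similar, and again the injectable clock is only for testability:

```python
import time

class TTLCache:
    """Tiny in-process cache where every entry expires after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily evict stale entries
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.put("iphone", ["result-a", "result-b"])
assert cache.get("iphone") == ["result-a", "result-b"]  # fresh hit
now[0] = 301.0
assert cache.get("iphone") is None  # 5-minute TTL has expired
```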


Result Aggregation: The Grand Unification

Deduplication Intelligence

When multiple platforms return the same product, we don't just pick one randomly. We merge them intelligently, combining the best attributes from each source. Platform A might have better images. Platform B might have more recent pricing. Platform C might have detailed specifications. The aggregated result combines all these strengths.

Deduplication uses multiple signals: product IDs (when standardized), titles (fuzzy matching), descriptions (semantic similarity), and even images (perceptual hashing). It's like recognizing the same person in different photos – different angles, different lighting, but same person.
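Here is a sketch of just the fuzzy-title leg of that matching, using Python's difflib; the threshold and sample catalog are illustrative:

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def deduplicate(products: list[dict], threshold: float = 0.85) -> list[dict]:
    """Merge listings whose titles fuzzily match; keep the first seen,
    filling in attributes it lacks from later duplicates."""
    merged: list[dict] = []
    for p in products:
        for kept in merged:
            if title_similarity(p["title"], kept["title"]) >= threshold:
                for k, v in p.items():  # union of attributes across sources
                    kept.setdefault(k, v)
                break
        else:
            merged.append(dict(p))
    return merged

catalog = [
    {"title": "Ergonomic Office Chair", "price": 450},
    {"title": "Ergonomic Office Chair ", "image": "chair.jpg"},
    {"title": "Standing Desk", "price": 300},
]
print(len(deduplicate(catalog)))  # → 2
```

The merged chair listing keeps the first source's price and gains the second source's image, matching the combine-the-best-attributes behavior described above.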

Ranking Alchemy

Our ranking algorithm considers multiple factors:

Semantic Relevance (40% weight): How well does this product match the query intent? This comes from our RAG similarity scores.

Platform Authority (20% weight): Some platforms have better data quality. We learn this over time and boost results from authoritative sources.

Freshness (15% weight): Recently updated products rank higher than stale listings. In fast-moving categories like electronics, this matters enormously.

Availability (15% weight): In-stock items rank above out-of-stock. What good is finding the perfect product if you can't buy it?

Price Competitiveness (10% weight): Within similar products, better prices rank higher. Not always the cheapest, but the best value.

These weights adjust dynamically based on query context. A search for "cheapest laptop" weights price higher. A search for "best reviewed coffee maker" weights ratings higher (when available). The algorithm adapts to user intent.
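The weighted scoring reduces to a dot product of normalized factors and weights, with the weights nudged per query. The base weights come from the list above; the intent-adjustment rule shown is a hypothetical example:

```python
BASE_WEIGHTS = {
    "semantic": 0.40,
    "authority": 0.20,
    "freshness": 0.15,
    "availability": 0.15,
    "price": 0.10,
}

def score(product: dict, weights: dict = BASE_WEIGHTS) -> float:
    """Each factor is pre-normalized to [0, 1]; the score is a weighted sum."""
    return sum(weights[factor] * product[factor] for factor in weights)

def adjust_for_intent(query: str) -> dict:
    """Hypothetical intent tweak: price-focused queries boost the price weight."""
    w = dict(BASE_WEIGHTS)
    if "cheap" in query.lower():
        w["price"] += 0.15
        w["semantic"] -= 0.15
    return w

chair = {"semantic": 0.92, "authority": 0.8, "freshness": 0.7,
         "availability": 1.0, "price": 0.6}
print(round(score(chair), 3))  # → 0.843
```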

Result Enrichment

Before returning results, we enrich them with computed metadata. Relevance scores get normalized to a 0-1 scale. Prices get currency conversion if needed. Availability gets translated to user-friendly terms ("In Stock", "Only 3 left", "Ships in 2-3 days").

We also add comparative metadata. Is this price above or below average for similar products? How does this relevance score compare to other results? This metadata helps AI agents make intelligent recommendations beyond just showing search results.


Performance Optimizations

Connection Pooling

Every millisecond counts when you have 2500ms total. We maintain persistent connection pools to all platforms, eliminating handshake overhead. It's like keeping the phone line open rather than hanging up and redialing for each conversation.

Predictive Warming

We pre-warm common searches during quiet periods. Popular queries like "laptop," "phone," or "shoes" get their embeddings generated and cached. When users search for these terms, we save 150ms instantly.

Progressive Response

For large result sets, we stream results progressively. The AI agent gets the first 10 results in 500ms, the next 10 in another 200ms, and so on. Users see results immediately while more results load in the background. It's like a news feed that loads as you scroll.
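Progressive delivery can be modeled as a generator that yields the first batch immediately and the remainder in fixed-size chunks; the batch sizes here mirror the numbers in the example above:

```python
from typing import Iterator

def stream_batches(results: list, first_batch: int = 10,
                   batch_size: int = 10) -> Iterator[list]:
    """Yield an initial batch right away, then the rest in fixed-size chunks."""
    yield results[:first_batch]
    for start in range(first_batch, len(results), batch_size):
        yield results[start:start + batch_size]

items = list(range(35))
batches = list(stream_batches(items))
print([len(b) for b in batches])  # → [10, 10, 10, 5]
```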


What Makes Our Orchestration Special

It's not just about speed, though we're fast. It's not just about intelligence, though our RAG pipeline is smart. It's not just about reliability, though our fallbacks are robust. What makes MCIP's search orchestration special is how all these elements work in concert.

Every search is a carefully choreographed performance where timing, intelligence, and resilience combine to deliver magical user experiences. We search everywhere simultaneously, understand intent semantically, handle failures gracefully, and deliver results intelligently – all in the time it takes to read this sentence.