MCIP provides two complementary search modes: a fast simple vector search for straightforward queries and an agentic LangGraph workflow for complex natural language queries. The agentic pipeline runs a 4-stage state machine — parallel filter extraction, brand validation via facet search, hybrid vector search in Qdrant, and LLM verification — delivering precise, filtered results in under 500ms.
Product discovery is the first operational module of the Machine Customer Interaction Protocol. MCIP's long-term vision is a universal commerce protocol covering the full lifecycle — search, cart, checkout, and order tracking. Search orchestration is where it all starts: when an AI agent asks "find me Nike shoes under $100," MCIP transforms that intent into precise, filtered product results from any connected e-commerce platform.
MCIP doesn't just match keywords — it understands meaning. The search system combines semantic vector embeddings with structured filter extraction to find products that truly match what users want, even when they use different words.
MCIP provides two complementary search endpoints, each optimized for different scenarios:
Endpoint: GET /search?q={query}&take={limit}&skip={offset}
Service: SearchService
Best for: Straightforward queries where speed matters most
Simple vector search goes directly from query to results with no LLM calls in the search path:
```
Query: "gaming laptop"
        ↓
[Embedding Generation]
        ↓
1536-dimensional vector (~150ms)
        ↓
[Qdrant Vector Search]
        ↓
Cosine similarity ranking (~250ms)
        ↓
Ranked results with relevance scores
```

This mode is fast and low-latency — pure embedding generation plus vector similarity search. No LLM overhead, no filter extraction. It returns results with `filteringStatus: "VECTOR_ONLY"`.
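The path above can be sketched as two awaited calls. Here `embed` and `vectorSearch` are hypothetical stand-ins for the OpenAI and Qdrant clients, not the actual SearchService API:

```typescript
// Minimal sketch of the simple search path with injected dependencies.
// `embed` and `vectorSearch` are illustrative names, wired to in-memory
// stubs below; the real service wires them to OpenAI and Qdrant.
type Hit = { externalId: string; title: string; score: number };

interface SearchDeps {
  embed: (text: string) => Promise<number[]>;                   // text-embedding-3-small
  vectorSearch: (v: number[], limit: number) => Promise<Hit[]>; // Qdrant cosine search
}

async function simpleSearch(q: string, take: number, deps: SearchDeps) {
  const vector = await deps.embed(q);                  // ~150ms P50 in production
  const items = await deps.vectorSearch(vector, take); // ~238ms P50 in production
  return { items, meta: { q, take, filteringStatus: "VECTOR_ONLY" as const } };
}

// Usage with in-memory stubs:
const stubDeps: SearchDeps = {
  embed: async () => [0.12, -0.07, 0.33],
  vectorSearch: async () => [{ externalId: "p1", title: "Gaming Laptop", score: 0.91 }],
};
simpleSearch("gaming laptop", 10, stubDeps)
  .then(r => console.log(r.meta.filteringStatus)); // prints "VECTOR_ONLY"
```

Because there is no LLM in the loop, the total latency is just the two awaited calls.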
Endpoint: GET /hard-filtering/search?q={query}&take={limit}&skip={offset}
Service: HardFilteringService
Best for: Complex natural language queries with implicit filters
This is MCIP's core differentiator — a 4-stage LangGraph state machine that autonomously understands queries, extracts filters, validates brands, performs hybrid search, and verifies results:
```
Query: "Nike shoes under $100 but not running"
        ↓
[Stage 1: Parallel Filter Extraction]
├── Extract categories → "Shoes" (excl. "Running")
├── Extract brands → "Nike"
└── Extract price → max $100
        ↓
[Stage 2: Brand Validation]
Facet search → Does "Nike" exist in catalog?
        ↓
[Stage 3: Hybrid Search]
Vector similarity + payload filters in Qdrant
        ↓
[Stage 4: LLM Verification]
GPT-4o-mini verifies results match intent
        ↓
Top 5 verified results
```

This mode returns results with `filteringStatus: "AI_FILTERED"` and includes `appliedFilters` in the response metadata.
| Scenario | Recommended Mode | Why |
|---|---|---|
| Simple keyword queries | Simple Vector Search | Faster, no LLM overhead |
| Queries with price constraints | Agentic Search | Extracts price filters automatically |
| Brand-specific queries | Agentic Search | Validates brand against catalog |
| Exclusion queries ("not X") | Agentic Search | Understands negation |
| High-throughput applications | Simple Vector Search | Lower latency per request |
| AI agent interactions | Agentic Search | Richer, more accurate results |
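If a client wants to route between the two endpoints automatically, a rough heuristic derived from this table might look like the following. The regexes are illustrative, and brand detection is deliberately left out: a client cannot know the catalog's brands, which is exactly why the agentic mode validates brands server-side.

```typescript
// Toy client-side router between the two endpoints, based on the table
// above. The patterns are illustrative, not part of MCIP.
function pickSearchEndpoint(q: string): string {
  const needsFilters =
    /\$?\d+/.test(q) ||                    // price constraints ("under $100")
    /\b(not|except|without)\b/i.test(q) || // exclusion queries
    /\b(under|over|between)\b/i.test(q);   // range wording
  return needsFilters ? "/hard-filtering/search" : "/search";
}

console.log(pickSearchEndpoint("gaming laptop"));         // "/search"
console.log(pickSearchEndpoint("nike shoes under $100")); // "/hard-filtering/search"
```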
The agentic search is built on LangGraph — a state machine framework for agentic LLM workflows with conditional edges and parallel execution. Each search request flows through four stages:
LangGraph runs three LLM calls in parallel using GPT-4o-mini, each extracting a different filter dimension:
```typescript
// LangGraph parallel execution — all three run simultaneously:

// Branch 1: Category extraction
// "Nike shoes under $100 but not running"
// → categories: ["Shoes"], excludeCategories: ["Running"]

// Branch 2: Brand extraction
// → brands: ["Nike"]

// Branch 3: Price extraction
// → priceRange: { max: 100, currency: "USD" }

// All three use Zod schemas for type-safe structured output parsing
```

Parallel execution means filter extraction takes roughly the same time as a single LLM call (~50-80ms), not three times longer.
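The fan-out can be sketched with `Promise.all`. The three extractors below are stubs standing in for the GPT-4o-mini calls, and all names and the merged filter shape are illustrative:

```typescript
// Shape-level sketch of Stage 1. In the real pipeline each extractor is a
// GPT-4o-mini call with a Zod schema; stubs here make the merge visible.
type Filters = {
  categories?: string[]; excludeCategories?: string[];
  brands?: string[]; excludeBrands?: string[];
  priceRange?: { min?: number; max?: number; currency?: string };
};

async function extractFilters(q: string): Promise<Filters> {
  // Promise.all fans the three calls out concurrently, so wall-clock time
  // is roughly one LLM round-trip, not three.
  const [cat, brand, price] = await Promise.all([
    extractCategories(q), extractBrands(q), extractPrice(q),
  ]);
  return { ...cat, ...brand, ...price };
}

// Stub extractors standing in for the LLM branches:
async function extractCategories(_q: string): Promise<Filters> {
  return { categories: ["Shoes"], excludeCategories: ["Running"] };
}
async function extractBrands(_q: string): Promise<Filters> {
  return { brands: ["Nike"] };
}
async function extractPrice(_q: string): Promise<Filters> {
  return { priceRange: { max: 100, currency: "USD" } };
}
```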
What gets extracted:
| Filter Type | Example Query | Extracted |
|---|---|---|
| Brand inclusion | "nike shoes" | brands: ["Nike"] |
| Brand exclusion | "laptops not apple" | excludeBrands: ["Apple"] |
| Price max | "under $100" | priceRange.max: 100 |
| Price min | "over 500" | priceRange.min: 500 |
| Price range | "between 200 and 500" | priceRange: {min: 200, max: 500} |
| Category | "gaming laptops" | categories: ["Gaming"] |
| Category exclusion | "shoes not running" | excludeCategories: ["Running"] |
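The price patterns in the table can be illustrated with a toy regex parser. The production extractor uses an LLM rather than regexes, so this only shows the target output shape:

```typescript
// Toy parser for the price patterns above. Illustrative only: the real
// Stage 1 extraction is a GPT-4o-mini call with structured output.
function parsePriceRange(q: string): { min?: number; max?: number } | undefined {
  const between = q.match(/between\s+\$?(\d+)\s+and\s+\$?(\d+)/i);
  if (between) return { min: Number(between[1]), max: Number(between[2]) };
  const under = q.match(/under\s+\$?(\d+)/i);
  if (under) return { max: Number(under[1]) };
  const over = q.match(/over\s+\$?(\d+)/i);
  if (over) return { min: Number(over[1]) };
  return undefined; // no price constraint in the query
}

console.log(parsePriceRange("under $100"));          // { max: 100 }
console.log(parsePriceRange("between 200 and 500")); // { min: 200, max: 500 }
```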
Before searching, MCIP validates extracted brands against the actual store inventory. This prevents wasted searches and provides instant feedback:
```typescript
// Query Qdrant facet search for available brands
const availableBrands = await qdrant.getFacetValues("brand");
// Returns: ["Nike", "Adidas", "Puma", "New Balance", ...]

// Validate extracted brand
if (!availableBrands.includes("Nike")) {
  // Brand not in catalog → return empty results immediately
  return { items: [], meta: { reason: "brand_unavailable" } };
}
```

This is a critical optimization — if a user asks for a brand the store doesn't carry, MCIP tells them immediately instead of returning irrelevant results.
### Stage 3: Hybrid Search

With validated filters in hand, MCIP performs a hybrid search combining vector similarity with exact payload filtering. Qdrant applies payload filters before vector similarity ranking — this is more efficient than post-filtering because we never score products that can't be returned.
The final stage passes search results through GPT-4o-mini for semantic verification:
```typescript
// GPT-4o-mini verifies each result against original query intent
// "Do these results actually match 'Nike shoes under $100'?"

// Filters out false positives:
// - Products semantically similar but wrong category
// - Edge cases the payload filters missed
// - Results that technically match but aren't relevant

// Returns: Top 5 verified products with metadata
```

This extra verification step significantly improves result quality. The LLM catches edge cases that pure vector similarity and metadata filters miss.
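The verification step can be sketched as a filter over LLM verdicts. The `verify` callback below stands in for the real GPT-4o-mini call; all names are illustrative:

```typescript
// Sketch of Stage 4: an LLM judge approves or rejects each candidate,
// and only approved hits survive, capped at the top 5.
type Candidate = { externalId: string; title: string; score: number };

async function verifyResults(
  query: string,
  hits: Candidate[],
  verify: (query: string, hit: Candidate) => Promise<boolean>,
): Promise<Candidate[]> {
  // Verdicts are gathered concurrently, then applied in the original order
  // so the similarity ranking is preserved.
  const verdicts = await Promise.all(hits.map(h => verify(query, h)));
  return hits.filter((_, i) => verdicts[i]).slice(0, 5);
}

// Usage with a stub judge that rejects anything mentioning "Running":
const candidates: Candidate[] = [
  { externalId: "a", title: "Nike Air Max 90", score: 0.9 },
  { externalId: "b", title: "Nike Running Tee", score: 0.8 },
];
verifyResults("nike shoes not running", candidates,
  async (_q, h) => !h.title.includes("Running"),
).then(kept => console.log(kept.map(k => k.externalId))); // [ 'a' ]
```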
RAG (Retrieval-Augmented Generation) is the technique that makes MCIP understand meaning, not just match keywords. Instead of looking for products containing the word "comfortable," RAG finds products that are conceptually similar — including "ergonomic," "cushioned," "soft," and "cozy."
Both search modes build on the same RAG foundation: OpenAI's text-embedding-3-small model converts text into 1536-dimensional vectors that capture semantic meaning.
These 1536 numbers capture the meaning of the text: "shoes" and "footwear" generate similar vectors because they mean similar things, even though the words share no characters.
```typescript
// Product embedding strategy
// Products are converted to a searchable text blob before embedding:
const textToEmbed = `
  Title: ${product.title}
  Description: ${product.description}
  Keywords: ${product.keywords.join(", ")}
  Attributes: ${product.attributes.map(a => `${a.name}: ${a.value}`).join(", ")}
`;
// This combined text captures all searchable dimensions of a product
```

Pure vector search is powerful but imprecise. Searching for "laptop under 500" might return a $1000 MacBook because it's semantically similar to laptops. Hybrid search combines the best of both approaches.
Qdrant handles this natively — pre-filtering eliminates non-matching products, then vector scoring ranks what remains.
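A payload filter for such a query can be sketched with Qdrant's filter syntax (`must`/`must_not` clauses holding `match` and `range` conditions). The `brand` and `price.amount` keys follow the payload indexes this document describes; the builder function itself is illustrative:

```typescript
// Building a Qdrant payload filter from extracted filters. The condition
// shapes follow Qdrant's filtering syntax; the builder is a sketch, not
// the actual HardFilteringService code.
interface ExtractedFilters {
  brands?: string[];
  excludeBrands?: string[];
  priceRange?: { min?: number; max?: number };
}

function buildQdrantFilter(f: ExtractedFilters) {
  const must: object[] = [];
  const must_not: object[] = [];
  if (f.brands?.length) must.push({ key: "brand", match: { any: f.brands } });
  if (f.excludeBrands?.length)
    must_not.push({ key: "brand", match: { any: f.excludeBrands } });
  if (f.priceRange)
    must.push({
      key: "price.amount",
      range: { gte: f.priceRange.min, lte: f.priceRange.max },
    });
  return { must, must_not };
}

// "Nike shoes under $100" → brand must match, price capped at 100:
console.log(JSON.stringify(
  buildQdrantFilter({ brands: ["Nike"], priceRange: { max: 100 } })));
```

This filter object rides along with the query vector, so Qdrant eliminates non-matching products before any similarity scoring happens.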
| Score | Meaning | Example |
|---|---|---|
| 0.90+ | Excellent match | Query and product are nearly identical concepts |
| 0.75–0.90 | Good match | Strong semantic relationship |
| 0.60–0.75 | Moderate match | Related but not precise |
| <0.60 | Weak match | Tangentially related at best |
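The score itself is plain cosine similarity between the query and product vectors. A minimal implementation, using toy 3-dimensional vectors in place of the real 1536-dimensional embeddings:

```typescript
// Cosine similarity: the metric behind the relevance scores above.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];     // how aligned the vectors are
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 2, 3], [1, 2, 3])); // ≈ 1.0 (same direction)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0 (orthogonal)
```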
| Metric | P50 | P95 | P99 |
|---|---|---|---|
| Embedding Generation | 145ms | 189ms | 212ms |
| Vector Search | 238ms | 287ms | 342ms |
| Total Response Time | 421ms | 498ms | 587ms |
Embedding Generation (~150ms P50)
We use text-embedding-3-small — excellent semantic understanding with fast inference. The 1536 dimensions capture meaning without the overhead of larger models.
Qdrant Vector Search (~238ms P50)
Qdrant uses HNSW (Hierarchical Navigable Small World) graphs for approximate nearest neighbor search. With payload indexes on price.amount (float), brand (keyword), and category (keyword), filter conditions are nearly instant.
LangGraph Parallel Execution
Stage 1 of the agentic workflow runs three LLM calls simultaneously via LangGraph. This means filter extraction adds roughly one LLM call's latency, not three.
Every search response includes a filteringStatus field indicating how the query was processed:
| Status | Mode | Description |
|---|---|---|
| AI_FILTERED | Agentic Search | LangGraph workflow extracted and applied filters successfully |
| VECTOR_ONLY | Simple Search | Pure vector similarity search, no filter extraction |
| RAG_ONLY | Agentic Search | Filter extraction attempted but fell back to pure vector search |
| FALLBACK | Either | Degraded mode (e.g., embedding API failure, keyword fallback) |
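A client can branch on this field to decide whether results came through the intended pipeline. Grouping RAG_ONLY and FALLBACK as "degraded" is this sketch's own judgment, not part of the protocol:

```typescript
// Client-side interpretation of meta.filteringStatus. Status names come
// from the table above; the "degraded" grouping is an assumption.
type FilteringStatus = "AI_FILTERED" | "VECTOR_ONLY" | "RAG_ONLY" | "FALLBACK";

function isDegraded(status: FilteringStatus): boolean {
  // RAG_ONLY and FALLBACK mean the pipeline fell back from its intended
  // mode, so a client may want to warn the user or retry later.
  return status === "RAG_ONLY" || status === "FALLBACK";
}

console.log(isDegraded("AI_FILTERED")); // false
console.log(isDegraded("RAG_ONLY"));    // true
```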
MCIP degrades gracefully: errors neither pass silently nor crash the request. Each error type has a specific fallback strategy:
| Error Type | Handling Strategy | User Impact | Recovery |
|---|---|---|---|
| Embedding API failure | Fallback to keyword search | Degraded relevance | Automatic |
| Filter extraction failure | Fall back to pure vector search (RAG_ONLY) | No filters applied | Automatic |
| Qdrant timeout | Return cached results | Possibly stale data | Automatic |
| Brand not found | Return empty with reason | Immediate feedback | User action |
| Rate limiting | Queue and retry | Delayed response | Automatic with backoff |
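The "automatic with backoff" recovery can be sketched as a generic retry wrapper. The attempt count and delays below are illustrative defaults, not MCIP's actual configuration:

```typescript
// Generic retry with exponential backoff, as used for rate-limited calls.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off 100ms, 200ms, 400ms... but not after the final attempt.
      if (i < attempts - 1)
        await new Promise(res => setTimeout(res, baseDelayMs * 2 ** i));
    }
  }
  throw lastError; // all attempts exhausted
}
```

Wrapping the embedding or search call in `withRetry` turns transient rate limits into a slightly delayed response instead of a hard failure.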
| Error Code | Meaning | Solution |
|---|---|---|
| 2001 | Embedding generation failed | Check OpenAI API key and connectivity |
| 2002 | Qdrant connection lost | Verify Qdrant is running on port 6333 |
| 2003 | No results found | Try broader search terms |
| 2004 | Filter extraction failed | Query proceeds without filters (RAG_ONLY) |
| Component | Package | Version | Purpose |
|---|---|---|---|
| LangGraph | @langchain/langgraph | ^1.0.15 | Agentic workflow state machines |
| LangChain Core | @langchain/core | ^1.1.15 | LLM abstractions and structured output |
| LangChain OpenAI | @langchain/openai | ^1.2.2 | OpenAI integration |
| OpenAI SDK | openai | ^6.9.1 | Embeddings (text-embedding-3-small) |
| Vector Database | @qdrant/js-client-rest | ^1.16.0 | Vector search + payload filtering |
| Schema Validation | zod | ^3.25.76 | Structured output parsing |
| MCP Protocol | @rekog/mcp-nest | ^1.8.4 | AI agent tool registration |
| Model | Purpose | Where |
|---|---|---|
| text-embedding-3-small | Generate 1536-dim semantic vectors | Both search modes |
| GPT-4o-mini | Filter extraction (Stage 1) | Agentic search |
| GPT-4o-mini | LLM verification (Stage 4) | Agentic search |
```shell
# Required for search
OPENAI_API_KEY=sk-your-key
QDRANT_URL=http://qdrant:6333
```

```shell
# Direct vector similarity — fast, no LLM overhead
curl "http://localhost:8080/search?q=laptop"

# With pagination
curl "http://localhost:8080/search?q=headphones&take=20&skip=10"
```

```shell
# Full LangGraph workflow with automatic filter extraction
curl "http://localhost:8080/hard-filtering/search?q=nike+shoes+under+100"

# Exclusion filters
curl "http://localhost:8080/hard-filtering/search?q=gaming+laptop+except+asus"

# Price range
curl "http://localhost:8080/hard-filtering/search?q=phones+between+300+and+600"
```

Look at `meta.filteringStatus` and `meta.appliedFilters` to see how the query was processed:
```json
{
  "meta": {
    "count": 5,
    "take": 10,
    "skip": 0,
    "q": "nike shoes under 100",
    "filteringStatus": "AI_FILTERED",
    "appliedFilters": {
      "brand": ["Nike"],
      "priceRange": { "min": null, "max": 100, "currency": "USD" }
    }
  },
  "items": [
    {
      "externalId": "prod_123",
      "title": "Nike Air Max 90",
      "price": { "amount": 89.99, "currency": "USD" },
      "brand": "Nike",
      "score": 0.892
    }
  ]
}
```

Planned Enhancement: Search orchestration is designed to support parallel searches across multiple e-commerce platforms as part of MCIP's evolution toward a full Machine Customer protocol.
Future capabilities will extend the current pipeline:
```
Query arrives
        ↓
[Fork] Search multiple platforms simultaneously
├── Shopify adapter
├── WooCommerce adapter
└── Custom APIs
        ↓
[Join] Aggregate results within time budget
        ↓
Merged, deduplicated, ranked results
```

These features build on the current RAG pipeline — the semantic understanding stays the same, we just add more data sources through the adapter pattern.
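The fork/join step could be sketched as a time-budgeted fan-out. The adapter signature, the budget handling, and deduplication by `externalId` are all assumptions about the planned design, not shipped behavior:

```typescript
// Hypothetical fan-out/join for multi-platform search: query every adapter
// concurrently, keep whatever returns within the time budget, then merge.
type PlatformHit = { externalId: string; title: string; score: number };

async function searchAll(
  adapters: Array<(q: string) => Promise<PlatformHit[]>>,
  q: string,
  budgetMs: number,
): Promise<PlatformHit[]> {
  // Adapters that miss the budget (or throw) contribute nothing.
  const timeout = new Promise<PlatformHit[]>(resolve =>
    setTimeout(() => resolve([]), budgetMs));
  const perPlatform = await Promise.all(
    adapters.map(a => Promise.race([a(q), timeout]).catch(() => [] as PlatformHit[])),
  );
  // Merge, dedupe by externalId (keeping the higher score), rank by score.
  const byId = new Map<string, PlatformHit>();
  for (const hit of perPlatform.flat()) {
    const prev = byId.get(hit.externalId);
    if (!prev || hit.score > prev.score) byId.set(hit.externalId, hit);
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

A slow platform then degrades the result set rather than the response time, which matches the "aggregate results within time budget" step above.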