MCIP Server

Error Handling

MCIP treats errors as inevitable guests, not unexpected crashes. We classify failures intelligently, degrade gracefully, return partial results when possible, retry smartly, and monitor everything – ensuring users get value even when things go wrong.

Errors Are Not Failures, They're Information

Think about driving with a GPS. Sometimes it loses signal in a tunnel. Sometimes it can't find a specific address. Sometimes the route it suggests is blocked. But a good GPS doesn't just display "ERROR" and shut down. It shows you what it knows, suggests alternatives, and keeps trying to help. That's exactly how MCIP handles errors.

In the world of distributed e-commerce, errors aren't exceptional – they're expected. Platforms go down for maintenance. Networks experience congestion. APIs hit rate limits. Databases time out under load. The question isn't whether errors will occur, but how gracefully we handle them when they do.

MCIP's error handling philosophy is simple: every error is an opportunity to demonstrate resilience. We don't just catch exceptions; we transform them into degraded but useful responses. We don't just retry blindly; we adapt our strategy based on failure patterns. We don't just log errors; we learn from them to prevent future occurrences.


Error Classification: Know Your Enemy

The Error Hierarchy

Not all errors deserve the same response. MCIP classifies errors into distinct categories, each with its own handling strategy:

Transient Errors (Level 1): These are temporary hiccups that usually resolve themselves. Network timeouts, momentary service unavailability, temporary rate limits. Like a friend not answering their phone – they're probably just busy, try again in a moment.

Degraded Service Errors (Level 2): The service works but not optimally. Slow responses, partial data, reduced functionality. Like a restaurant that's out of your first choice – you can still eat, just not what you originally wanted.

Platform Errors (Level 3): Specific to one platform or adapter. Authentication failures, API version mismatches, platform-specific outages. Like one store being closed – annoying, but other stores remain open.

System Errors (Level 4): Affect core MCIP functionality. Database failures, critical service outages, infrastructure problems. Like a power outage – serious, but we have generators (fallbacks) ready.

Fatal Errors (Level 5): Unrecoverable failures requiring intervention. Corrupted data, security breaches, complete system failure. Like a fire alarm – evacuate (safe mode) and call for help.
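The five levels above map naturally onto a severity enum. Here's a minimal sketch – the names and the retry rule are illustrative, not MCIP's actual API:

```python
from enum import IntEnum

class ErrorLevel(IntEnum):
    """Illustrative severity levels mirroring the hierarchy above."""
    TRANSIENT = 1   # temporary hiccups: timeouts, brief rate limits
    DEGRADED = 2    # service works, but slowly or partially
    PLATFORM = 3    # one platform or adapter is failing
    SYSTEM = 4      # core MCIP functionality is affected
    FATAL = 5       # unrecoverable; requires human intervention

def should_retry(level: ErrorLevel) -> bool:
    """Only the lowest two levels are worth automatic retries;
    higher levels need fallbacks or intervention instead."""
    return level <= ErrorLevel.DEGRADED

print(should_retry(ErrorLevel.TRANSIENT))  # True
print(should_retry(ErrorLevel.PLATFORM))   # False
```

Encoding severity as an ordered integer makes policy decisions (retry, alert, escalate) simple comparisons rather than string matching.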

Error Codes That Tell Stories

Our error codes aren't random numbers. They tell you exactly what went wrong:

  • 1000-1999: Session and state errors
  • 2000-2999: Search and query errors
  • 3000-3999: Cart and transaction errors
  • 4000-4999: Platform and adapter errors
  • 5000-5999: Rate limiting and throttling
  • 6000-6999: Authentication and authorization
  • 7000-7999: Data validation and integrity
  • 8000-8999: Infrastructure and system errors
  • 9000-9999: Critical security and fatal errors

Each code includes subcategories. Error 2001 is a search timeout. Error 2002 is invalid search parameters. Error 2003 is no results found. This granularity helps both debugging and automated recovery.
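Because the ranges are thousand-aligned, decoding a code's category is a single integer division. A hypothetical decoder, using the ranges listed above:

```python
# Category names abbreviated from the ranges described above.
ERROR_CATEGORIES = {
    1: "session/state", 2: "search/query", 3: "cart/transaction",
    4: "platform/adapter", 5: "rate limiting", 6: "auth",
    7: "data validation", 8: "infrastructure", 9: "security/fatal",
}

def categorize(code: int) -> str:
    """Map a four-digit error code to its thousand-range category."""
    return ERROR_CATEGORIES.get(code // 1000, "unknown")

print(categorize(2001))  # search/query (a search timeout)
print(categorize(4001))  # platform/adapter
```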


Graceful Degradation: The Art of Failing Well

The Fallback Cascade

When primary systems fail, MCIP doesn't give up. We cascade through increasingly degraded but still useful alternatives:

Primary Path: Full RAG-powered semantic search across all platforms

First Fallback: Keyword search if RAG fails

Second Fallback: Cached results if real-time search fails

Third Fallback: Popular products in the category

Final Fallback: Honest error message with helpful suggestions

It's like planning a vacation. First choice: fly direct. If that fails: connecting flight. If that fails: train. If that fails: drive. If that fails: staycation. You might not get to Paris, but you still get a break.
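The cascade can be sketched as a loop over strategies, returning the first that succeeds. The strategy names here are illustrative:

```python
# Sketch of a fallback cascade: try each strategy in order and return
# the first one that succeeds, remembering what failed along the way.
def cascade(strategies):
    errors = []
    for name, fn in strategies:
        try:
            return name, fn()
        except Exception as exc:  # real code would catch narrower types
            errors.append((name, exc))
    raise RuntimeError(f"all fallbacks exhausted: {errors}")

def failing():
    raise TimeoutError("search timed out")

def cached():
    return ["cached product A", "cached product B"]

used, results = cascade([("semantic", failing),
                         ("keyword", failing),
                         ("cache", cached)])
print(used, results)  # cache ['cached product A', 'cached product B']
```

The final `RuntimeError` corresponds to the "honest error message" step: only after every alternative has failed does the user see an error at all.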

Service Isolation

Each MCIP service is isolated to prevent cascade failures. If the embedding service fails, search continues with keywords. If one adapter fails, others continue. If the ranking service fails, we return unranked results. No single failure can take down the entire system.

Think of it like a house with multiple circuit breakers. If the kitchen electricity fails, the lights in the living room still work. One blown fuse doesn't plunge the entire house into darkness. Each service has its own "circuit breaker" that trips independently.


Partial Results: Something Beats Nothing

The Incomplete Success Philosophy

Perfect is the enemy of good. When we can't deliver 100%, we deliver what we can, with transparency about what's missing. If we search five platforms and two time out, we return results from three with a note about the incomplete coverage.

Users appreciate honesty. "Here are products from 3 out of 5 stores. Two stores didn't respond in time, but you can retry to check them" is infinitely more useful than "Error: Search failed."
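A partial-result response can carry both the data and the honesty in one structure. A minimal sketch, with made-up platform names:

```python
# Sketch: aggregate whatever platforms answered and report the gaps.
def merge_partial(responses: dict) -> dict:
    """responses maps platform -> list of results, or None on timeout."""
    ok = {p: r for p, r in responses.items() if r is not None}
    failed = [p for p, r in responses.items() if r is None]
    return {
        "results": [item for r in ok.values() for item in r],
        "coverage": f"{len(ok)} of {len(responses)} stores",
        "retry_suggested": bool(failed),
        "missing": failed,
    }

out = merge_partial({"shopify": None, "woo": ["mug"], "bigc": ["cup"]})
print(out["coverage"])  # 2 of 3 stores
```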

Progressive Degradation

We degrade progressively, removing non-essential features while preserving core functionality:

  1. Full Feature: Semantic search with all enrichments
  2. Reduced Feature: Basic search without recommendations
  3. Minimal Feature: Simple keyword matching
  4. Emergency Mode: Browse categories only
  5. Maintenance Mode: Clear message with retry option

Each degradation level maintains usability. It's like a car's limp mode – reduced performance but you still get home.
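Selecting the degradation level comes down to picking the richest mode whose required services are currently healthy. A sketch, with illustrative mode and service names:

```python
# Modes ordered from full feature down to maintenance, each with the
# set of services it needs to be healthy.
MODES = [
    ("full", {"embeddings", "ranking", "search"}),
    ("reduced", {"ranking", "search"}),
    ("minimal", {"search"}),
    ("emergency", {"catalog"}),
    ("maintenance", set()),       # always available
]

def pick_mode(healthy: set) -> str:
    """Return the first (richest) mode whose needs are all healthy."""
    for name, needs in MODES:
        if needs <= healthy:
            return name
    return "maintenance"

print(pick_mode({"search", "catalog"}))  # minimal
```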


Retry Strategies: Smart Persistence

Exponential Backoff

When retrying failed requests, we don't hammer the service. We use exponential backoff with jitter, tripling the delay after each attempt:

  • First retry: 100ms wait
  • Second retry: 300ms wait
  • Third retry: 900ms wait
  • Fourth retry: 2,700ms wait

The jitter (random variation) prevents thundering herd problems where all clients retry simultaneously. It's like everyone leaving a concert – staggered exits prevent crushing at the doors.
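The schedule above is a base delay multiplied by 3 per attempt, randomized by jitter. A minimal sketch (the ±20% jitter range is an assumption):

```python
import random

def backoff_delays(base_ms=100, factor=3, retries=4, jitter=0.2):
    """Yield retry delays with exponential growth and +/-20% jitter."""
    for attempt in range(retries):
        nominal = base_ms * factor ** attempt   # 100, 300, 900, 2700
        yield nominal * random.uniform(1 - jitter, 1 + jitter)

for delay in backoff_delays():
    print(f"wait {delay:.0f}ms")
```

With `jitter=0.0` the delays collapse to exactly the schedule listed above; with jitter enabled, no two clients retry in lockstep.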

Circuit Breaker Pattern

Circuit breakers prevent repeated calls to failing services:

Closed State: Normal operation, requests pass through

Open State: Service is failing, requests immediately return cached/fallback responses

Half-Open State: Testing recovery, allowing one request through

When a service fails 5 times in 30 seconds, the circuit opens for 60 seconds. After 60 seconds, it enters half-open state, testing with a single request. Success closes the circuit; failure keeps it open longer.
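The three states and the thresholds from the text (5 failures in 30 seconds, 60-second cool-off) fit in a small state machine. A sketch, with an injectable clock so the transitions are testable:

```python
import time

class CircuitBreaker:
    """Minimal closed/open/half-open cycle as described above."""

    def __init__(self, max_failures=5, window=30.0, cooloff=60.0,
                 clock=time.monotonic):
        self.max_failures, self.window, self.cooloff = max_failures, window, cooloff
        self.clock = clock
        self.failures = []      # timestamps of recent failures
        self.opened_at = None   # None means the circuit is closed

    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooloff:
            return "half-open"  # allow one probe request through
        return "open"

    def record_failure(self):
        now = self.clock()
        self.failures = [t for t in self.failures if now - t <= self.window]
        self.failures.append(now)
        if len(self.failures) >= self.max_failures or self.state() == "half-open":
            self.opened_at = now    # open, or re-open after a failed probe

    def record_success(self):
        self.failures.clear()
        self.opened_at = None       # close the circuit

t = [0.0]                               # fake clock for the demo
cb = CircuitBreaker(clock=lambda: t[0])
for _ in range(5):
    cb.record_failure()
print(cb.state())   # open
t[0] = 61.0
print(cb.state())   # half-open
```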

Adaptive Timeouts

We adjust timeouts based on historical performance. If a platform usually responds in 200ms but has been slow lately, we extend its timeout to 400ms. If it's been consistently fast, we might reduce the timeout to 150ms. These adaptive timeouts balance patience with performance.
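One common way to implement this (the exact smoothing MCIP uses isn't specified here) is an exponentially weighted moving average of observed latencies, with a safety multiplier and hard bounds:

```python
# Sketch: timeout = bounded multiple of a smoothed latency average.
# alpha, multiplier, and the bounds are illustrative values.
class AdaptiveTimeout:
    def __init__(self, initial_ms=200.0, alpha=0.2, multiplier=2.0,
                 floor_ms=150.0, ceiling_ms=2000.0):
        self.avg = initial_ms
        self.alpha, self.multiplier = alpha, multiplier
        self.floor, self.ceiling = floor_ms, ceiling_ms

    def observe(self, latency_ms: float):
        """Fold one observed latency into the moving average."""
        self.avg = (1 - self.alpha) * self.avg + self.alpha * latency_ms

    def timeout_ms(self) -> float:
        return min(self.ceiling, max(self.floor, self.avg * self.multiplier))

at = AdaptiveTimeout()
for ms in (350, 380, 400):   # the platform has been slow lately
    at.observe(ms)
print(round(at.timeout_ms()))  # 576 - timeout stretched from the 400ms default
```

The floor and ceiling keep the balance the text describes: patience for a struggling platform, but never an unbounded wait.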


Logging and Monitoring: Learning from Failure

Structured Error Logging

Every error generates a structured log entry:

{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "error_code": 4001,
  "severity": "warning",
  "service": "shopify-adapter",
  "message": "API rate limit exceeded",
  "context": {
    "query": "laptop",
    "platform": "shopify",
    "retry_count": 2,
    "user_session": "uuid-xxx"
  },
  "recovery": "Using cached results",
  "impact": "degraded"
}

This structure enables automated analysis. We can track error trends, identify patterns, and trigger alerts based on specific conditions.
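Emitting an entry in that shape is straightforward. A sketch using the standard library (field names follow the example above; the transport is stubbed with `print`):

```python
import json
import datetime

def log_error(code, severity, service, message, context, recovery, impact):
    """Emit one structured entry shaped like the example above."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc)
                     .isoformat(timespec="milliseconds")
                     .replace("+00:00", "Z"),
        "error_code": code,
        "severity": severity,
        "service": service,
        "message": message,
        "context": context,
        "recovery": recovery,
        "impact": impact,
    }
    print(json.dumps(entry))   # real code would ship this to a log pipeline
    return entry

entry = log_error(4001, "warning", "shopify-adapter",
                  "API rate limit exceeded",
                  {"query": "laptop", "retry_count": 2},
                  "Using cached results", "degraded")
```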

Monitoring Dashboards

Our monitoring tracks error metrics in real-time:

  • Error Rate: Errors per minute/hour/day
  • Error Distribution: Which errors occur most frequently
  • Recovery Success: How often fallbacks work
  • Impact Analysis: How errors affect user experience
  • Platform Health: Which platforms are struggling

These dashboards aren't just for debugging – they're for learning. Frequent timeout errors might indicate we need to adjust our time budgets. Regular authentication failures might suggest API changes we need to accommodate.

Alerting Intelligence

Not every error triggers an alert. We use intelligent thresholds:

  • Single errors: Log but don't alert
  • Error clusters: Alert if 10+ similar errors in 5 minutes
  • Critical errors: Immediate alerts
  • Degraded service: Alert if lasting >10 minutes
  • Recovery notifications: Alert when services recover

This prevents alert fatigue while ensuring critical issues get immediate attention.
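The cluster rule (10+ similar errors in 5 minutes) is a sliding-window count per error code. A minimal sketch with an injectable clock:

```python
import time
from collections import deque

class ClusterAlert:
    """Alert only when similar errors cluster inside a time window."""

    def __init__(self, threshold=10, window_s=300, clock=time.monotonic):
        self.threshold, self.window = threshold, window_s
        self.clock = clock
        self.seen = {}  # error_code -> deque of timestamps

    def record(self, error_code) -> bool:
        """Return True when this error crosses the alert threshold."""
        now = self.clock()
        q = self.seen.setdefault(error_code, deque())
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()     # drop events outside the 5-minute window
        return len(q) >= self.threshold

t = [0.0]
alerts = ClusterAlert(clock=lambda: t[0])
fired = [alerts.record(2001) for _ in range(10)]
print(fired[-1])  # True: tenth similar error within the window
```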


Error Recovery in Action

Here's how MCIP handles a real scenario where Shopify's API goes down during a search:

  1. Detection: Shopify adapter times out after 1.5 seconds
  2. Classification: Marked as Level 3 Platform Error
  3. Circuit Break: After 3 failures, circuit breaker opens
  4. Notification: User sees "Searching 4 out of 5 stores"
  5. Partial Results: Results from other platforms returned
  6. Cache Check: Recent Shopify results added if available
  7. Logging: Error logged with full context
  8. Monitoring: Dashboard shows Shopify degradation
  9. Recovery Test: Circuit breaker tests after 60 seconds
  10. Resolution: Normal service resumes when Shopify recovers

The user experience? They get results in 2 seconds instead of waiting for a timeout, see products from available platforms, and can retry later for complete results. Not perfect, but perfectly usable.