
Error Handling

MCIP uses NestJS exception handling with graceful degradation built into the search and ingestion pipelines. The agentic LangGraph workflow handles failures at each stage, BullMQ retries failed ingestion jobs, and the system falls back to simpler search modes when external services are unavailable. Structured error codes are planned for multi-platform releases.

Error Response Format

All MCIP HTTP errors return a consistent JSON structure, powered by NestJS's built-in exception filters:

{
  "statusCode": 400,
  "message": "Error description here",
  "error": "Bad Request"
}
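Because every error shares this shape, a client can turn it into a typed error in one place. The following is a minimal sketch, not part of MCIP: `McipHttpError` and `toMcipError` are hypothetical names. It also assumes `message` may occasionally arrive as an array of strings, which NestJS can produce for some validation errors.

```javascript
// Hypothetical client-side helper: wraps an MCIP error body in an Error subclass.
class McipHttpError extends Error {
  constructor(statusCode, message, error) {
    super(message);
    this.name = 'McipHttpError';
    this.statusCode = statusCode; // e.g. 400
    this.error = error;           // e.g. "Bad Request"
  }
}

function toMcipError(httpStatus, body) {
  const raw = body && body.message;
  // Normalize string and array messages; fall back when the body is missing.
  const message = Array.isArray(raw) ? raw.join('; ') : String(raw ?? 'Unknown error');
  const statusCode = (body && body.statusCode) ?? httpStatus;
  const error = (body && body.error) ?? '';
  return new McipHttpError(statusCode, message, error);
}
```

A caller would typically use it as `if (!response.ok) throw toMcipError(response.status, await response.json());`.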

HTTP Status Codes

| Code | Meaning | When It Happens |
|------|---------|-----------------|
| 200 | Success | Request completed successfully |
| 400 | Bad Request | Invalid input, missing parameters, Zod validation failure |
| 401 | Unauthorized | Invalid or missing API key for admin endpoints |
| 404 | Not Found | Resource doesn't exist |
| 500 | Internal Server Error | Unexpected server-side error (OpenAI failure, Qdrant timeout, etc.) |

Example Error Scenarios

Missing search query:

curl "http://localhost:8080/search"

{
  "statusCode": 400,
  "message": "Query parameter 'q' is required",
  "error": "Bad Request"
}

Invalid admin API key:

curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: wrong-key"

{
  "statusCode": 401,
  "message": "Invalid Admin API Key",
  "error": "Unauthorized"
}

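Missing SOURCE_URL for sync:

{
  "statusCode": 400,
  "message": "SOURCE_URL environment variable is not set",
  "error": "Bad Request"
}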

Search Pipeline Error Handling

MCIP provides two search modes, each with its own error behavior. Understanding this is key to building robust integrations.

Simple Vector Search (GET /search)

The simple search path — embedding generation followed by Qdrant vector similarity — has a straightforward failure model:

| Failure Point | What Happens | Response |
|---------------|--------------|----------|
| OpenAI embedding API down | Search cannot proceed | 500 Internal Server Error |
| Qdrant unreachable | Vector search fails | 500 Internal Server Error |
| No matching results | Empty items array returned | 200 with items: [] |
| Invalid query params | Zod validation rejects input | 400 Bad Request |
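
The failure model above maps to a small decision function on the client side. This is an illustrative sketch only: `classifySearchOutcome` and its return labels are hypothetical, and it assumes the response body carries an `items` array as described in the table.

```javascript
// Hypothetical helper: maps an HTTP status and parsed body to a coarse outcome.
function classifySearchOutcome(status, body) {
  if (status === 200) {
    const items = (body && body.items) || [];
    return items.length === 0 ? 'no-results' : 'ok';
  }
  if (status === 400) return 'invalid-request'; // fix the query, do not retry
  if (status >= 500) return 'server-error';     // OpenAI or Qdrant failure, retry later
  return 'unexpected';
}
```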

Agentic Search (GET /hard-filtering/search)

The agentic search runs a 4-stage LangGraph workflow. Each stage can fail independently, and MCIP handles failures at each step:

Stage 1 — Parallel Filter Extraction (GPT-4o-mini)

Three LLM calls run in parallel to extract categories, brands, and price constraints. If any extraction call fails, that filter dimension is skipped — the search continues with whatever filters were successfully extracted.
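
The "skip the failed dimension" behavior can be sketched with `Promise.allSettled`. This is not MCIP's actual implementation: `extractFilters` and the extractor functions passed to it are hypothetical stand-ins for the three LLM calls.

```javascript
// Sketch only: each extractor is a hypothetical async function standing in for one LLM call.
async function extractFilters(query, extractors) {
  const names = Object.keys(extractors);
  // Run all extractors in parallel; allSettled never rejects, so one failure cannot abort the rest.
  const settled = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => extractors[name](query)))
  );

  const filters = {};
  const skipped = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') filters[names[i]] = result.value;
    else skipped.push(names[i]); // this dimension is dropped, the search continues
  });
  return { filters, skipped };
}
```

With extractors for `categories`, `brands`, and `price`, a failing `brands` call would leave `filters` with only the other two keys and list `brands` in `skipped`.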

Stage 2 — Brand Validation (Qdrant Facet Search)

Extracted brands are validated against the actual store catalog via Qdrant's getFacetValues("brand"). If the requested brand doesn't exist in the store, MCIP returns an empty result set immediately rather than wasting time on a search that can't succeed.
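
The short-circuit can be pictured as a pure function over the extracted brands and the catalog's brand facet values. The sketch below is an assumption-laden illustration: `validateBrands` is a hypothetical name, `catalogBrands` stands in for whatever the facet lookup returns, and the case-insensitive comparison and the handling of multiple brands are guesses rather than documented behavior.

```javascript
// Hypothetical sketch: catalogBrands stands in for the brand values returned by the facet lookup.
function validateBrands(requestedBrands, catalogBrands) {
  // Assumption: brand matching is case-insensitive.
  const known = new Set(catalogBrands.map((b) => b.toLowerCase()));
  const valid = requestedBrands.filter((b) => known.has(b.toLowerCase()));
  // Brands were requested but none exist in the catalog: return no results without searching.
  const shortCircuit = requestedBrands.length > 0 && valid.length === 0;
  return { valid, shortCircuit };
}
```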

Stage 3 — Hybrid Search (Embedding + Payload Filtering)

If embedding generation fails at this stage, the search cannot proceed and returns a 500 error. If Qdrant's payload filtering encounters an issue with a specific filter field, the system may fall back to pure vector search without that filter.

Stage 4 — LLM Verification (GPT-4o-mini)

If the verification LLM call fails, MCIP returns the unverified results from Stage 3 rather than failing the entire request. You'll still get relevant products — they just won't have the extra semantic verification pass.
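
A best-effort verification step like this is usually a try/catch around the LLM call that falls back to its input. The sketch below shows the pattern only; `verifyOrPassThrough` and the `verify` callback are hypothetical and do not reflect MCIP's internal function names.

```javascript
// Sketch only: `verify` is a hypothetical async function standing in for the verification LLM call.
async function verifyOrPassThrough(candidates, verify) {
  try {
    return { items: await verify(candidates), verified: true };
  } catch (err) {
    // Verification is best-effort: keep the hybrid search results instead of failing the request.
    return { items: candidates, verified: false };
  }
}
```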

The filteringStatus Indicator

The search response meta.filteringStatus tells you how the search was actually processed:

| Status | Meaning |
|--------|---------|
| AI_FILTERED | Full agentic workflow succeeded; filters extracted and applied |
| RAG_ONLY | Pure vector similarity search, no filter extraction applied |
| FALLBACK | Degraded mode: something failed in the pipeline, results may be less precise |

Always check this field to understand the quality of results you're receiving:

const response = await fetch('http://localhost:8080/hard-filtering/search?q=laptop');
const data = await response.json();

if (data.meta.filteringStatus === 'FALLBACK') {
  // Results are available but may be less precise
  console.warn('Search ran in degraded mode');
}

Ingestion Pipeline Errors

Product ingestion uses BullMQ with Redis for async job processing. This means ingestion errors don't surface as HTTP responses — they're handled within the queue.

BullMQ Retry Logic

Each product ingestion job is configured with automatic retries:

// Job configuration
{
  name: "process-product",
  data: rawProduct,
  opts: {
    removeOnComplete: true,
    attempts: 3          // Retry up to 3 times
  }
}

Ingestion Failure Points

| Stage | Failure | Recovery |
|-------|---------|----------|
| Fetch from source | SOURCE_URL unreachable or returns error | Job fails, retries up to 3 times |
| Product mapping | Adapter throws (invalid data, missing fields) | Job fails, product skipped after retries |
| Zod validation | Mapped product doesn't match UnifiedProduct schema | Job fails, product skipped |
| Embedding generation | OpenAI API error or rate limit | Job retries with BullMQ backoff |
| Qdrant storage | Vector DB unreachable | Job retries, Qdrant auto-reconnects |
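
The job configuration shown earlier sets `attempts: 3` but no explicit backoff, so the delay between retries is not documented here. BullMQ job options do accept a `backoff` setting such as `{ type: 'exponential', delay: 1000 }`. As a rough illustration of what a doubling schedule looks like, the sketch below computes the wait before each retry; `retryDelays` is a hypothetical helper, and the 1000 ms base delay is an assumption.

```javascript
// Hypothetical helper: delays before each retry for a doubling backoff.
// `attempts` counts the first try, so attempts = 3 means 2 retries.
function retryDelays(attempts, baseDelayMs) {
  const delays = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

console.log(retryDelays(3, 1000)); // [ 1000, 2000 ]
```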

Monitoring Ingestion

Check ingestion status through Docker logs:

# Watch ingestion processing
docker-compose logs -f mcip

# Look for mapper errors
docker-compose logs mcip | grep "ERROR"

The sync endpoint returns the number of products queued:

{
  "status": "success",
  "message": "Queued 150 products from URL",
  "count": 150
}

A successful sync response means products were queued, not necessarily processed. Individual products may still fail during mapping or embedding.


Graceful Degradation

MCIP is designed to return the best results it can, even when parts of the system are struggling. Think of it like a restaurant that still serves food when one burner is broken — you might not get the full menu, but you won't go hungry.

Current Degradation Strategies

| Error Type | Handling Strategy | User Impact | Recovery |
|------------|-------------------|-------------|----------|
| Embedding API failure | Falls back to simpler search or returns error | Degraded relevance or no results | Automatic when API recovers |
| Vector DB timeout | Returns cached or partial results if available | Possibly stale data | Automatic with retry |
| LLM filter extraction failure | Skips failed filter, continues with others | Some filters not applied | Automatic on next request |
| LLM verification failure | Returns unverified results from hybrid search | Results not semantically verified | Automatic on next request |
| Brand not in catalog | Returns empty results immediately | No results (by design) | N/A (correct behavior) |
| Rate limiting (OpenAI) | Queue and retry with backoff | Delayed response | Automatic with exponential backoff |

Qdrant Connection Resilience

MCIP retries the Qdrant connection up to 10 times on startup. If Qdrant is slow to start (common in Docker Compose), MCIP will wait:

# If you see this in logs, it's normal — MCIP is waiting for Qdrant
[Nest] WARN - Qdrant connection attempt 3/10...

The Docker Compose health checks ensure Qdrant is ready before MCIP starts accepting requests.
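
A bounded startup retry of this kind can be sketched as follows. This is a generic illustration, not MCIP's code: `connectWithRetry` and the `connect` callback are hypothetical, and the 2000 ms pause between attempts is an assumption (only the limit of 10 attempts comes from the text above).

```javascript
// Sketch only: `connect` is a hypothetical async function that resolves to a client or throws.
async function connectWithRetry(connect, { maxAttempts = 10, delayMs = 2000, log = console.warn } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      log(`Qdrant connection attempt ${attempt}/${maxAttempts} failed`);
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastError; // every attempt failed
}
```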


Handling Errors in Your Code

Basic Error Handling

async function basicSearch(query) {
  try {
    const response = await fetch(
      `http://localhost:8080/search?q=${encodeURIComponent(query)}`
    );
    const data = await response.json();

    if (!response.ok) {
      console.error(`Error ${data.statusCode}: ${data.message}`);
      switch (data.statusCode) {
        case 400:
          // Invalid input: check your query parameters
          break;
        case 401:
          // Auth error: check your admin API key
          break;
        case 500:
          // Server error: retry with backoff
          break;
      }
      return null;
    }

    // Check search quality
    if (data.meta.filteringStatus === 'FALLBACK') {
      console.warn('Results may be less precise (degraded mode)');
    }

    console.log(`Found ${data.meta.count} products`);
    return data;
  } catch (error) {
    // Network error: server unreachable
    console.error('Network error:', error.message);
    return null;
  }
}

Retry with Exponential Backoff

async function searchWithRetry(query, maxRetries = 3) {
  const url = `http://localhost:8080/search?q=${encodeURIComponent(query)}`;
  let lastError;

  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch(url);
      if (response.ok) return response.json();

      if (response.status < 500) {
        // Client error (400, 401): don't retry, fix the request
        const errorData = await response.json();
        const clientError = new Error(`Client error ${response.status}: ${errorData.message}`);
        clientError.retryable = false;
        throw clientError;
      }
      // Server error: worth retrying
      lastError = new Error(`Server error ${response.status}`);
    } catch (error) {
      if (error.retryable === false) throw error;
      lastError = error; // Network error: also worth retrying
    }

    if (i < maxRetries - 1) {
      const delay = Math.pow(2, i) * 1000;
      console.log(`Retry ${i + 1}/${maxRetries - 1} in ${delay}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  // All attempts failed: surface the last error instead of returning undefined
  throw lastError;
}

Handling Agentic Search Specifically

When using the agentic search endpoint, you may want to fall back to simple search if the LangGraph pipeline fails:

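async function smartSearch(query) {
  const q = encodeURIComponent(query);
  try {
    // Try agentic search first (best results)
    const response = await fetch(`http://localhost:8080/hard-filtering/search?q=${q}`);
    // `return await` keeps JSON parse failures inside this try block
    if (response.ok) return await response.json();
    console.warn(`Agentic search returned ${response.status}, falling back to simple search`);
  } catch (error) {
    console.warn('Agentic search failed, falling back to simple search');
  }

  // Fallback to simple vector search
  const response = await fetch(`http://localhost:8080/search?q=${q}`);
  if (!response.ok) {
    throw new Error(`Simple search failed with status ${response.status}`);
  }
  return response.json();
}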

Monitoring and Debugging

Health Check

Always verify server health before operations:

curl http://localhost:8080/health

Expected response:

{"status":"ok"}

Common Issues

| Symptom | Possible Cause | Solution |
|---------|----------------|----------|
| 500 on all searches | OpenAI API key invalid or expired | Verify OPENAI_API_KEY environment variable |
| 500 on sync | Qdrant unreachable | Check QDRANT_URL and run docker-compose logs qdrant |
| 401 on admin endpoints | Wrong API key | Verify ADMIN_API_KEY matches the x-admin-api-key header |
| Empty search results | Products not synced yet | Run POST /admin/sync and wait for ingestion to complete |
| Slow agentic search | OpenAI rate limits | Check OpenAI dashboard for rate limit status, add delays if on Tier 1 |
| Sync returns count but no searchable products | Mapper errors during ingestion | Check docker-compose logs mcip for mapping/validation errors |
| MCIP won't start | Qdrant not ready yet | Wait for health checks; MCIP retries the Qdrant connection up to 10 times |
| Queue jobs stuck | Redis connection lost | Run docker-compose restart redis |

Checking Infrastructure Health

# MCIP server
curl http://localhost:8080/health

# Qdrant vector database
curl http://localhost:6333/collections

# Redis (via docker)
docker-compose exec redis redis-cli ping
# Expected: PONG

Partial Results Across Stores

When searching across multiple stores, return results from responsive stores even if some fail:

{
  "items": [...],
  "meta": {
    "partial": true,
    "failedStores": ["shopify-store-1"],
    "successfulStores": ["vendure-main", "woocommerce-shop"]
  }
}
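
One way to assemble a response of this shape is to query every store in parallel and sort the outcomes afterwards. The sketch below is an illustration under assumptions: `searchAllStores` is a hypothetical name, and each entry in `stores` is a hypothetical async function that resolves to an array of items.

```javascript
// Sketch only: `stores` maps a store name to a hypothetical async search function.
async function searchAllStores(query, stores) {
  const names = Object.keys(stores);
  // allSettled waits for every store, so one failure cannot discard the others' results.
  const settled = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => stores[name](query)))
  );

  const items = [];
  const failedStores = [];
  const successfulStores = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      successfulStores.push(names[i]);
      items.push(...result.value);
    } else {
      failedStores.push(names[i]);
    }
  });

  return { items, meta: { partial: failedStores.length > 0, failedStores, successfulStores } };
}
```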

Circuit Breakers

Automatic failure detection and recovery per store adapter:

  • Open circuit after 5 consecutive failures
  • Half-open state after 30 seconds
  • Auto-recovery on successful request
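
The three rules above describe a standard circuit breaker state machine. The following is a minimal sketch of that pattern, not MCIP's implementation: the `CircuitBreaker` class and its method names are hypothetical, and the injectable `now` clock exists only to make the example testable.

```javascript
// Hypothetical sketch of a per-adapter circuit breaker.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.failures = 0;    // consecutive failures since the last success
    this.openedAt = null; // null means the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    // After the reset timeout, allow a trial request (half-open).
    return this.now() - this.openedAt >= this.resetTimeoutMs ? 'half-open' : 'open';
  }

  canRequest() {
    return this.state !== 'open';
  }

  recordSuccess() {
    // Any success closes the circuit and resets the counter.
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial request, or hitting the threshold, (re)opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `canRequest()` before contacting a store, then report the outcome with `recordSuccess()` or `recordFailure()`.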

Per-Store Timeouts

  • Default timeout: 2500ms per store
  • Fast-fail threshold: 3000ms total
  • Return available results if timeout exceeded
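
A per-store timeout can be approximated by racing each store request against a timer. This is a sketch of the general technique only: `withTimeout` is a hypothetical helper, and while the 2500 ms default mirrors the figure above, how MCIP enforces it is not specified here. Note that the race stops waiting for the slow request but does not cancel it.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms = 2500) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping each store call in `withTimeout` before collecting results lets slow stores be reported as failed while responsive stores still contribute their items.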