MCIP

Store Registration

Register your store through environment configuration, trigger a BullMQ sync, and MCIP handles the rest — mapping products to a unified schema, generating embeddings, and storing vectors in Qdrant for semantic search.

The Registration Mental Model

Think of store registration like teaching a translator a new language. You provide the source (your store's API), the dictionary (the adapter/mapper), and the translator memorizes everything (vector embeddings in Qdrant). Once learned, any AI agent can ask questions in natural language and get answers from your catalog.

MCIP is the Machine Customer Interaction Protocol — a universal way for AI agents to interact with commerce. Product discovery through semantic search is the first capability, with cart management, checkout, and order tracking planned as the protocol evolves.

What happens when you register a store:

Your Store API → BullMQ Queue → Product Mapper → Vectorization Service → Qdrant
     ↓                ↓                ↓                  ↓              ↓
  Raw data      Async job       UnifiedProduct      1536-dim vectors   Searchable!

Part 1: Environment Configuration

Current Implementation: Single Store

MCIP currently supports single-store registration through environment variables. This keeps configuration simple and secure — no config files to manage or accidentally commit.

Required Environment Variables:

# .env

# Required: OpenAI for embeddings and AI features
OPENAI_API_KEY=sk-proj-your-openai-key

# Store Connection
SOURCE_URL=https://demo.vendure.io/shop-api
STORE_PROVIDER=VENDURE

# For GraphQL platforms (Vendure)
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price priceWithTax currencyCode stockLevel}facetValues{name facet{name}}collections{name slug}featuredAsset{preview source}}}}

# Optional: Bearer token for authenticated APIs
SOURCE_API_KEY=your-api-key-here

# Infrastructure (Docker defaults work out of the box)
REDIS_HOST=redis
QDRANT_URL=http://qdrant:6333
PORT=8080

# Security
ADMIN_API_KEY=your-secret-admin-key

# Optional: For product URL generation
STOREFRONT_URL=https://your-store.com
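All registration settings arrive as environment variables, so a typo usually surfaces only when a sync fails. One option is to validate them up front. The sketch below is not MCIP's config loader. The names `loadStoreConfig` and `StoreConfig`, and the choice of which variables are mandatory, are assumptions drawn from the comments in the `.env` example above.

```typescript
// Hypothetical config loader for the .env variables described above.
// Not MCIP's real implementation; names and validation rules are assumptions.

type StoreProvider = "VENDURE" | "CUSTOM";

interface StoreConfig {
  openaiApiKey: string;
  sourceUrl: string;
  storeProvider: StoreProvider;
  graphqlQuery?: string;  // required only for VENDURE
  sourceApiKey?: string;  // optional Bearer token
  redisHost: string;
  qdrantUrl: string;
  port: number;
  adminApiKey: string;
  storefrontUrl?: string;
}

type Env = Record<string, string | undefined>;

function loadStoreConfig(env: Env): StoreConfig {
  const errors: string[] = [];
  // Records an error for a missing or blank variable and returns its trimmed value.
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) errors.push(`${name} is required`);
    return value ?? "";
  };

  const openaiApiKey = required("OPENAI_API_KEY");
  const sourceUrl = required("SOURCE_URL");
  const adminApiKey = required("ADMIN_API_KEY");
  const provider = required("STORE_PROVIDER");
  if (provider && provider !== "VENDURE" && provider !== "CUSTOM") {
    errors.push(`STORE_PROVIDER must be VENDURE or CUSTOM, got "${provider}"`);
  }
  const graphqlQuery = env["GRAPHQL_QUERY"]?.trim() || undefined;
  if (provider === "VENDURE" && !graphqlQuery) {
    errors.push("GRAPHQL_QUERY is required when STORE_PROVIDER=VENDURE");
  }
  const port = Number(env["PORT"] ?? "8080");
  if (!Number.isInteger(port) || port <= 0) errors.push("PORT must be a positive integer");

  if (errors.length > 0) throw new Error(`Invalid configuration:\n- ${errors.join("\n- ")}`);

  return {
    openaiApiKey,
    sourceUrl,
    storeProvider: provider as StoreProvider,
    graphqlQuery,
    sourceApiKey: env["SOURCE_API_KEY"]?.trim() || undefined,
    // Docker Compose defaults from the example .env
    redisHost: env["REDIS_HOST"] ?? "redis",
    qdrantUrl: env["QDRANT_URL"] ?? "http://qdrant:6333",
    port,
    adminApiKey,
    storefrontUrl: env["STOREFRONT_URL"]?.trim() || undefined,
  };
}
```

In a Node entry point you would call `loadStoreConfig(process.env)` once at startup, so that the process fails fast with every problem listed together.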

Store Provider Options

The STORE_PROVIDER variable determines which adapter processes your product data:

Provider   Use Case                      Fetch Method                   Mapper
VENDURE    Vendure e-commerce platform   GraphQL API (GRAPHQL_QUERY)    VendureMapper
CUSTOM     Any other platform            REST GET on SOURCE_URL         CustomAiMapper (AI-powered, GPT-4)

Vendure Example:

STORE_PROVIDER=VENDURE
SOURCE_URL=https://your-vendure.com/shop-api
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price}featuredAsset{preview source}}}}

Custom/REST API Example:

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://api.your-store.com/products
SOURCE_API_KEY=your-bearer-token
# No GRAPHQL_QUERY needed — uses REST GET

💡 Tip: The CUSTOM provider uses GPT-4 to intelligently map any JSON structure to MCIP's unified schema. It's slower and uses API credits, but works with virtually any data format.
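The provider mostly decides how the source is fetched. VENDURE sends a GraphQL POST carrying `GRAPHQL_QUERY`, and CUSTOM sends a plain REST GET. In both cases `SOURCE_API_KEY` goes out as a Bearer token when it is set. The sketch below assembles that request without sending it. `buildSourceRequest` and `SourceRequest` are hypothetical names, and the exact headers MCIP sends are an assumption.

```typescript
// Hypothetical helper: how STORE_PROVIDER could shape the outgoing source request.
// Header names and payload layout are assumptions, not MCIP's confirmed behavior.

type StoreProvider = "VENDURE" | "CUSTOM";

interface SourceRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildSourceRequest(
  provider: StoreProvider,
  sourceUrl: string,
  graphqlQuery?: string,
  sourceApiKey?: string,
): SourceRequest {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (sourceApiKey) headers["Authorization"] = `Bearer ${sourceApiKey}`;

  if (provider === "VENDURE") {
    if (!graphqlQuery) throw new Error("GRAPHQL_QUERY is required for VENDURE");
    headers["Content-Type"] = "application/json";
    // GraphQL over HTTP: the query travels in the JSON body of a POST.
    return { url: sourceUrl, method: "POST", headers, body: JSON.stringify({ query: graphqlQuery }) };
  }
  // CUSTOM: plain REST GET; the JSON response is later handed to the AI mapper.
  return { url: sourceUrl, method: "GET", headers };
}
```

The result maps directly onto `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.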


Part 2: The Ingestion Pipeline

Understanding the Data Flow

When you trigger a sync, MCIP processes your products through an asynchronous, four-stage pipeline:

Stage 1: Fetch & Queue

  • POST /admin/sync fetches products from your SOURCE_URL
  • Each product is added to the BullMQ queue as a job
  • Jobs are persisted in Redis, so queued work survives restarts
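Stage 1 turns one fetch into many small jobs. The sketch below builds job descriptors in the `{ name, data, opts }` shape that BullMQ's `Queue.addBulk` accepts. Deriving `jobId` from the product ID is an assumption, not confirmed MCIP behavior. It shows one way to make repeated syncs idempotent, because BullMQ does not add a job whose `jobId` already exists in the queue. The retry settings are illustrative values, and `toIngestionJobs` is a hypothetical name.

```typescript
// Hypothetical Stage 1 helper: raw products -> BullMQ-style bulk job descriptors.
// The { name, data, opts } shape matches Queue.addBulk; jobId scheme and retry numbers are assumptions.

interface RawProduct {
  id: string | number;
  [field: string]: unknown;
}

interface IngestionJob {
  name: string;
  data: RawProduct;
  opts: {
    jobId: string;
    attempts: number;
    backoff: { type: "exponential"; delay: number };
    removeOnComplete: boolean;
  };
}

function toIngestionJobs(products: RawProduct[]): IngestionJob[] {
  return products.map((product) => ({
    name: "ingest-product",
    data: product,
    opts: {
      // Stable ID per product: re-adding it while its job is still queued is a no-op.
      // Prefixed so it is never a bare integer; ":" replaced since Redis keys use it as a separator.
      jobId: `product-${String(product.id).replace(/:/g, "_")}`,
      attempts: 3,                                   // first try plus two retries
      backoff: { type: "exponential", delay: 2000 }, // waits 2s, then 4s, between retries
      removeOnComplete: true,                        // frees the jobId so the next sync can re-queue it
    },
  }));
}
```

With BullMQ installed, this would be used as `await new Queue("product-ingestion", { connection: { host: "redis", port: 6379 } }).addBulk(toIngestionJobs(products))`. The queue name is taken from the Redis key in the Quick Reference. Setting `removeOnComplete` matters here: without it, a completed job keeps its ID and the next sync's add for that product would be skipped.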

Stage 2: Map to Unified Schema

  • The IngestionProcessor picks up jobs from the queue
  • Your configured mapper (VendureMapper or CustomAiMapper) transforms raw data
  • Output: UnifiedProduct validated by Zod schema

Stage 3: Generate Embeddings

  • VectorizationService creates searchable text from product fields
  • OpenAI's text-embedding-3-small generates 1536-dimensional vectors
  • These vectors capture semantic meaning, not just keywords
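The quality of Stage 3 depends on which text gets embedded. This page does not document MCIP's exact field selection. The sketch assumes that title, brand, category, description, attributes, and keywords are joined into one labeled string. That is a common approach, because every included field becomes matchable by meaning. `buildEmbeddingText` is a hypothetical name.

```typescript
// Hypothetical Stage 3 helper: flattens a product into the text that gets embedded.
// Field choice and order are assumptions, not MCIP's documented behavior.

interface EmbeddableProduct {
  title: string;
  description: string;
  brand?: string;
  category?: string;
  attributes: Array<{ name: string; value: string | number | boolean }>;
  keywords: string[];
}

function buildEmbeddingText(p: EmbeddableProduct): string {
  const lines = [
    `Title: ${p.title}`,
    p.brand ? `Brand: ${p.brand}` : "",
    p.category ? `Category: ${p.category}` : "",
    `Description: ${p.description}`,
    p.attributes.length > 0
      ? `Attributes: ${p.attributes.map((a) => `${a.name}: ${a.value}`).join("; ")}`
      : "",
    p.keywords.length > 0 ? `Keywords: ${p.keywords.join(", ")}` : "",
  ];
  // Drop empty optional sections so they add no noise to the embedding.
  return lines.filter((line) => line.length > 0).join("\n");
}
```

With the official `openai` Node SDK, you would pass the resulting string to `openai.embeddings.create({ model: "text-embedding-3-small", input: text })` and read the vector from `data[0].embedding`. That model returns 1536 dimensions by default.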

Stage 4: Store in Qdrant

  • Product + vector stored in Qdrant collection
  • Payload indexes created for hybrid filtering (brand, category, price)
  • Ready for both simple vector search and agentic filtered search
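Qdrant accepts only unsigned integers or UUIDs as point IDs, so a platform ID such as `prod_123` cannot be used directly. One way to keep re-syncs idempotent is to derive a deterministic UUID from `externalId` and upsert with it. The same product then always overwrites its own point. The sketch below hashes the ID with SHA-1 and formats the first 128 bits as a UUID. Whether MCIP's QdrantRepository uses this scheme is an assumption, and `toQdrantPoint` and `externalIdToUuid` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical Stage 4 helper: builds a Qdrant point from a mapped product plus its vector.
// The ID scheme and payload layout are assumptions, not MCIP's confirmed behavior.

interface PointProduct {
  externalId: string;
  title: string;
  brand?: string;
  category?: string;
  price: { amount: number; currency: "UAH" | "USD" | "EUR" };
}

interface QdrantPoint {
  id: string;
  vector: number[];
  payload: Record<string, unknown>;
}

// Deterministic ID: same externalId -> same UUID, so an upsert replaces the old point.
function externalIdToUuid(externalId: string): string {
  const bytes = createHash("sha1").update(`mcip-product:${externalId}`).digest().subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version nibble 5 (name-based, SHA-1)
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant bits
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

function toQdrantPoint(product: PointProduct, vector: number[]): QdrantPoint {
  if (vector.length !== 1536) {
    throw new Error(`Expected a 1536-dim vector, got ${vector.length}`);
  }
  return {
    id: externalIdToUuid(product.externalId),
    vector,
    // Whole product as payload; brand, category and price.amount are the filterable paths.
    payload: { ...product },
  };
}
```

With `@qdrant/js-client-rest`, the point would be written with `client.upsert("<collection>", { wait: true, points: [point] })`. The collection name is not given on this page.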

The Unified Product Schema

Every product from every platform gets normalized to this structure:

interface UnifiedProduct {
  externalId: string;      // Your platform's product ID
  url: string;             // Product page URL
  title: string;           // Product name (min 3 chars)
  description: string;     // Plain text, HTML stripped
  brand?: string;          // For filtering: "Nike", "Apple"
  category?: string;       // For filtering: "Shoes", "Laptops"
  price: {
    amount: number;        // e.g., 99.99
    currency: "UAH" | "USD" | "EUR";
  };
  mainImage: string;       // Primary image URL
  attributes: Array<{name: string; value: string | number | boolean}>;
  variants: Array<{sku: string; title: string; price: any; available: boolean}>;
  keywords: string[];      // 5-10 SEO terms for search
}

This normalization is what makes MCIP vendor-agnostic — AI agents work with one consistent schema regardless of the source platform.
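MCIP validates mapped products with a Zod schema. The dependency-free sketch below is not that schema. It re-checks constraints visible in the interface above: required identifiers are non-empty, the title has at least 3 characters, the currency is supported, and the list fields are arrays. It also requires a finite, non-negative price, which is an assumed rule. A check like this is handy for unit-testing a custom mapper before its output reaches the queue. `validateUnifiedProduct` is a hypothetical name.

```typescript
// Hypothetical check mirroring the UnifiedProduct constraints listed above.
// Returns a list of problems; an empty list means the product looks valid.

const CURRENCIES: readonly string[] = ["UAH", "USD", "EUR"];

function validateUnifiedProduct(p: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const isNonEmptyString = (v: unknown): v is string => typeof v === "string" && v.trim().length > 0;

  for (const field of ["externalId", "url"]) {
    if (!isNonEmptyString(p[field])) problems.push(`${field} must be a non-empty string`);
  }
  for (const field of ["description", "mainImage"]) {
    if (typeof p[field] !== "string") problems.push(`${field} must be a string`);
  }
  const title = p["title"];
  if (!isNonEmptyString(title) || title.trim().length < 3) {
    problems.push("title must be at least 3 characters");
  }
  const price = p["price"] as { amount?: unknown; currency?: unknown } | null | undefined;
  const amount = price?.amount;
  if (typeof amount !== "number" || !Number.isFinite(amount) || amount < 0) {
    problems.push("price.amount must be a finite, non-negative number"); // assumed rule
  }
  const currency = price?.currency;
  if (typeof currency !== "string" || !CURRENCIES.includes(currency)) {
    problems.push(`price.currency must be one of ${CURRENCIES.join(", ")}`);
  }
  for (const field of ["attributes", "variants", "keywords"]) {
    if (!Array.isArray(p[field])) problems.push(`${field} must be an array`);
  }
  return problems;
}
```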


Part 3: Triggering Product Sync

Initial Sync

Once your environment is configured, start MCIP and trigger the first sync:

# Start all services
docker-compose up -d

# Wait until MCIP reports healthy (usually about 30 seconds)
until curl -sf http://localhost:8080/health > /dev/null; do sleep 2; done

# Verify MCIP is running
curl http://localhost:8080/health
# Expected: {"status":"ok"}

# Trigger product sync
curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: your-secret-admin-key"

Expected Response:

{
  "status": "success",
  "message": "Queued 150 products from URL",
  "count": 150
}

Checkpoint: You should see a count of products queued. If count: 0, check your SOURCE_URL and GRAPHQL_QUERY.
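If you would rather script the sync from a deploy step than paste curl commands, the call is one authenticated POST. The sketch below relies only on what this page documents: the `/admin/sync` route, the `x-admin-api-key` header, and the `{status, message, count}` response. The function name and error handling are illustrative. `fetchImpl` is injected so the function can be tested with a stub. In Node 18+ you can pass the global `fetch`.

```typescript
// Hypothetical sync trigger built on the documented endpoint and response shape.

interface SyncResponse {
  status: string;
  message: string;
  count: number;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function triggerSync(baseUrl: string, adminApiKey: string, fetchImpl: FetchLike): Promise<SyncResponse> {
  const res = await fetchImpl(new URL("/admin/sync", baseUrl).toString(), {
    method: "POST",
    headers: { "x-admin-api-key": adminApiKey },
  });
  if (!res.ok) throw new Error(`Sync request failed with HTTP ${res.status}`);
  const body = (await res.json()) as SyncResponse;
  // Mirrors the checkpoint: zero queued products usually means SOURCE_URL or GRAPHQL_QUERY is wrong.
  if (body.count === 0) {
    throw new Error("Sync queued 0 products; check SOURCE_URL and GRAPHQL_QUERY");
  }
  return body;
}
```

Example call: `await triggerSync("http://localhost:8080", process.env.ADMIN_API_KEY ?? "", fetch)`.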

Monitoring Sync Progress

Watch the BullMQ processing in the logs:

# View MCIP logs
docker-compose logs -f mcip

# You'll see entries like:
# [IngestionProcessor] Processing product: Cool T-Shirt (prod_123)
# [VectorizationService] Generated embedding for: Cool T-Shirt
# [QdrantRepository] Saved product: prod_123

Rebuilding Indexes

If you need to recreate Qdrant payload indexes (after schema changes or to fix issues):

curl -X POST http://localhost:8080/admin/recreate-indexes \
  -H "x-admin-api-key: your-secret-admin-key"

# Response: {"message": "Indexes recreated successfully"}
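Stage 4 names brand, category, and price as the indexed payload fields. To inspect or recreate those indexes by hand, Qdrant's REST API creates one index per field with `PUT /collections/<name>/index`. The hypothetical helper below lists such requests. The collection name, the `price.amount` payload path, and the schema types are assumptions about how MCIP lays out its payload, not confirmed details.

```typescript
// Hypothetical list of Qdrant payload-index requests for the filterable fields in Stage 4.
// Field paths and schema types are assumptions about MCIP's payload layout.

type FieldSchema = "keyword" | "float";

interface PayloadIndexRequest {
  method: "PUT";
  path: string;
  body: { field_name: string; field_schema: FieldSchema };
}

function payloadIndexRequests(collection: string): PayloadIndexRequest[] {
  const fields: Array<[string, FieldSchema]> = [
    ["brand", "keyword"],      // exact-match filter, e.g. brand in ["Nike"]
    ["category", "keyword"],   // exact-match filter on category
    ["price.amount", "float"], // range filter, e.g. price <= 100
  ];
  return fields.map(([field_name, field_schema]) => ({
    method: "PUT" as const,
    path: `/collections/${encodeURIComponent(collection)}/index`,
    body: { field_name, field_schema },
  }));
}
```

Each entry corresponds to a call such as `curl -X PUT http://localhost:6333/collections/<name>/index -H 'Content-Type: application/json' -d '{"field_name":"brand","field_schema":"keyword"}'`.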

Part 4: Verifying Registration

Simple Vector Search

The fastest check is a plain semantic-similarity query:

curl "http://localhost:8080/search?q=comfortable+running+shoes"

Expected Response:

{
  "meta": {
    "count": 5,
    "take": 10,
    "skip": 0,
    "q": "comfortable running shoes",
    "filteringStatus": "RAG_ONLY"
  },
  "items": [
    {
      "externalId": "shoe-001",
      "title": "Nike Air Zoom Pegasus",
      "description": "Responsive cushioning for long runs",
      "brand": "Nike",
      "price": {"amount": 129.99, "currency": "USD"},
      "score": 0.847
    }
  ]
}

Checkpoint: You should see products with relevance scores. filteringStatus: "RAG_ONLY" means pure vector search was used.
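Agents reach the same endpoint over plain HTTP. The sketch below builds the search URL and types the fields shown in the example response. It assumes `take` and `skip` are accepted as query parameters because they are echoed in `meta`. That is an inference, not documented behavior. `buildSearchUrl` and the interfaces are hypothetical names.

```typescript
// Hypothetical client-side types for GET /search, based on the example responses.

type FilteringStatus = "RAG_ONLY" | "AI_FILTERED";

interface SearchItem {
  externalId: string;
  title: string;
  description?: string;
  brand?: string;
  price: { amount: number; currency: string };
  score: number;
}

interface SearchResponse {
  meta: { count: number; take: number; skip: number; q: string; filteringStatus: FilteringStatus };
  items: SearchItem[];
}

// URLSearchParams encodes spaces as "+", matching the curl examples on this page.
function buildSearchUrl(baseUrl: string, q: string, take = 10, skip = 0): string {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", q);
  url.searchParams.set("take", String(take)); // assumed parameter
  url.searchParams.set("skip", String(skip)); // assumed parameter
  return url.toString();
}
```

A caller would then do `const data = (await (await fetch(buildSearchUrl(base, query))).json()) as SearchResponse` and check `data.meta.filteringStatus`.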

Agentic Filtered Search

MCIP's differentiator is a LangGraph workflow that extracts filters from the query automatically:

curl "http://localhost:8080/search?q=nike+shoes+under+100"

Expected Response:

{
  "meta": {
    "count": 3,
    "take": 10,
    "skip": 0,
    "q": "nike shoes under 100",
    "filteringStatus": "AI_FILTERED",
    "appliedFilters": {
      "brand": ["Nike"],
      "priceRange": {"min": null, "max": 100, "currency": "USD"}
    }
  },
  "items": [...]
}

Checkpoint: filteringStatus: "AI_FILTERED" confirms the LangGraph workflow extracted and applied filters from your natural language query.
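Because `appliedFilters` is returned with the results, a test can confirm that every item actually honors the extracted constraints. The sketch below assumes the `appliedFilters` shape shown in the example response: a `brand` list and a `priceRange` with nullable bounds. It does not cover other filter types such as category. `findFilterViolations` is a hypothetical name.

```typescript
// Hypothetical check: do returned items honor the filters MCIP reports applying?

interface AppliedFilters {
  brand?: string[];
  priceRange?: { min: number | null; max: number | null; currency?: string };
}

interface FilteredItem {
  externalId: string;
  brand?: string;
  price: { amount: number; currency: string };
}

function findFilterViolations(items: FilteredItem[], filters: AppliedFilters): string[] {
  const violations: string[] = [];
  const brands = filters.brand?.map((b) => b.toLowerCase()); // case-insensitive brand match
  const range = filters.priceRange;

  for (const item of items) {
    if (brands && brands.length > 0 && !brands.includes((item.brand ?? "").toLowerCase())) {
      violations.push(`${item.externalId}: brand "${item.brand ?? "none"}" not in filter`);
    }
    if (range) {
      if (range.currency && item.price.currency !== range.currency) {
        violations.push(`${item.externalId}: currency ${item.price.currency} differs from ${range.currency}`);
      }
      if (range.min !== null && item.price.amount < range.min) {
        violations.push(`${item.externalId}: price ${item.price.amount} below ${range.min}`);
      }
      if (range.max !== null && item.price.amount > range.max) {
        violations.push(`${item.externalId}: price ${item.price.amount} above ${range.max}`);
      }
    }
  }
  return violations;
}
```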

What the Two Search Modes Mean

Mode               filteringStatus   Use Case
Simple Vector      RAG_ONLY          Fast semantic similarity, no filter extraction
Agentic Filtered   AI_FILTERED       Complex queries with brands, prices, categories

MCIP selects the mode from the query itself. Explicit constraints such as a brand ("Nike") or a price limit ("under $100") trigger the agentic workflow.


Part 5: Platform-Specific Configuration

Vendure (GraphQL)

STORE_PROVIDER=VENDURE
SOURCE_URL=https://your-vendure.com/shop-api
STOREFRONT_URL=https://your-store.com

# Full query with all useful fields
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price priceWithTax currencyCode stockLevel options{code name}assets{preview source}}facetValues{name facet{name}}collections{name slug}featuredAsset{preview source}assets{preview source}}}}

VendureMapper handles:

  • Price normalization (divides by 100 for cent-based prices)
  • Facet values → brand/category extraction
  • Asset URL transformation (internal → public URLs)
  • Variant mapping with stock levels
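Here is a reduced sketch of a Vendure-to-UnifiedProduct mapping that makes these four behaviors concrete. It is not the real VendureMapper. It assumes three conventions you would adjust to your store: brand and category live in facets named "Brand" and "Category", relative asset paths are resolved against a hypothetical `assetBaseUrl`, and product pages follow `STOREFRONT_URL/products/<slug>`.

```typescript
// Hypothetical, reduced Vendure -> UnifiedProduct mapping (not MCIP's VendureMapper).
// Assumed conventions: "Brand"/"Category" facet names, /products/<slug> storefront URLs.

interface VendureVariant { id: string; sku: string; name: string; price: number; currencyCode: string; stockLevel?: string }
interface VendureProduct {
  id: string; name: string; slug: string; description: string;
  variants: VendureVariant[];
  facetValues: Array<{ name: string; facet: { name: string } }>;
  featuredAsset?: { preview: string; source: string } | null;
}

function mapVendureProduct(p: VendureProduct, storefrontUrl: string, assetBaseUrl: string) {
  if (p.variants.length === 0) throw new Error(`Product ${p.id} has no variants`);
  const currency = p.variants[0].currencyCode;
  if (currency !== "UAH" && currency !== "USD" && currency !== "EUR") {
    throw new Error(`Unsupported currency ${currency}`);
  }
  // Facet values -> brand/category, looked up by facet name.
  const facet = (name: string) => p.facetValues.find((fv) => fv.facet.name === name)?.name;

  return {
    externalId: p.id,
    url: new URL(`/products/${p.slug}`, storefrontUrl).toString(),
    title: p.name,
    // Strip tags and collapse whitespace: UnifiedProduct expects plain text.
    description: p.description.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
    brand: facet("Brand"),
    category: facet("Category"),
    price: {
      amount: Math.min(...p.variants.map((v) => v.price)) / 100, // Vendure prices are integer minor units
      currency,
    },
    // Relative asset paths become absolute public URLs; absolute ones pass through unchanged.
    mainImage: p.featuredAsset ? new URL(p.featuredAsset.preview, assetBaseUrl).toString() : "",
    variants: p.variants.map((v) => ({
      sku: v.sku,
      title: v.name,
      price: v.price / 100,
      available: v.stockLevel !== "OUT_OF_STOCK",
    })),
  };
}
```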

Shopify (REST) — Via Custom Mapper

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://your-store.myshopify.com/admin/api/2024-01/products.json
SOURCE_API_KEY=shpat_xxxxxxxxxxxxx
STORE_CURRENCY=USD
STOREFRONT_URL=https://your-store.myshopify.com

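Note: For Shopify, the CUSTOM provider uses AI-powered mapping. For production, consider creating a dedicated ShopifyMapper (see Create Adapters Guide). Shopify's Admin API expects the access token in an X-Shopify-Access-Token header rather than as a Bearer token, so confirm how your MCIP version sends SOURCE_API_KEY before relying on this setup.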

Custom REST API

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://api.your-platform.com/v1/products
SOURCE_API_KEY=your-bearer-token

The CustomAiMapper will:

  1. Fetch your JSON response
  2. Send each product to GPT-4 with the UnifiedProduct schema
  3. Intelligently extract and map fields
  4. Handle missing fields gracefully
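The sketch below covers the two parts of that flow that need no network access. The first builds a chat request that pairs the target schema with one raw product. The second parses the model's reply defensively. The prompt wording, the helper names, and passing the model name as a parameter are assumptions; this page does not document MCIP's actual prompt. Asking for JSON only and then parsing strictly means a malformed reply fails loudly instead of reaching validation half-formed.

```typescript
// Hypothetical pieces of an AI-powered mapper: request building and reply parsing.
// Prompt text and helper names are illustrative assumptions.

interface ChatMessage { role: "system" | "user"; content: string }

const UNIFIED_PRODUCT_FIELDS =
  "externalId, url, title, description (plain text), brand?, category?, " +
  "price {amount, currency: UAH|USD|EUR}, mainImage, attributes[{name, value}], " +
  "variants[{sku, title, price, available}], keywords (5-10 strings)";

function buildMappingRequest(rawProduct: unknown, model: string) {
  const messages: ChatMessage[] = [
    {
      role: "system",
      content:
        `Map the e-commerce product JSON you receive to an object with these fields: ${UNIFIED_PRODUCT_FIELDS}. ` +
        "Reply with a single JSON object and nothing else. Omit optional fields you cannot infer.",
    },
    { role: "user", content: JSON.stringify(rawProduct) },
  ];
  return { model, messages, temperature: 0 }; // temperature 0 for repeatable mappings
}

// Models sometimes wrap JSON in Markdown code fences; strip them before a strict parse.
function parseMappedProduct(reply: string): Record<string, unknown> {
  const text = reply.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model reply is not a JSON object");
  }
  return parsed as Record<string, unknown>;
}
```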

Troubleshooting

"Queued 0 products from URL"

Cause: API returned empty or query failed silently

Solution:

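# Test your SOURCE_URL directly
# (assumes SOURCE_URL, SOURCE_API_KEY and GRAPHQL_QUERY are exported in your shell)
curl -s "$SOURCE_URL" -H "Authorization: Bearer $SOURCE_API_KEY" | head -c 2000

# For GraphQL, test the query (jq builds the JSON body so the query is quoted correctly)
jq -n --arg query "$GRAPHQL_QUERY" '{query: $query}' \
  | curl -s -X POST "$SOURCE_URL" -H "Content-Type: application/json" -d @- \
  | jq '.data.products.items | length'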
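When the count is 0, the next question is whether the products are missing or just nested somewhere unexpected. The hypothetical helper below looks for a product array in a few common response shapes and reports where it found one: a bare array, `products`, `items`, `data`, and Vendure's `data.products.items`. This page does not say which of these shapes MCIP accepts for CUSTOM sources, so treat the list as an assumption to check against your API.

```typescript
// Hypothetical diagnostic: where (if anywhere) does this JSON response keep its products?

const CANDIDATE_PATHS: string[][] = [
  [],                            // bare array: [ {...}, {...} ]
  ["products"],                  // e.g. Shopify REST: { "products": [...] }
  ["items"],
  ["data"],
  ["data", "products", "items"], // Vendure GraphQL
];

function locateProducts(body: unknown): { path: string; count: number } | null {
  for (const path of CANDIDATE_PATHS) {
    let node: unknown = body;
    for (const key of path) {
      node = typeof node === "object" && node !== null ? (node as Record<string, unknown>)[key] : undefined;
    }
    if (Array.isArray(node)) {
      return { path: path.length > 0 ? path.join(".") : "(root)", count: node.length };
    }
  }
  return null; // no array found: inspect the raw response, or look for a GraphQL "errors" field
}
```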

"Invalid Admin API Key"

Cause: ADMIN_API_KEY mismatch

Solution:

# Check what's set in the container
docker-compose exec mcip printenv ADMIN_API_KEY

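# Verify your request header matches (cut -f2- keeps any "=" inside the key)
curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: $(grep '^ADMIN_API_KEY=' .env | cut -d'=' -f2-)"

# If you edited .env after startup, recreate the container so it picks up the change
docker-compose up -d --force-recreate mcip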

"Cannot connect to Qdrant"

Cause: Qdrant not ready or wrong URL

Solution:

# Check Qdrant health
curl http://localhost:6333/collections

# Verify QDRANT_URL in container
docker-compose exec mcip printenv QDRANT_URL

# Check Qdrant logs
docker-compose logs qdrant
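If Qdrant is reachable but writes still fail, check whether the collection's vector size matches the embeddings. It should be 1536 for text-embedding-3-small. `GET http://localhost:6333/collections/<name>` returns the collection configuration, and the hypothetical helper below reads the vector size from that JSON. This page does not give the collection name, so take it from the `GET /collections` output. The sketch handles a single unnamed vector and only reports named-vector configurations.

```typescript
// Hypothetical check of a GET /collections/<name> response against the embedding size.

const EXPECTED_DIM = 1536; // text-embedding-3-small default

function checkVectorSize(collectionInfo: unknown): string {
  const info = collectionInfo as
    | { result?: { config?: { params?: { vectors?: Record<string, unknown> } } } }
    | null
    | undefined;
  const vectors = info?.result?.config?.params?.vectors;
  if (!vectors) return "No vector config found; is this a GET /collections/<name> response?";

  // Single unnamed vector: { size, distance }
  const size = vectors["size"];
  if (typeof size === "number") {
    return size === EXPECTED_DIM
      ? `OK: vector size ${size}`
      : `Mismatch: collection uses ${size}-dim vectors, embeddings are ${EXPECTED_DIM}-dim`;
  }
  // Named vectors: { "<name>": { size, distance }, ... }
  return `Collection uses named vectors: ${Object.keys(vectors).join(", ")}`;
}
```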

Quick Reference

Task                      Command
Check MCIP health         GET /health
Sync products             POST /admin/sync with x-admin-api-key header
Rebuild indexes           POST /admin/recreate-indexes with x-admin-api-key header
Test simple search        GET /search?q=your+query
Test with filters         GET /search?q=brand+product+under+price
View Qdrant collections   GET http://localhost:6333/collections
Check Redis queue         docker-compose exec redis redis-cli LLEN bull:product-ingestion:wait

Current vs Future Capabilities

✅ Currently Implemented

  • Single-store registration via environment variables
  • BullMQ async product ingestion with retry logic
  • VendureMapper and CustomAiMapper (AI-powered)
  • 1536-dimensional embeddings via OpenAI
  • Hybrid search in Qdrant (vector + payload filtering)
  • Two search modes: simple vector and agentic filtered
  • Admin sync and index management endpoints

🔮 Planned (Multi-Store Phase)

  • stores.yaml configuration file for multiple stores
  • Per-store health checks and automatic failover
  • Dynamic store enable/disable via API
  • Parallel search across multiple platforms
  • Store priority and weighting
  • Hot-reload configuration without restart