Store Registration

Register your store through environment configuration, trigger a BullMQ sync, and MCIP handles the rest — mapping products to a unified schema, generating embeddings, and storing vectors in Qdrant for semantic search.

The Registration Mental Model

Think of store registration like teaching a translator a new language. You provide the source (your store's API), the dictionary (the adapter/mapper), and the translator memorizes everything (vector embeddings in Qdrant). Once learned, any AI agent can ask questions in natural language and get answers from your catalog.

MCIP is the Machine Customer Interaction Protocol — a universal way for AI agents to interact with commerce. Product discovery through semantic search is the first capability, with cart management, checkout, and order tracking planned as the protocol evolves.

What happens when you register a store:

Your Store API → BullMQ Queue → Product Mapper → Vectorization Service → Qdrant
     ↓                ↓                ↓                  ↓              ↓
  Raw data      Async job       UnifiedProduct      1536-dim vectors   Searchable!

Part 1: Environment Configuration

Current Implementation: Single Store

MCIP currently supports single-store registration through environment variables. This keeps configuration simple and secure — no config files to manage or accidentally commit.

Environment Variables (required unless marked optional):

# .env

# Required: OpenAI for embeddings and AI features
OPENAI_API_KEY=sk-proj-your-openai-key

# Store Connection
SOURCE_URL=https://demo.vendure.io/shop-api
STORE_PROVIDER=VENDURE

# For GraphQL platforms (Vendure)
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price priceWithTax currencyCode stockLevel}facetValues{name facet{name}}collections{name slug}featuredAsset{preview source}}}}

# Optional: Bearer token for authenticated APIs
SOURCE_API_KEY=your-api-key-here

# Infrastructure (Docker defaults work out of the box)
REDIS_HOST=redis
QDRANT_URL=http://qdrant:6333
PORT=8080

# Security
ADMIN_API_KEY=your-secret-admin-key

# Optional: For product URL generation
STOREFRONT_URL=https://your-store.com
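
To catch a misconfigured .env before the first sync, a startup check along the following lines can help. This is a minimal sketch, not MCIP's actual bootstrap code: the checkStoreEnv function is hypothetical, the set of variables treated as mandatory is inferred from the comments above, and the rule that GRAPHQL_QUERY is needed only for VENDURE is an assumption based on the examples in this guide.

```typescript
// Hypothetical startup check for the variables listed above.
// Which variables MCIP itself treats as mandatory is an assumption.
type EnvMap = Record<string, string | undefined>;

const ALWAYS_REQUIRED: readonly string[] = [
  "OPENAI_API_KEY",
  "SOURCE_URL",
  "STORE_PROVIDER",
  "ADMIN_API_KEY",
];

function isBlank(value: string | undefined): boolean {
  return value === undefined || value.trim() === "";
}

function checkStoreEnv(env: EnvMap): string[] {
  const problems: string[] = [];

  for (const key of ALWAYS_REQUIRED) {
    if (isBlank(env[key])) {
      problems.push(`${key} is missing`);
    }
  }

  const provider = env["STORE_PROVIDER"];
  if (!isBlank(provider) && provider !== "VENDURE" && provider !== "CUSTOM") {
    problems.push(`STORE_PROVIDER must be VENDURE or CUSTOM, got ${provider}`);
  }

  // Assumption: only the GraphQL-based Vendure adapter needs a query.
  if (provider === "VENDURE" && isBlank(env["GRAPHQL_QUERY"])) {
    problems.push("GRAPHQL_QUERY is missing (needed for VENDURE)");
  }

  return problems;
}
```

In a Node.js service this could be called as checkStoreEnv(process.env) at startup, refusing to boot when the returned list is non-empty.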

Store Provider Options

The STORE_PROVIDER variable determines which adapter processes your product data:

| Provider | Use Case | Data Source |
| --- | --- | --- |
| `VENDURE` | Vendure e-commerce platform | GraphQL API |
| `CUSTOM` | Any other platform | REST JSON API, mapped with AI (GPT-4) |

Vendure Example:

STORE_PROVIDER=VENDURE
SOURCE_URL=https://your-vendure.com/shop-api
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price}featuredAsset{preview source}}}}

Custom/REST API Example:

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://api.your-store.com/products
SOURCE_API_KEY=your-bearer-token
# No GRAPHQL_QUERY needed — uses REST GET

💡 Tip: The CUSTOM provider uses GPT-4 to intelligently map any JSON structure to MCIP's unified schema. It's slower and uses API credits, but works with virtually any data format.
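
How MCIP issues the request internally is not shown in this guide. The sketch below only illustrates the difference the two examples imply: a GraphQL POST for VENDURE and a plain REST GET for CUSTOM, with SOURCE_API_KEY sent as a Bearer token when present. The buildSourceRequest function and the StoreSettings and SourceRequest shapes are hypothetical names introduced for illustration.

```typescript
// Hypothetical request builder; not MCIP's real fetch code.
interface StoreSettings {
  provider: "VENDURE" | "CUSTOM";
  sourceUrl: string;
  graphqlQuery?: string; // GRAPHQL_QUERY
  apiKey?: string;       // SOURCE_API_KEY
}

interface SourceRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildSourceRequest(settings: StoreSettings): SourceRequest {
  const headers: Record<string, string> = {};

  // The guide describes SOURCE_API_KEY as an optional Bearer token.
  if (settings.apiKey) {
    headers["Authorization"] = `Bearer ${settings.apiKey}`;
  }

  if (settings.provider === "VENDURE") {
    if (!settings.graphqlQuery) {
      throw new Error("GRAPHQL_QUERY is required for the VENDURE provider");
    }
    headers["Content-Type"] = "application/json";
    return {
      url: settings.sourceUrl,
      method: "POST",
      headers,
      // JSON.stringify escapes the query safely, unlike string concatenation.
      body: JSON.stringify({ query: settings.graphqlQuery }),
    };
  }

  // CUSTOM: a plain REST GET, no body.
  return { url: settings.sourceUrl, method: "GET", headers };
}
```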

Part 2: The Ingestion Pipeline

Understanding the Data Flow

When you trigger a sync, MCIP processes your products through a four-stage asynchronous pipeline:

Stage 1: Fetch & Queue

POST /admin/sync fetches products from your SOURCE_URL
Each product is added to the BullMQ queue as a job
Jobs persist in Redis, so queued work survives restarts

Stage 2: Map to Unified Schema

The IngestionProcessor picks up jobs from the queue
Your configured mapper (VendureMapper or CustomAiMapper) transforms raw data
Output: a UnifiedProduct validated against a Zod schema

Stage 3: Generate Embeddings

VectorizationService creates searchable text from product fields
OpenAI's text-embedding-3-small generates 1536-dimensional vectors
These vectors capture semantic meaning, not just keywords

Stage 4: Store in Qdrant

Product + vector stored in Qdrant collection
Payload indexes created for hybrid filtering (brand, category, price)
Ready for both simple vector search and agentic filtered search
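
Stage 3 turns product fields into a single piece of text before embedding. Which fields VectorizationService actually uses, and in what order, is not documented here, so the following is only a sketch of the general idea. The buildEmbeddingText function and the EmbeddableProduct type are hypothetical.

```typescript
// Hypothetical illustration of "searchable text" construction.
// The field selection and layout are assumptions, not MCIP's real logic.
interface EmbeddableProduct {
  title: string;
  description: string;
  brand?: string;
  category?: string;
  keywords: string[];
}

function buildEmbeddingText(product: EmbeddableProduct): string {
  const parts: string[] = [
    product.title,
    product.brand ? `Brand: ${product.brand}` : "",
    product.category ? `Category: ${product.category}` : "",
    product.description,
    product.keywords.join(", "),
  ];

  // Drop empty parts so missing optional fields leave no blank lines.
  return parts
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join("\n");
}
```

The resulting string is what would be sent to the embedding model. Putting brand and category into the text lets the vector reflect them even when a query is handled without explicit filters.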

The Unified Product Schema

Every product from every platform gets normalized to this structure:

interface UnifiedProduct {
  externalId: string;      // Your platform's product ID
  url: string;             // Product page URL
  title: string;           // Product name (min 3 chars)
  description: string;     // Plain text, HTML stripped
  brand?: string;          // For filtering: "Nike", "Apple"
  category?: string;       // For filtering: "Shoes", "Laptops"
  price: {
    amount: number;        // e.g., 99.99
    currency: "UAH" | "USD" | "EUR";
  };
  mainImage: string;       // Primary image URL
  attributes: Array<{name: string; value: string | number | boolean}>;
  variants: Array<{sku: string; title: string; price: any; available: boolean}>;
  keywords: string[];      // 5-10 SEO terms for search
}

This normalization is what makes MCIP vendor-agnostic — AI agents work with one consistent schema regardless of the source platform.
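
To make the constraints in the interface comments concrete, here is a dependency-free sketch of a few of the checks. MCIP validates with a Zod schema, and this is not that schema: findProductIssues and ProductCandidate are hypothetical names, and only the rules visible in the comments above (non-empty ID, title of at least 3 characters, one of three currencies) are covered.

```typescript
// Hand-rolled approximation of a few UnifiedProduct rules.
// The real validation lives in MCIP's Zod schema.
const SUPPORTED_CURRENCIES: readonly string[] = ["UAH", "USD", "EUR"];

interface ProductCandidate {
  externalId?: unknown;
  title?: unknown;
  price?: { amount?: unknown; currency?: unknown };
}

function findProductIssues(candidate: ProductCandidate): string[] {
  const issues: string[] = [];
  const { externalId, title, price } = candidate;

  if (typeof externalId !== "string" || externalId.length === 0) {
    issues.push("externalId must be a non-empty string");
  }
  if (typeof title !== "string" || title.length < 3) {
    issues.push("title must have at least 3 characters");
  }

  const amount = price ? price.amount : undefined;
  const currency = price ? price.currency : undefined;

  if (typeof amount !== "number" || !isFinite(amount)) {
    issues.push("price.amount must be a finite number");
  }
  if (typeof currency !== "string" || SUPPORTED_CURRENCIES.indexOf(currency) === -1) {
    issues.push("price.currency must be UAH, USD, or EUR");
  }

  return issues;
}
```

An empty result means the candidate passed these particular checks. A product with the title "TV" and a price in GBP would produce two issues.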

Part 3: Triggering Product Sync

Initial Sync

Once your environment is configured, start MCIP and trigger the first sync:

# Start all services
docker-compose up -d

# Wait for services to be healthy (about 30 seconds)
sleep 30

# Verify MCIP is running
curl http://localhost:8080/health
# Expected: {"status":"ok"}

# Trigger product sync
curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: your-secret-admin-key"

Expected Response:

{
  "status": "success",
  "message": "Queued 150 products from URL",
  "count": 150
}

✅ Checkpoint: The response should report how many products were queued. If count is 0, check your SOURCE_URL and GRAPHQL_QUERY.
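
If you script the sync (in CI, for example), the checkpoint can be automated by parsing the response body. The sketch below assumes the success shape shown above. The parseSyncResponse and syncCheckpoint helpers are hypothetical, and how MCIP reports a failed sync is an assumption: any status other than "success" is treated as a failure.

```typescript
// Hypothetical helpers for checking the /admin/sync response in a script.
interface SyncResponse {
  status: string;
  message: string;
  count: number;
}

function parseSyncResponse(rawBody: string): SyncResponse {
  const parsed: unknown = JSON.parse(rawBody);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("sync response is not a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  const statusValue = record["status"];
  const messageValue = record["message"];
  const countValue = record["count"];

  if (
    typeof statusValue !== "string" ||
    typeof messageValue !== "string" ||
    typeof countValue !== "number"
  ) {
    throw new Error("sync response is missing status, message, or count");
  }
  return { status: statusValue, message: messageValue, count: countValue };
}

function syncCheckpoint(response: SyncResponse): string {
  // Assumption: anything other than "success" signals a failed sync.
  if (response.status !== "success") {
    return `Sync failed: ${response.message}`;
  }
  if (response.count === 0) {
    return "No products queued: check SOURCE_URL and GRAPHQL_QUERY";
  }
  return `OK: ${response.count} products queued`;
}
```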

Monitoring Sync Progress

Watch the BullMQ processing in the logs:

# View MCIP logs
docker-compose logs -f mcip

# You'll see entries like:
# [IngestionProcessor] Processing product: Cool T-Shirt (prod_123)
# [VectorizationService] Generated embedding for: Cool T-Shirt
# [QdrantRepository] Saved product: prod_123

Rebuilding Indexes

If you need to recreate Qdrant payload indexes (after schema changes or to fix issues):

curl -X POST http://localhost:8080/admin/recreate-indexes \
  -H "x-admin-api-key: your-secret-admin-key"

# Response: {"message": "Indexes recreated successfully"}

Part 4: Verifying Registration

Test Simple Vector Search

The fastest way to verify registration is a query that relies on pure semantic similarity:

curl "http://localhost:8080/search?q=comfortable+running+shoes"

Expected Response:

{
  "meta": {
    "count": 5,
    "take": 10,
    "skip": 0,
    "q": "comfortable running shoes",
    "filteringStatus": "RAG_ONLY"
  },
  "items": [
    {
      "externalId": "shoe-001",
      "title": "Nike Air Zoom Pegasus",
      "description": "Responsive cushioning for long runs",
      "brand": "Nike",
      "price": {"amount": 129.99, "currency": "USD"},
      "score": 0.847
    }
  ]
}

✅ Checkpoint: You should see products with relevance scores. filteringStatus: "RAG_ONLY" means pure vector search was used.

Test Agentic Filtered Search

This is MCIP's differentiator: a LangGraph workflow that extracts filters from the query automatically:

curl "http://localhost:8080/search?q=nike+shoes+under+100"

Expected Response:

{
  "meta": {
    "count": 3,
    "take": 10,
    "skip": 0,
    "q": "nike shoes under 100",
    "filteringStatus": "AI_FILTERED",
    "appliedFilters": {
      "brand": ["Nike"],
      "priceRange": {"min": null, "max": 100, "currency": "USD"}
    }
  },
  "items": [...]
}

✅ Checkpoint: filteringStatus: "AI_FILTERED" confirms the LangGraph workflow extracted and applied filters from your natural language query.

What the Two Search Modes Mean

| Mode | filteringStatus | Use Case |
| --- | --- | --- |
| Simple Vector | `RAG_ONLY` | Fast semantic similarity, no filter extraction |
| Agentic Filtered | `AI_FILTERED` | Complex queries with brands, prices, categories |

MCIP automatically chooses the mode based on your query. Queries that contain explicit constraints, such as a brand ("Nike") or a price limit ("under $100"), trigger the agentic workflow.
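
A client can tell which mode answered a query by inspecting the meta block. The sketch below models only the fields and the two filteringStatus values shown in the example responses above. Whether other statuses or filter keys exist is not covered here, and describeSearchMode and SearchMeta are hypothetical names.

```typescript
// Client-side sketch based on the example responses in this guide.
interface SearchMeta {
  count: number;
  take: number;
  skip: number;
  q: string;
  filteringStatus: "RAG_ONLY" | "AI_FILTERED";
  appliedFilters?: {
    brand?: string[];
    priceRange?: { min: number | null; max: number | null; currency: string };
  };
}

function describeSearchMode(meta: SearchMeta): string {
  if (meta.filteringStatus === "RAG_ONLY") {
    return `simple vector search, ${meta.count} result(s)`;
  }

  const parts: string[] = [];
  const filters = meta.appliedFilters;

  if (filters && filters.brand && filters.brand.length > 0) {
    parts.push(`brand: ${filters.brand.join(", ")}`);
  }
  const range = filters ? filters.priceRange : undefined;
  if (range) {
    if (range.min !== null) parts.push(`price >= ${range.min} ${range.currency}`);
    if (range.max !== null) parts.push(`price <= ${range.max} ${range.currency}`);
  }

  const detail = parts.length > 0 ? ` (${parts.join("; ")})` : "";
  return `agentic filtered search${detail}, ${meta.count} result(s)`;
}
```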

Part 5: Platform-Specific Configuration

Vendure (GraphQL)

STORE_PROVIDER=VENDURE
SOURCE_URL=https://your-vendure.com/shop-api
STOREFRONT_URL=https://your-store.com

# Full query with all useful fields
GRAPHQL_QUERY={products{items{id name slug description variants{id sku name price priceWithTax currencyCode stockLevel options{code name}assets{preview source}}facetValues{name facet{name}}collections{name slug}featuredAsset{preview source}assets{preview source}}}}

VendureMapper handles:

Price normalization (divides by 100 for cent-based prices)
Facet values → brand/category extraction
Asset URL transformation (internal → public URLs)
Variant mapping with stock levels
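
Two of these steps are easy to picture in code. The following is a sketch of the idea only, not VendureMapper's source: the helper names are hypothetical, the facet names "brand" and "category" are assumed conventions for your Vendure catalog, and treating any stockLevel other than "OUT_OF_STOCK" as available is likewise an assumption.

```typescript
// Illustrative helpers; VendureMapper's real implementation may differ.
interface VendureFacetValueLike {
  name: string;            // e.g. "Nike"
  facet: { name: string }; // e.g. "brand"
}

interface VendureVariantLike {
  sku: string;
  name: string;
  price: number;        // minor units (cents)
  currencyCode: string;
  stockLevel: string;
}

// Cent-based price to a decimal amount: 12999 becomes 129.99.
function centsToAmount(minorUnits: number): number {
  return minorUnits / 100;
}

// Find the value of a named facet, ignoring case.
function pickFacetValue(
  facetValues: VendureFacetValueLike[],
  facetName: string
): string | undefined {
  const wanted = facetName.toLowerCase();
  for (const facetValue of facetValues) {
    if (facetValue.facet.name.toLowerCase() === wanted) {
      return facetValue.name;
    }
  }
  return undefined;
}

function mapVariant(variant: VendureVariantLike) {
  return {
    sku: variant.sku,
    title: variant.name,
    price: { amount: centsToAmount(variant.price), currency: variant.currencyCode },
    // Assumption: everything except OUT_OF_STOCK counts as available.
    available: variant.stockLevel !== "OUT_OF_STOCK",
  };
}
```

With these helpers, pickFacetValue(facetValues, "brand") would supply the brand field of UnifiedProduct and pickFacetValue(facetValues, "category") the category field.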

Shopify (REST) — Via Custom Mapper

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://your-store.myshopify.com/admin/api/2024-01/products.json
SOURCE_API_KEY=shpat_xxxxxxxxxxxxx
STORE_CURRENCY=USD
STOREFRONT_URL=https://your-store.myshopify.com

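Note: For Shopify, the CUSTOM provider uses AI-powered mapping. Shopify's Admin REST API expects the access token in an X-Shopify-Access-Token header rather than as a Bearer token, so verify that your MCIP version sends SOURCE_API_KEY in a form Shopify accepts. For production, consider creating a dedicated ShopifyMapper; see the Create Adapters Guide.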

Custom REST API

STORE_PROVIDER=CUSTOM
SOURCE_URL=https://api.your-platform.com/v1/products
SOURCE_API_KEY=your-bearer-token

The CustomAiMapper will:

Fetch your JSON response
Send each product to GPT-4 with the UnifiedProduct schema
Intelligently extract and map fields
Handle missing fields gracefully
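
The exact prompt and post-processing that CustomAiMapper uses are not shown in this guide. As a rough illustration of steps 2 and 4, the sketch below builds chat-style messages for one raw product and fills in empty arrays for list fields the model left out. Everything here is hypothetical: the function names, the prompt wording, and the fallback behavior.

```typescript
// Hypothetical illustration; not CustomAiMapper's actual prompt or logic.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const SCHEMA_HINT =
  "Fields: externalId, url, title, description, brand?, category?, " +
  "price{amount,currency}, mainImage, attributes[], variants[], keywords[]";

function buildMappingMessages(rawProduct: unknown): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "Map the product JSON to the UnifiedProduct schema. " +
        "Respond with JSON only and use null for fields you cannot find.\n" +
        SCHEMA_HINT,
    },
    // The raw store payload is passed through unchanged as JSON text.
    { role: "user", content: JSON.stringify(rawProduct) },
  ];
}

// Replace missing or non-array list fields with empty arrays.
function applyListFallbacks(
  mapped: Record<string, unknown>
): Record<string, unknown> {
  const result: Record<string, unknown> = { ...mapped };
  for (const key of ["attributes", "variants", "keywords"]) {
    if (!Array.isArray(result[key])) {
      result[key] = [];
    }
  }
  return result;
}
```

Because every product costs one model call in this design, large catalogs are where a dedicated mapper starts to pay off.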

Troubleshooting

"Queued 0 products from URL"

Cause: The store API returned an empty result, or the query failed silently

Solution:

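# Run these from a shell where SOURCE_URL, SOURCE_API_KEY, and GRAPHQL_QUERY are set

# Test your SOURCE_URL directly
curl "$SOURCE_URL" -H "Authorization: Bearer $SOURCE_API_KEY" | head -100

# For GraphQL, test the query (the variable is double-quoted so spaces in the query are preserved)
curl -X POST "$SOURCE_URL" \
  -H "Content-Type: application/json" \
  -d '{"query": "'"$GRAPHQL_QUERY"'"}' | jq '.data.products.items | length'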

"Invalid Admin API Key"

Cause: ADMIN_API_KEY mismatch

Solution:

# Check what's set in the container
docker-compose exec mcip printenv ADMIN_API_KEY

# Verify your request header matches (cut -f2- keeps keys that contain '=')
curl -X POST http://localhost:8080/admin/sync \
  -H "x-admin-api-key: $(grep '^ADMIN_API_KEY=' .env | cut -d'=' -f2-)"

"Cannot connect to Qdrant"

Cause: Qdrant is not ready yet, or QDRANT_URL points to the wrong address

Solution:

# Check Qdrant health
curl http://localhost:6333/collections

# Verify QDRANT_URL in container
docker-compose exec mcip printenv QDRANT_URL

# Check Qdrant logs
docker-compose logs qdrant

Quick Reference

| Task | Command |
| --- | --- |
| Check MCIP health | `GET /health` |
| Sync products | `POST /admin/sync` with `x-admin-api-key` header |
| Rebuild indexes | `POST /admin/recreate-indexes` with `x-admin-api-key` header |
| Test simple search | `GET /search?q=your+query` |
| Test with filters | `GET /search?q=brand+product+under+price` |
| View Qdrant collections | `GET http://localhost:6333/collections` |
| Check Redis queue | `docker-compose exec redis redis-cli LLEN bull:product-ingestion:wait` |

Current vs Future Capabilities

✅ Currently Implemented

Single-store registration via environment variables
BullMQ async product ingestion with retry logic
VendureMapper and CustomAiMapper (AI-powered)
1536-dimensional embeddings via OpenAI
Hybrid search in Qdrant (vector + payload filtering)
Two search modes: simple vector and agentic filtered
Admin sync and index management endpoints

🔮 Planned (Multi-Store Phase)

stores.yaml configuration file for multiple stores
Per-store health checks and automatic failover
Dynamic store enable/disable via API
Parallel search across multiple platforms
Store priority and weighting
Hot-reload configuration without restart
