AI-Powered Catalog Quality & Enrichment
Context
Client Stack + Scale
A home goods brand selling furniture and home decor through Shopify Plus. The catalog includes 12,000 products across multiple categories (furniture, lighting, textiles, accessories) with complex attribute requirements.
The Problem
What was breaking
The merchandising team was drowning in incomplete product data. Of the 12,000 SKUs in the catalog, roughly 40% (about 4,800) were missing critical metafields such as materials, dimensions, care instructions, or room type tags. The gaps caused three problems: customers couldn't filter products by material or dimensions, SEO performance was poor because structured data for Google Shopping was missing, and return rates ran above industry benchmarks because vague descriptions left customers with products that didn't match their expectations. The team tried enriching products manually, but at 30 products per day the existing backlog alone represented about 160 working days of effort, and new products were being added faster than existing ones could be enriched. The root cause: product data came from 3 different suppliers with inconsistent data quality, and the team lacked tools to enforce data standards at scale.
40% of products missing critical metafields (materials, dimensions, care instructions)
Manual enrichment pace (30 products/day) couldn't keep up with new product additions
Customer complaints about product descriptions not matching actual items
23% return rate (vs. 15% industry benchmark) due to expectation mismatches
Poor SEO performance and low Google Shopping ad quality scores
The Solution
Architecture + Approach
We built an AI-powered catalog enrichment pipeline that automatically scores product data quality, generates missing attributes using vision and language models, enforces data standards, and provides a workflow for human review and approval.
Architecture Overview
Nightly catalog quality audit scoring every product on 15 dimensions
OpenAI GPT-4V (vision) model to extract attributes from product images
OpenAI GPT-4 (text) model to generate descriptions and categorize products
PostgreSQL-backed enrichment queue prioritized by quality score and business impact
Airflow orchestration for scheduled jobs and retry logic
Shopify admin extension for human review and bulk approval of AI-generated attributes
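To make the nightly audit component more concrete, here is a minimal sketch of how per-dimension checks might roll up into a single 0–100 quality score. It is an illustration under stated assumptions, not the production scorer: the `Product` fields, the four example checks (out of the 15 dimensions), the unweighted average, and the cut-offs of 20 characters, 200 characters, and 4 images are all hypothetical. Only the 0–100 scale and the below-70 enrichment threshold come from the pipeline description.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical, simplified view of a catalog product."""
    title: str = ""
    description: str = ""
    image_count: int = 0
    metafields: dict = field(default_factory=dict)


# Assumed required metafields, based on the gaps named in this case study.
REQUIRED_METAFIELDS = ("materials", "dimensions", "care_instructions", "room_type")


# Each check returns a value between 0.0 and 1.0.
# The cut-offs below are illustrative, not the real rules.
def title_quality(p: Product) -> float:
    return min(len(p.title.strip()) / 20, 1.0)


def description_completeness(p: Product) -> float:
    return min(len(p.description.strip()) / 200, 1.0)


def image_coverage(p: Product) -> float:
    return min(p.image_count / 4, 1.0)


def metafield_completeness(p: Product) -> float:
    filled = sum(1 for key in REQUIRED_METAFIELDS if p.metafields.get(key))
    return filled / len(REQUIRED_METAFIELDS)


# Four of the 15 dimensions; a real audit would register all of them here.
CHECKS = (title_quality, description_completeness, image_coverage, metafield_completeness)

# From the case study: products scoring below 70 are queued for enrichment.
ENRICHMENT_THRESHOLD = 70


def quality_score(p: Product) -> float:
    """Unweighted mean of all checks, scaled to 0-100."""
    return round(100 * sum(check(p) for check in CHECKS) / len(CHECKS), 1)


def needs_enrichment(p: Product) -> bool:
    return quality_score(p) < ENRICHMENT_THRESHOLD
```

Calling `needs_enrichment(Product(title="Oak Side Table", image_count=2))` would flag the product, since the description and metafields are empty. A weighted mean is an equally plausible design if some dimensions matter more than others.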
Technical Details
The pipeline operates in three stages: audit, enrich, and review.
The audit stage runs nightly, scoring each product on a 0–100 scale across 15 quality dimensions (title quality, description completeness, image count, metafield completeness, variant consistency, and others). Products scoring below 70 are added to the enrichment queue, prioritized by business impact, with high-traffic products and new arrivals ranked higher.
The enrichment stage processes queued products using AI models: GPT-4V analyzes product images to extract visual attributes (material, color, style), GPT-4 generates missing descriptions and normalizes variant names, and a rules engine fills in dimensions and care instructions based on product type. All AI-generated content is stored in a staging table with confidence scores.
The review stage surfaces enriched products in a custom Shopify admin extension where merchandisers can review, edit, and bulk-approve changes. Approved changes are pushed to Shopify via the Admin API with a full audit trail. The system learns from human edits to improve future suggestions.
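The hand-off between the stages can be sketched as follows: ordering the enrichment queue by business impact, then partitioning staged suggestions by confidence for the review screen. This is a sketch under assumptions rather than the actual implementation. The priority formula, its weights, the `daily_views` traffic signal, the field names, and the 0.9 bulk-review cut-off are all hypothetical. An in-memory heap stands in for the PostgreSQL-backed queue.

```python
import heapq
from dataclasses import dataclass


@dataclass
class QueueItem:
    sku: str
    quality_score: float   # 0-100, produced by the audit stage
    daily_views: int       # hypothetical traffic signal
    is_new_arrival: bool


def priority(item: QueueItem) -> float:
    """Higher value means enrich sooner. Weights are illustrative assumptions."""
    gap = 100 - item.quality_score                 # distance from a perfect score
    traffic = min(item.daily_views / 1000, 1.0)    # traffic factor, capped at 1.0
    new_boost = 0.5 if item.is_new_arrival else 0.0
    return gap * (1 + traffic + new_boost)


def enrichment_order(items):
    """Return SKUs in processing order, highest priority first."""
    # heapq is a min-heap, so the priority is negated; the SKU breaks ties.
    heap = [(-priority(item), item.sku) for item in items]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


@dataclass
class Suggestion:
    """One staged, AI-generated value awaiting human review."""
    sku: str
    attribute: str
    value: str
    confidence: float      # 0.0-1.0, stored alongside the staged value
    source: str            # e.g. "vision", "text", or "rules"


def split_for_review(suggestions, bulk_threshold=0.9):
    """Partition staged suggestions into bulk-review and individual-review groups.

    Nothing is published here; both groups still require human approval.
    """
    bulk = [s for s in suggestions if s.confidence >= bulk_threshold]
    individual = [s for s in suggestions if s.confidence < bulk_threshold]
    return bulk, individual
```

Under these assumed weights, a low-scoring, high-traffic product is processed ahead of a slightly incomplete new arrival. The confidence split only decides how a suggestion is presented to merchandisers; approval remains a human step in both cases.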
The Results
Measurable Impact
Percentage of products with all required metafields populated, up from 60% pre-launch
Products enriched per day (13x improvement over manual pace of 30/day)
Return rate reduced from 23% to 18% over 6 months post-launch (closer to the 15% industry benchmark)
Organic search traffic increased due to improved structured data and richer product descriptions
Additional Outcomes
Merchandising team shifted focus from data entry to strategic category curation
Average Google Shopping ad quality score improved from 5.2 to 7.8
Customer service tickets related to product confusion decreased by 41%
New product launch process now includes automated quality checks before going live