Home Goods & Furniture | Data Pipeline + AI Agent | 10 weeks

AI-Powered Catalog Quality & Enrichment

95%

Attribute Completeness

Context

Client Stack + Scale

The client is a home goods brand that sells furniture and home decor through Shopify Plus. Its catalog includes 12,000 products across multiple categories (furniture, lighting, textiles, accessories), each with complex attribute requirements.

Revenue Band

$20M–$35M annually

SKU Count

12,000 active products

Order Volume

400–600 orders/day

Team Size

3-person merchandising team

Tech Stack

Shopify, Python, OpenAI, PostgreSQL, Airflow

The Problem

What was breaking

The merchandising team was drowning in incomplete product data. Of the 12,000 SKUs in the catalog, roughly 40% were missing critical metafields such as materials, dimensions, care instructions, or room type tags. The gaps caused three problems: customers couldn't filter products by material or dimensions, SEO performance suffered because structured data for Google Shopping was missing, and return rates ran above industry benchmarks because vague descriptions left customers with products that didn't match their expectations.

The team tried to enrich products manually, but at 30 products per day the backlog would have taken over a year to clear, and new products were being added faster than existing ones could be enriched. The root cause: product data came from 3 different suppliers with inconsistent data quality, and the team lacked tools to enforce data standards at scale.

40% of products missing critical metafields (materials, dimensions, care instructions)

Manual enrichment pace (30 products/day) couldn't keep up with new product additions

Customer complaints about product descriptions not matching actual items

23% return rate (vs. 15% industry benchmark) due to expectation mismatches

Poor SEO performance and low Google Shopping ad quality scores

The Solution

Architecture + Approach

We built an AI-powered catalog enrichment pipeline that automatically scores product data quality, generates missing attributes using vision and language models, enforces data standards, and provides a workflow for human review and approval.

Architecture Overview

Nightly catalog quality audit scoring every product on 15 dimensions

OpenAI GPT-4V (vision) model to extract attributes from product images

OpenAI GPT-4 (text) model to generate descriptions and categorize products

PostgreSQL-backed enrichment queue prioritized by quality score and business impact

Airflow orchestration for scheduled jobs and retry logic

Shopify admin extension for human review and bulk approval of AI-generated attributes
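As a rough illustration of the audit component, the sketch below combines per-dimension scores into a single 0–100 quality score and flags products that fall below the enrichment threshold of 70. The dimension names shown are a small subset of the 15, and the weights, the `quality_score` function, and the `needs_enrichment` helper are all hypothetical; the production scoring model may differ.

```python
# Hypothetical subset of the 15 audit dimensions; names and weights are
# illustrative assumptions, not the production configuration.
WEIGHTS = {
    "title_quality": 25,
    "description_completeness": 25,
    "image_count": 20,
    "metafield_completeness": 30,
}

# Products scoring below this value are sent to the enrichment queue.
ENRICH_THRESHOLD = 70


def quality_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each in [0, 1]), scaled to 0-100.

    A dimension missing from the input is treated as a score of 0.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        weight * dimension_scores.get(name, 0.0)
        for name, weight in WEIGHTS.items()
    )
    return round(100 * weighted / total_weight, 1)


def needs_enrichment(dimension_scores: dict[str, float]) -> bool:
    """True when the product's overall score is below the threshold."""
    return quality_score(dimension_scores) < ENRICH_THRESHOLD
```

A product with a strong title and images but a half-complete description and no metafields would score 57.5 under these assumed weights and be queued for enrichment.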

Technical Details

The pipeline operates in three stages: audit, enrich, and review.

The audit stage runs nightly, scoring each product on a 0–100 scale across 15 quality dimensions (title quality, description completeness, image count, metafield completeness, variant consistency, etc.). Products scoring below 70 are added to the enrichment queue, prioritized by business impact (high-traffic products and new arrivals rank higher).

The enrichment stage processes queued products using AI models: GPT-4V analyzes product images to extract visual attributes (material, color, style), GPT-4 generates missing descriptions and normalizes variant names, and a rules engine fills in dimensions and care instructions based on product type. All AI-generated content is stored in a staging table with confidence scores.

The review stage surfaces enriched products in a custom Shopify admin extension where merchandisers can review, edit, and bulk-approve changes. Approved changes are pushed to Shopify via the Admin API with a full audit trail. The system learns from human edits to improve future suggestions.
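The queue ordering described above can be sketched as a simple priority function. This is a minimal in-memory illustration, not the PostgreSQL-backed implementation: the `QueuedProduct` fields, the `daily_views` traffic signal, and the boost values in `priority` are assumptions chosen only to show how a low quality score, high traffic, and new-arrival status could combine.

```python
from dataclasses import dataclass


@dataclass
class QueuedProduct:
    sku: str
    quality_score: float  # 0-100 score from the nightly audit
    daily_views: int      # hypothetical traffic signal
    is_new_arrival: bool


def priority(product: QueuedProduct) -> float:
    """Higher value means enrich sooner. The formula is an illustrative assumption."""
    quality_gap = 100 - product.quality_score
    traffic_boost = min(product.daily_views, 1000) / 10  # capped at 100 points
    new_arrival_boost = 50 if product.is_new_arrival else 0
    return quality_gap + traffic_boost + new_arrival_boost


def build_queue(products: list[QueuedProduct], threshold: float = 70) -> list[QueuedProduct]:
    """Keep products scoring below the threshold, highest priority first."""
    eligible = [p for p in products if p.quality_score < threshold]
    return sorted(eligible, key=priority, reverse=True)
```

In a database-backed queue the same ordering would typically be expressed as a computed column or an `ORDER BY` clause; the in-memory sort here is only meant to make the ranking logic concrete.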

The Results

Measurable Impact

95%

Attribute Completeness

Percentage of products with all required metafields populated, up from 60% pre-launch
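For reference, a completeness figure of this kind can be computed as below. This is a sketch under stated assumptions: the metafield keys in `REQUIRED_METAFIELDS` are hypothetical names for the critical attributes listed earlier, and each product is represented as a plain dictionary of metafield values.

```python
# Hypothetical keys for the required metafields; the real keys may differ.
REQUIRED_METAFIELDS = ("materials", "dimensions", "care_instructions", "room_type")


def is_complete(metafields: dict) -> bool:
    """True when every required metafield is present and non-blank."""
    return all(
        str(metafields.get(key) or "").strip()
        for key in REQUIRED_METAFIELDS
    )


def attribute_completeness(catalog: list[dict]) -> float:
    """Percentage of products with all required metafields populated."""
    if not catalog:
        return 0.0
    complete = sum(is_complete(metafields) for metafields in catalog)
    return round(100 * complete / len(catalog), 1)
```

Note that a product counts as complete only if all required fields are filled, so a single blank value is enough to exclude it from the percentage.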

400/day

Enrichment Rate

Products enriched per day (13x improvement over manual pace of 30/day)

18%

Return Rate

Reduced from 23% to 18% over the 6 months after launch (closer to the 15% industry benchmark)

34%

SEO Traffic Increase

Organic search traffic increased 34%, attributed to improved structured data and richer product descriptions

Additional Outcomes

Merchandising team shifted focus from data entry to strategic category curation

Average Google Shopping ad quality score improved from 5.2 to 7.8

Customer service tickets related to product confusion decreased by 41%

New product launch process now includes automated quality checks before going live
