TL;DR: In e-commerce, search for agents = dense (semantic) + sparse (lexical) + business rules—all with low latency, strict filters, and freshness SLAs. The winning stack is usually hybrid: BM25/SPLADE for precision, ANN for semantics, reranking for final ordering, and policy/rules for merchandising. Pick infra by (1) latency & scale needs, (2) filtering complexity, (3) ops model, (4) cost predictability, and (5) multi-tenant security.
1) What your agent actually needs (beyond “semantic search”)
E-commerce agent tasks demand:
Lexical precision (SKU codes, sizes, “blue denim 32x30”).
Semantic generalization (“sporty shoes for monsoon”).
Hard filters (price bands, stock, brand, region, seller).
Personalization signals (affinities, recency, margin goals).
Freshness (price/stock/indexing within minutes).
Explainability (why ranked? which facets? grounded snippets).
A practical pipeline:
Query → Normalize/expand → (A) Sparse retriever (BM25/SPLADE) + (B) Dense retriever (ANN: HNSW/IVF-PQ) → Merge (RRF/weighted) → Cross-encoder rerank (top 200 → 50 → 20) → Business rules & filters → Personalization blend → Results
2) Your main platform choices
Option A — Elasticsearch/OpenSearch (hybrid-in-one)
Pros: mature ops, BM25 + kNN (HNSW) in one cluster, aggregations/facets, filters at search-time, rollups, tiered storage.
Cons: dense scale can stress heap/I/O; ANN + complex filters + heavy aggregations may push latency; PQ/quantization choices limited vs specialized engines.
Option B — Dedicated Vector DB + Elasticsearch
Pros: vector DB optimized for ANN (HNSW/IVF/IVF-PQ/ScaNN/DiskANN), strong payload filtering (e.g., Qdrant/Milvus), ES for facets/analytics/logging.
Cons: two systems to run; result fusion and consistency work; cross-index freshness orchestration.
Option C — Vector DB with native hybrid
Qdrant/Weaviate/Pinecone support hybrid (dense + sparse) and metadata filters.
Pros: simpler fusion; often serverless options (predictable cost), good ANN performance.
Cons: fewer “analytics” features vs ES; maturity varies by feature (ACLs, fine-grained multi-tenancy, cross-region replication).
Option D — pgvector (Postgres) + ES
Pros: low ops for moderate scale; transactional consistency.
Cons: not ideal for billion-scale ANN or P95 < 100 ms at high QPS.
3) Decision checklist (use this first)
Latency at peak: required P95 (e.g., ≤ 150 ms for query→cards)?
Catalog size & QPS: SKUs, variants, locales; read/write ratio; QPS targets.
Filter complexity: deep nested filters, ACLs, region/seller partitions.
Freshness SLA: price/stock updates in seconds vs minutes.
Personalization blend: per-user reranking online vs batch-scored.
Multi-tenant & data residency: brand stores/marketplaces; region pinning.
Ops model: managed/serverless vs self-hosted; on-call tolerance.
Cost model: $/1k queries, storage, egress; vector dim & quantization.
If you need hard filters + facets + analytics with decent ANN at medium scale, Elasticsearch/OpenSearch hybrid is often enough.
If you need very low latency at high scale with complex filters and constant updates, Vector DB + ES is typically superior.
4) Retrieval & ranking blueprint (copy-ready)
Indexing
Sparse index: BM25 or SPLADE features for product titles, attributes, reviews.
Dense index: multilingual semantic embeddings (titles + attributes + curated descriptions).
Metadata: brand, price, stock, seller, region, category, facets, safety flags.
Quantization: IVF-PQ/OPQ for memory; calibrate recall vs latency.
Sharding: by category or region to localize latency and cache hit rates (a mapping sketch follows this list).
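A minimal mapping sketch for the layout above, assuming Elasticsearch 8.x and its Python client; the index name, field names, and vector dims are illustrative:

```python
# Sketch: one index carrying BM25 text, a dense vector, and filterable
# metadata. Assumes Elasticsearch 8.x; names and dims are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="products",
    mappings={
        "properties": {
            "title":      {"type": "text"},      # BM25 (sparse)
            "attributes": {"type": "text"},
            "embedding":  {                      # ANN (HNSW)
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
            "brand":      {"type": "keyword"},   # hard filters / facets
            "price":      {"type": "float"},
            "in_stock":   {"type": "boolean"},
            "region":     {"type": "keyword"},
            "category":   {"type": "keyword"},
        }
    },
)
```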
Query-time
q' = normalize(query)                                  # spell, synonyms, units, locale
S = sparse_search(q', topK=1000, filters=hard_filters)
D = dense_search(q', topK=800, filters=hard_filters)   # ANN (efSearch tuned)
M = merge(S, D, method=RRF or weighted_sum(alpha*sparse + beta*dense))
C = cross_encoder_rerank(q', M.top(200))               # MiniLM/E5-CE; cap to 50
R = apply_business_rules(C)                            # margin, promotions, diversity, cold-start boost
return R.top(N)
Scoring
Start with RRF (Reciprocal Rank Fusion) for robustness; a fusion sketch follows this list.
Move to weighted fusion (learn α/β via offline LTR).
Add diversity constraint (brand/category caps) to avoid monocultures.
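A minimal RRF sketch in plain Python (doc IDs are hypothetical; k=60 is the conventional smoothing constant):

```python
# Sketch: Reciprocal Rank Fusion over the sparse and dense candidate lists.
# Each list contributes 1/(k + rank); documents found by both retrievers
# accumulate both contributions and rise.
def rrf_merge(sparse_ids, dense_ids, k=60):
    scores = {}
    for ranked in (sparse_ids, dense_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["sku1", "sku7", "sku3"], ["sku3", "sku1", "sku9"])
# → ["sku1", "sku3", "sku7", "sku9"]
```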
5) E-commerce specifics that make or break quality
Variant intelligence: unify color/size variants; avoid duplicate hits; prefer in-stock variants in rerank (see the de-dup sketch after this list).
Attribute normalization: units, colors, materials; shared taxonomies.
Synonyms & locales: “sneakers/trainers/sports shoes”; Hindi/English; transliteration.
Cold-start: blend collaborative boosts from similar items, not just popularity.
Safety & policy filters: banned terms, age-gated items, region restrictions.
Explainability: expose why ranked (matched term, attribute, or similar style).
Freshness path: near-real-time indexing for price/stock deltas (CDC → stream → index).
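A hedged sketch of variant de-dup, assuming each hit carries hypothetical parent_id and in_stock fields and arrives sorted best-first by rerank score:

```python
# Sketch: collapse color/size variants into one card per parent product,
# preferring an in-stock variant. parent_id / in_stock are assumed fields.
def dedupe_variants(hits):
    best = {}
    for hit in hits:
        pid = hit["parent_id"]
        kept = best.get(pid)
        # Keep the first (highest-ranked) variant, but swap in an in-stock
        # variant if the one we kept is out of stock.
        if kept is None or (hit["in_stock"] and not kept["in_stock"]):
            best[pid] = hit
    return list(best.values())  # insertion order ≈ rank order
```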
6) Evaluation you can operate (offline + online)
Offline relevance
Recall@k, NDCG@k, MRR by intent slice (head, torso, tail queries); metric sketches follow this list.
Facet accuracy (filter integrity, variant de-dup).
Latency & throughput under replay (P50/P95/P99).
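Minimal metric sketches, assuming graded judgments arrive as a doc_id → gain mapping (all names are illustrative):

```python
import math

# Sketch: offline relevance metrics against graded judgments {doc_id: gain}.
def recall_at_k(ranked, relevant_ids, k):
    return len(set(ranked[:k]) & relevant_ids) / max(len(relevant_ids), 1)

def ndcg_at_k(ranked, gains, k):
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```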
Online
CTR@1/5/10, Add-to-Cart rate, Conversion uplift, abandonment.
Latency SLOs, infra cost per 1k queries.
Guardrails: unsafe exposure rate = 0; out-of-stock (OOS) click-through < threshold.
Slice dashboards: category, price band, locale, device, cold-start items.
7) Reference comparison (pragmatic view)
| Capability | Elasticsearch/OpenSearch (hybrid-in-one) | Pinecone | Milvus | Qdrant | Weaviate |
|---|---|---|---|---|---|
| Sparse (BM25) | Native | Lib/adapter | Adapter | Sparse + dense | BM25 + dense |
| ANN perf/scale | Good (HNSW) | Excellent (managed) | Excellent | Very good | Very good |
| Filters/Facets | Excellent | Metadata filters | Strong filters | Strong filters | Good filters |
| Freshness | Very good | Good | Good | Good | Good |
| Analytics/Aggr | Rich | Limited | Limited | Limited | Limited |
| Ops model | Self-managed/cloud | Managed/serverless | Self-hosted/cloud | Self-hosted/cloud | Cloud/self-hosted |
| Best for | One-stop hybrid + analytics | Low-ops, high-scale ANN | Self-hosted big ANN | Hybrid + filters | Developer-friendly hybrid |
For marketplaces with heavy filtering/faceting and analytics, keep ES/OpenSearch in the stack even if you add a vector DB.
8) Hybrid scoring that survives production
RRF to start; it is robust to uncalibrated scores across retrievers.
Move to a weighted blend α*sparse + β*dense + γ*rules; learn α/β with offline LTR using human-labeled or synthetic judgments.
Cross-encoder rerank only on a small candidate set (keep P95 stable).
Personalization blend as a final, bounded adjustment (avoid filter leakage).
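A sketch of that bounded blend, assuming a hypothetical affinity_score function; the clamp lets affinity reorder results but never overwhelm relevance or reintroduce filtered items:

```python
# Sketch: bounded personalization as the last step. Items have already
# passed hard filters and business rules; the boost is clamped to ±max_boost.
def blend_personalization(hits, affinity_score, max_boost=0.10):
    def blended(hit):
        boost = max(-max_boost, min(max_boost, affinity_score(hit)))
        return hit["score"] * (1.0 + boost)
    return sorted(hits, key=blended, reverse=True)
```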
9) Ops & cost controls
Quantization (IVF-PQ/OPQ) to cut RAM by 4–16×; validate recall drop.
Hot/warm tiering (seasonal catalogs, archival).
efConstruction/efSearch (HNSW) and nprobe (IVF) sweeps; record the recall vs latency Pareto frontier (a sweep sketch follows this list).
Regional sharding to minimize egress/RTT.
Autoscaling on QPS; cap cross-encoder use; cache top queries and embeddings.
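A sweep sketch using hnswlib, assuming corpus vectors xb, query vectors xq (float32 numpy arrays), and exact top-k ground-truth IDs gt are already in memory:

```python
import time
import numpy as np
import hnswlib

# Sketch: efSearch sweep recording the recall-vs-latency Pareto frontier.
k = 10
index = hnswlib.Index(space="cosine", dim=xb.shape[1])
index.init_index(max_elements=xb.shape[0], ef_construction=200, M=16)
index.add_items(xb)

for ef in (16, 32, 64, 128, 256):
    index.set_ef(ef)
    t0 = time.perf_counter()
    labels, _ = index.knn_query(xq, k=k)
    ms_per_query = (time.perf_counter() - t0) * 1000 / len(xq)
    recall = np.mean([len(set(l) & set(g)) / k for l, g in zip(labels, gt)])
    print(f"efSearch={ef:3d}  recall@{k}={recall:.3f}  {ms_per_query:.2f} ms/query")
```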
10) Anti-patterns
Only dense search: you’ll miss exact needs (“iPhone 13 128GB midnight”).
Only lexical: you’ll fail vague/natural queries and long-tail discovery.
Heavy reranking on full candidate sets: kills P95 and cost.
No freshness SLA for stock/price: UX regression > any relevance gain.
Mixing personalization into initial retrieval: breaks filter guarantees.
11) 30/60/90 adoption plan
0–30 days
Baseline ES/OpenSearch BM25 + facets; synonyms & normalization.
Add vector embeddings; enable hybrid retrieval with RRF; evaluate Recall@100, NDCG@10.
Establish freshness pipeline (CDC → index) and latency dashboard.
31–60 days
Introduce cross-encoder rerank for top-200; tune ANN params for P95.
Implement business rules (diversity, margin, promo).
A/B test hybrid vs lexical; track CTR/Add-to-Cart/Conversion.
61–90 days
Add personalization blend (affinity vectors or bandits).
Consider vector DB for ANN at scale; keep ES for filters/analytics.
Quantization rollout; cost/1k query guardrails in CI; multi-region replication.
12) Minimal API contract (for your agent)
POST /search
{
  "query": "waterproof running shoes",
  "filters": {"size": ["8", "9"], "price": {"lte": 4000}, "in_stock": true, "region": "IN"},
  "user": {"id": "u123", "affinity": ["running", "outdoor"]},
  "k": 20,
  "debug": false
}
Response fields: items, scores (sparse/dense/rerank), applied_filters, explanations (matched attributes), citations (for RAG/grounded answers).
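An illustrative response shape (all values hypothetical):

```json
{
  "items": [
    {
      "sku": "SKU-123",
      "title": "AquaTrail waterproof running shoe",
      "scores": {"sparse": 12.4, "dense": 0.82, "rerank": 0.91},
      "explanations": ["matched: waterproof", "attribute: running"],
      "citations": ["desc:SKU-123#p2"]
    }
  ],
  "applied_filters": {"size": ["8", "9"], "price": {"lte": 4000}, "in_stock": true, "region": "IN"}
}
```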
Closing
For GenAI agents in e-commerce, hybrid search isn’t optional—it’s the operating principle. Start simple (BM25 + ANN + RRF), layer reranking and rules, then scale with a vector DB when latency, filtering, or volume push beyond a single engine. Keep freshness, filters, and explainability sacred; measure NDCG/Recall, P95, and cost/1k queries on every change.