Intent-aware hybrid search for e-commerce using Qdrant, IBM Granite embeddings, and Gemini 3.1 Flash Lite.
This project demonstrates why blindly applying Reciprocal Rank Fusion (RRF) can hurt search relevance, and how dynamically weighting retrieval sources based on query intent produces better results. Inspired by Doug Turnbull's "RRF is Not Enough".
Read the full write-up: Don't Just Fuse, Think First: Intent-Driven Hybrid Search with Qdrant
Standard hybrid search combines dense (semantic) and sparse (BM25) results with equal-weight RRF. This works well when both sources agree, but when they disagree, irrelevant BM25 results can drag down good dense results:
```
Query: "I need something to block out airplane noise on long flights"

Dense Search (top 3):                 BM25 Search (top 3):
1. Sony WH-1000XM5 Headphones         1. Nike Air Max 90 Running Shoes
2. Bose QuietComfort Ultra Earbuds    2. Dyson Airwrap Multi-Styler
3. Apple AirPods Pro                  3. Sony WH-1000XM5 Headphones

Naive RRF (equal weights):
1. Sony WH-1000XM5 Headphones       ✓
2. Nike Air Max 90 Running Shoes    <<< irrelevant
3. Bose QuietComfort Ultra Earbuds  ✓
4. Dyson Airwrap Multi-Styler       <<< irrelevant
5. Apple AirPods Pro                ✓
```
The solution: use an LLM (Gemini 3.1 Flash Lite) to classify each query's intent and adjust the RRF weights dynamically:
```
Query: "I need something to block out airplane noise on long flights"
Intent: semantic --> dense_weight=0.8, bm25_weight=0.2

Intent-Aware Weighted RRF:
1. Sony WH-1000XM5 Headphones      ✓
2. Bose QuietComfort Ultra Earbuds ✓
3. Apple AirPods Pro               ✓
4. Dyson Airwrap Multi-Styler
5. Sonos Era 300 Speaker
```
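The re-ranking above follows directly from the RRF formula, score = Σ wᵢ / (k + rankᵢ) with k = 60 by default. A plain-Python illustration (not the project's actual Qdrant call) that reproduces the two result lists:

```python
# Illustrative weighted RRF; reproduces the example rankings above.
dense = ["Sony WH-1000XM5 Headphones",
         "Bose QuietComfort Ultra Earbuds",
         "Apple AirPods Pro"]
bm25 = ["Nike Air Max 90 Running Shoes",
        "Dyson Airwrap Multi-Styler",
        "Sony WH-1000XM5 Headphones"]

def weighted_rrf(ranked_lists, weights, k=60):
    # Each document accumulates w / (k + rank) from every source that returned it.
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Equal weights: Nike's strong BM25 rank pushes it to position 2.
print(weighted_rrf([dense, bm25], [1.0, 1.0]))
# Intent-aware weights (0.8 dense / 0.2 BM25): the headphones rise to the top.
print(weighted_rrf([dense, bm25], [0.8, 0.2]))
```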
```
User Query
    |
    v
[Gemini 3.1 Flash Lite] --> intent + weights + phrase
    |
    v
+-----------------+    +-----------------------------+
|  Dense Search   |    |         BM25 Search         |
|  (Granite R2)   |    | + Phrase Filter (if needed) |
+-----------------+    +-----------------------------+
        |                          |
        +----> Weighted RRF <------+
                    |
                    v
              Ranked Results
```
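The flow above can be sketched in plain Python. Every helper here is a hypothetical stub standing in for the notebook's real Gemini and Qdrant calls (the names and return shapes are assumptions, not the project's API):

```python
# Control-flow sketch of the diagram above; all helpers are stand-ins.

def classify_intent(query: str) -> dict:
    # Stand-in for the Gemini 3.1 Flash Lite structured-output call,
    # which returns an intent, fusion weights, and an optional phrase.
    return {"intent": "semantic", "dense_weight": 0.8,
            "bm25_weight": 0.2, "phrase": None}

def dense_search(query: str) -> list:
    # Stand-in for a Granite-embedding dense query against Qdrant.
    return ["Sony WH-1000XM5 Headphones", "Bose QuietComfort Ultra Earbuds"]

def bm25_search(query: str, phrase=None) -> list:
    # Stand-in for Qdrant's server-side BM25, optionally phrase-filtered.
    return ["Nike Air Max 90 Running Shoes", "Sony WH-1000XM5 Headphones"]

def weighted_rrf(ranked_lists, weights, k=60):
    # Reciprocal Rank Fusion with per-source weights: w / (k + rank).
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def search(query: str) -> list:
    intent = classify_intent(query)
    dense = dense_search(query)
    bm25 = bm25_search(query, phrase=intent["phrase"])
    return weighted_rrf([dense, bm25],
                        [intent["dense_weight"], intent["bm25_weight"]])
```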
- Qdrant 1.17.0 -- Vector database (Docker)
- IBM Granite `granite-embedding-small-english-r2` -- Dense embeddings (384-dim)
- Qdrant's BM25 -- Sparse/lexical embeddings (via FastEmbed)
- Qdrant Phrase Search -- Full-text index with `phrase_matching` for high-precision filtering
- Qdrant Weighted RRF -- Dynamic fusion weights per query
- Gemini 3.1 Flash Lite -- Structured intent classification
Clone the repository:

```shell
git clone https://github.com/gururaser/qdrant-intent-aware-hybrid-search.git
cd qdrant-intent-aware-hybrid-search
```

Copy the example file and add your Gemini API key:

```shell
cp .env.example .env
```

Edit `.env` and replace `your_api_key_here` with your actual API key. You can get one for free at Google AI Studio.

Start Qdrant with Docker:

```shell
docker run -d --name qdrant-hybrid \
  -p 6333:6333 -p 6334:6334 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant:v1.17.0
```

Install the Python dependencies:

```shell
pip install qdrant-client sentence-transformers google-genai fastembed pandas tabulate
```

Launch the notebook:

```shell
jupyter notebook hybrid_search_ecommerce.ipynb
```

Make sure to load your `.env` file in the notebook. The first cell with `os.environ.get("GOOGLE_API_KEY")` will pick it up if you run:

```python
from dotenv import load_dotenv
load_dotenv()
```

Or set the variable directly in your terminal before launching Jupyter:

```shell
export GOOGLE_API_KEY="your_api_key_here"
jupyter notebook hybrid_search_ecommerce.ipynb
```

```
qdrant-intent-aware-hybrid-search/
├── hybrid_search_ecommerce.ipynb   # Full runnable notebook
├── .env.example                    # Template for environment variables
├── .gitignore
└── README.md
```
The LLM classifies each query into one of four intents:
| Intent | Dense | BM25 | When |
|---|---|---|---|
| `semantic` | 0.8 | 0.2 | User describes a need in natural language |
| `keyword_lookup` | 0.2 | 0.8 | User searches for a specific product/brand/model |
| `hybrid` | 0.5 | 0.5 | Mix of descriptive language and specific terms |
| `phrase_match` | 0.4 | 0.6 | User references a specific feature phrase |
For `phrase_match` intents, the LLM also extracts the key phrase, which is applied as a Qdrant `MatchPhrase` filter on the BM25 prefetch for higher precision.
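The table's weight presets can be captured as a small lookup. A minimal sketch; the balanced fallback for unrecognized labels is my assumption, not something the notebook specifies:

```python
# Map each LLM-classified intent to (dense_weight, bm25_weight),
# following the table above.
INTENT_WEIGHTS = {
    "semantic": (0.8, 0.2),
    "keyword_lookup": (0.2, 0.8),
    "hybrid": (0.5, 0.5),
    "phrase_match": (0.4, 0.6),
}

def weights_for(intent: str) -> tuple:
    # Defensive default: fall back to balanced fusion if the LLM
    # returns an unexpected label (assumption, not from the write-up).
    return INTENT_WEIGHTS.get(intent, (0.5, 0.5))
```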
Weighted RRF -- Assign different importance to each retrieval source:

```python
query=models.RrfQuery(
    rrf=models.Rrf(weights=[bm25_weight, dense_weight])
)
```

Phrase Search -- Match exact multi-word phrases, not just individual tokens:

```python
models.FieldCondition(
    key="text",
    match=models.MatchPhrase(phrase="noise cancelling"),
)
```

Native BM25 -- Server-side sparse embedding with IDF modifier:

```python
sparse_vectors_config={
    "bm25": models.SparseVectorParams(modifier=models.Modifier.IDF)
}
```

- RRF is Not Enough -- Doug Turnbull
- Qdrant Hybrid Queries
- Qdrant Phrase Search
- IBM Granite Embedding Models
- Gemini 3.1 Flash Lite