GitHub - build-on-aws/langchain-embeddings: This repository demonstrates the construction of a state-of-the-art multimodal search engine, leveraging Amazon Titan Embeddings, Amazon Bedrock, and LangChain.

🚀 Multimodal Search Learning Experience

Getting started with Amazon Bedrock, RAG, and vector databases in Python

Search across text, images, and video content with natural language queries

⭐ Star this repository

🎯 Learning Path: Explore → Build → Scale

🛠️ Component	📝 What You'll Learn	⏱️ Time
📓 Jupyter Notebooks	Multimodal AI fundamentals with interactive tutorials	30-120 min
🗄️ Aurora PostgreSQL Vector Database	Vector database setup with pgvector extension	15 min
⚡ Serverless Lambda Vector Database System	Multi-modal document processing with Lambda	10 min
🎥 Ask Your Video: Audio/Video Processing Pipeline	Video analysis with ECS and vector search	25 min

📓 Learning Notebooks

📓 Notebook	🎯 Focus & Key Learning	⏱️ Time
01 - Semantic Search with LangChain, Amazon Titan Embeddings, and FAISS	Text embeddings and PDF processing - Document chunking, embeddings generation, FAISS vector store operations	30 min
02 - Building a Multimodal Image Search App with Titan Embeddings	Visual search capabilities - Image embeddings, multimodal search, natural language image queries	45 min
03 - Supercharging Vector Similarity Search with Amazon Aurora and pgvector	Production database setup - PostgreSQL vector operations, pgvector extension, scalable similarity search	60 min
04 - Video Understanding	Video content analysis - Nova models for video processing, content extraction, video understanding workflows	45 min
05 - Video and Audio Content Analysis with Amazon Bedrock	Audio processing workflows - Transcription, audio embeddings, multimedia content analysis	40 min
06 - Building Agentic Video RAG with Strands Agents - Local	AI agents for video analysis - Local agent implementation, memory-enhanced agents, persistent context storage	90 min
07 - Building Agentic Video RAG with Strands Agents - Cloud	Production agent deployment - Cloud-based agent architecture, ECS deployment, scalable agent workflows	120 min
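
The core loop behind notebook 01 (chunk documents, embed the chunks, rank them by similarity to a query) can be sketched without any AWS dependency. The example below is only an illustration, not the notebook's code: `chunk_text`, `toy_embed`, `cosine`, and `top_k` are hypothetical names, and the bag-of-words `toy_embed` stands in for a real embedding model and a FAISS index.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text):
    """Stand-in embedding (assumption): a sparse bag-of-words vector, word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, docs, k=2):
    """Return the k documents most similar to the query, best match first."""
    q = toy_embed(query)
    return sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)[:k]
```

A real pipeline would replace `toy_embed` with calls to an embedding model and the `sorted` scan with a vector index; the overall shape (chunk, embed, rank) stays the same.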

☁️ Demo Applications

🏗️ Application	📝 Description	⏱️ Deploy Time
🗄️ Aurora PostgreSQL Vector Database	CDK stack for vector database setup	15 min
⚡ Serverless Lambda Vector Database System	Multi-modal processing with Lambda functions	10 min
🎥 Ask Your Video Processing Pipeline	ECS-based video analysis system	25 min

🔧 AWS Services Demonstrated

Learn AWS's Advanced AI and Database Services

🔧 Service	🎯 Purpose	⚡ Key Capabilities
Amazon Bedrock	AI model access	Nova Multimodal Embeddings (`amazon.nova-2-multimodal-embeddings-v1:0`), Nova models for multimodal processing
Amazon Aurora PostgreSQL	Vector database	pgvector extension for similarity search operations
AWS Lambda	Serverless compute	Event-driven document and image processing
Amazon ECS	Container orchestration	Scalable video processing workflows
Amazon S3	Object storage	Document, image, and video content storage
Amazon Transcribe	Speech-to-text	Audio content extraction from video files
AWS Step Functions	Workflow orchestration	Complex multi-step video processing
Amazon API Gateway	API management	RESTful endpoints for search operations
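
The Aurora row above relies on pgvector's distance operators. As a rough sketch, the SQL involved might look like the strings below. The table and column names (`documents`, `content`, `embedding`) and the helper names are hypothetical and not taken from the repository's CDK stack. `<=>` is pgvector's cosine-distance operator, so ordering by it in ascending order returns the most similar rows first.

```python
EMBEDDING_DIM = 1024  # assumed to match the embedding dimension requested from the model

# Hypothetical schema: one row per chunk, with its embedding in a pgvector column.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
"""

# Cosine distance is in [0, 2]; 1 - distance gives cosine similarity.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(query_embedding, k=5):
    """Build the parameter tuple for SEARCH_SQL (the vector literal is used twice)."""
    literal = to_pgvector_literal(query_embedding)
    return (literal, literal, k)
```

With a driver that uses `%s` placeholders, such as psycopg2, the query would run as `cursor.execute(SEARCH_SQL, search_params(query_embedding))`.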

⚠️ Compatibility Notes

Embedding Model: Amazon Nova Multimodal Embeddings

This project uses `amazon.nova-2-multimodal-embeddings-v1:0` as the embedding model. Nova uses a different API format than the earlier Titan models: requests must use the `taskType`/`singleEmbeddingParams` structure instead of the old `inputText`/`inputImage` fields.
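
To make the format difference concrete, the sketch below builds both request bodies for the same text. The Nova fields mirror the request shown in this README; the Titan body uses only the flat `inputText` field mentioned above. The helper names `titan_text_body` and `nova_text_body` are hypothetical.

```python
import json


def titan_text_body(text):
    """Legacy Titan-style request: the text goes in a flat inputText field."""
    return json.dumps({"inputText": text})


def nova_text_body(text, dimension=1024, purpose="GENERIC_INDEX"):
    """Nova-style request: the text is nested under taskType/singleEmbeddingParams."""
    return json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dimension,
            "text": {"value": text, "truncationMode": "END"},
        },
    })
```

The two bodies share no top-level keys, which is why a client written for one format cannot simply be pointed at the other model.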

Do not use `BedrockEmbeddings` from LangChain for Nova

The `BedrockEmbeddings` class from `langchain_community.embeddings` internally uses the Titan API format and is not compatible with Amazon Nova Embeddings. Using it with `amazon.nova-2-multimodal-embeddings-v1:0` results in a `ValidationException` from Bedrock.

Avoid:

# ❌ Does NOT work with Nova
from langchain_community.embeddings import BedrockEmbeddings
embeddings = BedrockEmbeddings(model_id="amazon.nova-2-multimodal-embeddings-v1:0", client=bedrock_client)

Use direct boto3 calls with the Nova request format instead:

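# ✅ Correct Nova API format
import json

import boto3

# Assumes AWS credentials and a region with Bedrock access are already configured.
bedrock_runtime = boto3.client("bedrock-runtime")
text = "Text to embed"  # placeholder input

# Nova nests the input under taskType/singleEmbeddingParams.
body = json.dumps({
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 1024,
        "text": {"value": text, "truncationMode": "END"},
    },
})
response = bedrock_runtime.invoke_model(
    body=body,
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    accept="application/json",
    contentType="application/json",
)
# The response body is a stream; read and decode it before extracting the vector.
embedding = json.loads(response["body"].read())["embedding"]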

Note: The Serverless Lambda module still uses `BedrockEmbeddings` for legacy PDF/text workflows. If you extend those Lambdas with Nova embeddings, replace `BedrockEmbeddings` with direct boto3 calls using the format above.
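
One way to make that replacement less invasive is a small adapter that exposes the `embed_documents` / `embed_query` method pair that LangChain vector stores call on an embeddings object. The class below is a sketch under assumptions: `NovaTextEmbeddings` and `FakeBedrockClient` are hypothetical names, the client is any object with a boto3-style `invoke_model` method, and the response is assumed to carry a top-level `embedding` key as in the example above. For strict LangChain type checks you may also need to subclass `langchain_core.embeddings.Embeddings`.

```python
import io
import json

NOVA_MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"


class NovaTextEmbeddings:
    """Hypothetical adapter: Nova text embeddings behind embed_documents/embed_query."""

    def __init__(self, client, model_id=NOVA_MODEL_ID, dimension=1024, purpose="GENERIC_INDEX"):
        self.client = client
        self.model_id = model_id
        self.dimension = dimension
        self.purpose = purpose

    def _embed(self, text):
        # Build the Nova-format body and make one invoke_model call per text.
        body = json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": self.purpose,
                "embeddingDimension": self.dimension,
                "text": {"value": text, "truncationMode": "END"},
            },
        })
        response = self.client.invoke_model(
            body=body,
            modelId=self.model_id,
            accept="application/json",
            contentType="application/json",
        )
        return json.loads(response["body"].read())["embedding"]

    def embed_documents(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        return self._embed(text)


class FakeBedrockClient:
    """Offline stand-in for a bedrock-runtime client, for trying the adapter without AWS."""

    def invoke_model(self, body, modelId, accept, contentType):
        text = json.loads(body)["singleEmbeddingParams"]["text"]["value"]
        # Fabricated two-dimensional "embedding" derived from the text length.
        payload = json.dumps({"embedding": [float(len(text)), 0.0]}).encode()
        return {"body": io.BytesIO(payload)}
```

In a Lambda, `NovaTextEmbeddings(boto3.client("bedrock-runtime"))` would take the place of the `BedrockEmbeddings(...)` instance; the fake client exists only so the adapter can be exercised locally.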

💰 Cost Estimation

💰 Service	💵 Approximate Cost	📊 Usage Pattern	🔗 Pricing Link
Amazon Bedrock	~$0.10 per 1K tokens	Text/image embeddings	Bedrock Pricing
Aurora PostgreSQL	~$0.08/hour	Vector database operations	Aurora Pricing
AWS Lambda	~$0.0001/request	API endpoint calls	Included in AWS Free Tier
Amazon S3	~$0.023/GB/month	Content storage	S3 Pricing
Amazon Transcribe	~$0.024/minute	Audio processing	Transcribe Pricing

💡 Start with notebooks for local development at no cost, then explore AWS services within Free Tier limits.
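
The approximate rates in the table can be combined into a back-of-the-envelope estimate. The sketch below simply multiplies usage by those rates; `RATES` and `estimate_cost` are hypothetical names, the figures are the table's rough values rather than current AWS prices, and Free Tier allowances are ignored.

```python
# Approximate rates copied from the cost table above (USD).
RATES = {
    "bedrock_per_1k_tokens": 0.10,
    "aurora_per_hour": 0.08,
    "lambda_per_request": 0.0001,
    "s3_per_gb_month": 0.023,
    "transcribe_per_minute": 0.024,
}


def estimate_cost(tokens=0, aurora_hours=0, lambda_requests=0, s3_gb_months=0, transcribe_minutes=0):
    """Return a per-service cost breakdown plus a total, in USD."""
    breakdown = {
        "bedrock": tokens / 1000 * RATES["bedrock_per_1k_tokens"],
        "aurora": aurora_hours * RATES["aurora_per_hour"],
        "lambda": lambda_requests * RATES["lambda_per_request"],
        "s3": s3_gb_months * RATES["s3_per_gb_month"],
        "transcribe": transcribe_minutes * RATES["transcribe_per_minute"],
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For example, `estimate_cost(aurora_hours=2)` puts the Aurora share of a two-hour session at roughly $0.16 under these rates.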

🎯 Prerequisites

Before You Begin:

AWS Account with Amazon Bedrock access enabled
Python 3.8+ installed locally
AWS CLI configured with appropriate permissions
Docker installed (for container-based demos)

AWS Credentials Setup: Follow the AWS credentials configuration guide to configure your environment.
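
A quick local check of the items above might look like the sketch below. `check_prerequisites` is a hypothetical helper; it only verifies the Python version and whether the `aws`, `docker`, and `cdk` executables are on your `PATH`. It does not validate credentials or Bedrock model access.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("aws", "docker", "cdk")):
    """Return {name: bool} for the Python version and each required CLI tool."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```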

🚀 Quick Start Guide

1. Start Learning

git clone https://github.com/build-on-aws/langchain-embeddings.git
cd langchain-embeddings/notebooks

2. Deploy Vector Database (15 minutes)

# Run from the repository root
cd create-aurora-pgvector
cdk deploy

3. Build Serverless APIs (10 minutes)

# Run from the repository root
cd serveless-embeddings
cdk deploy

4. Scale with Containers (25 minutes)

# Run from the repository root
cd container-video-embeddings
cdk deploy

📚 Additional Learning Resources

⭐ Star this repository • 📖 Start Learning

🇻🇪🇨🇱 Thank you!

Dev.to Linkedin GitHub Twitter Instagram Youtube Linktr

📄 License

This library is licensed under the MIT-0 License. See the LICENSE file for details.

