Getting started with Amazon Bedrock, RAG, and Vector database in Python
🎯 Learning Path: Explore → Build → Scale
| 🛠️ Component | 📝 What You'll Learn | ⏱️ Time | 📊 Level |
|---|---|---|---|
| 📓 Jupyter Notebooks | Multimodal AI fundamentals with interactive tutorials | 30-120 min | |
| 🗄️ Aurora PostgreSQL Vector Database | Vector database setup with pgvector extension | 15 min | |
| ⚡ Serverless Lambda Vector Database System | Multi-modal document processing with Lambda | 10 min | |
| 🎥 Ask Your Video: Audio/Video Processing Pipeline | Video analysis with ECS and vector search | 25 min | |

| 📓 Notebook | 🎯 Focus & Key Learning | ⏱️ Time | 📊 Level | 🖼️ Diagram |
|---|---|---|---|---|
| 01 - Semantic Search with LangChain, Amazon Titan Embeddings, and FAISS | Text embeddings and PDF processing: document chunking, embedding generation, FAISS vector store operations | 30 min | | |
| 02 - Building a Multimodal Image Search App with Titan Embeddings | Visual search capabilities: image embeddings, multimodal search, natural language image queries | 45 min | | |
| 03 - Supercharging Vector Similarity Search with Amazon Aurora and pgvector | Production database setup: PostgreSQL vector operations, pgvector extension, scalable similarity search | 60 min | | |
| 04 - Video Understanding | Video content analysis: Nova models for video processing, content extraction, video understanding workflows | 45 min | | |
| 05 - Video and Audio Content Analysis with Amazon Bedrock | Audio processing workflows: transcription, audio embeddings, multimedia content analysis | 40 min | | |
| 06 - Building Agentic Video RAG with Strands Agents - Local | AI agents for video analysis: local agent implementation, memory-enhanced agents, persistent context storage | 90 min | | |
| 07 - Building Agentic Video RAG with Strands Agents - Cloud | Production agent deployment: cloud-based agent architecture, ECS deployment, scalable agent workflows | 120 min | | |
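Notebook 01 pairs document chunking with a FAISS vector store. As a dependency-free illustration of the same two ideas (not the notebook's own code), the sketch below splits text into overlapping character windows and ranks vectors by cosine similarity with an exhaustive scan standing in for FAISS. `chunk_text`, `cosine`, and `top_k` are hypothetical helpers, and the toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Exhaustive nearest-neighbour search: score every vector, keep the k best."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. A FAISS index replaces the linear scan in `top_k` with optimized search once the number of vectors grows.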

| 🏗️ Application | 📝 Description | ⏱️ Deploy Time | 📊 Complexity | 🖼️ Diagram |
|---|---|---|---|---|
| 🗄️ Aurora PostgreSQL Vector Database | CDK stack for vector database setup | 15 min | | |
| ⚡ Serverless Lambda Vector Database System | Multimodal processing with Lambda functions | 10 min | | |
| 🎥 Ask Your Video Processing Pipeline | ECS-based video analysis system | 25 min | | |
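The serverless system processes documents and images with Lambda functions. The stack's actual handlers may be organized differently; the sketch below only shows the common pattern of reading an S3 event notification and routing each new object by file extension. The `ROUTES` table, the route names, and `route_for_key` are hypothetical.

```python
import os
from typing import Any, Dict, List
from urllib.parse import unquote_plus

# Hypothetical routing table: file extension -> processing path.
ROUTES = {".pdf": "pdf", ".txt": "text", ".png": "image", ".jpg": "image", ".jpeg": "image"}


def route_for_key(key: str) -> str:
    """Pick a processing path from the object's extension (case-insensitive)."""
    ext = os.path.splitext(key)[1].lower()
    return ROUTES.get(ext, "unsupported")


def lambda_handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, str]]:
    """Extract bucket/key pairs from an S3 event and decide how to process each object."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({"bucket": bucket, "key": key, "route": route_for_key(key)})
    # A real handler would now fetch each object and request embeddings from Bedrock.
    return jobs
```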
Learn AWS's Advanced AI and Database Services
| 🔧 Service | 🎯 Purpose | ⚡ Key Capabilities |
|---|---|---|
| Amazon Bedrock | AI model access | Nova Multimodal Embeddings (amazon.nova-2-multimodal-embeddings-v1:0), Nova models for multimodal processing |
| Amazon Aurora PostgreSQL | Vector database | pgvector extension for similarity search operations |
| AWS Lambda | Serverless compute | Event-driven document and image processing |
| Amazon ECS | Container orchestration | Scalable video processing workflows |
| Amazon S3 | Object storage | Document, image, and video content storage |
| Amazon Transcribe | Speech-to-text | Audio content extraction from video files |
| AWS Step Functions | Workflow orchestration | Complex multi-step video processing |
| Amazon API Gateway | API management | RESTful endpoints for search operations |
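pgvector adds a `vector` column type and distance operators to PostgreSQL. A minimal sketch of a similarity query, assuming a hypothetical `documents` table whose `embedding` column is sized for the 1024-dimension Nova vectors used in this project: `to_pgvector` formats a Python list as pgvector's text literal, and `build_search` returns a parameterized nearest-neighbour query ordered by cosine distance (`<=>`). Table, column, and function names are illustrative, not taken from the repository's stacks.

```python
from typing import List, Sequence, Tuple

# Hypothetical schema; vector(1024) matches embeddingDimension in the Nova request.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1024)
);
"""

# pgvector operators: <-> is L2 distance, <=> is cosine distance, <#> is negative inner product.
SEARCH_SQL = (
    "SELECT id, content, embedding <=> %s::vector AS distance "
    "FROM documents ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_pgvector(values: Sequence[float]) -> str:
    """Format floats as pgvector's text representation, e.g. '[0.1,0.5,1.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search(query_embedding: Sequence[float], k: int = 5) -> Tuple[str, List[object]]:
    """Return (sql, params) for a top-k cosine-distance search."""
    literal = to_pgvector(query_embedding)
    return SEARCH_SQL, [literal, literal, k]
```

Run `CREATE_SQL` once, then pass the query to a driver such as psycopg, for example `cur.execute(*build_search(query_vector))`. For larger tables, an approximate index such as `CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);` (pgvector 0.5.0 or later) avoids scanning every row.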
This project uses amazon.nova-2-multimodal-embeddings-v1:0 as the embedding model. Nova uses a different API format than the previous Titan models — requests must use the taskType/singleEmbeddingParams structure instead of the old inputText/inputImage fields.
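The same `taskType`/`singleEmbeddingParams` envelope carries non-text inputs as well. A sketch of request-body builders for text and images follows; the `image` layout (a `format` string plus base64 data under `source.bytes`) is an assumption based on the Nova Multimodal Embeddings request schema and should be verified against the current Bedrock documentation. `nova_text_body`, `nova_image_body`, and `_single_embedding` are hypothetical helper names.

```python
import base64
import json
from typing import Any, Dict

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"


def _single_embedding(payload: Dict[str, Any], purpose: str = "GENERIC_INDEX", dimension: int = 1024) -> str:
    """Wrap one modality payload in the Nova SINGLE_EMBEDDING envelope and serialize it."""
    params: Dict[str, Any] = {"embeddingPurpose": purpose, "embeddingDimension": dimension}
    params.update(payload)
    return json.dumps({"taskType": "SINGLE_EMBEDDING", "singleEmbeddingParams": params})


def nova_text_body(text: str, **kwargs: Any) -> str:
    """Text input, truncated at the end if it exceeds the model's limit."""
    return _single_embedding({"text": {"value": text, "truncationMode": "END"}}, **kwargs)


def nova_image_body(image_bytes: bytes, image_format: str = "png", **kwargs: Any) -> str:
    # Assumed layout: raw image bytes are sent base64-encoded under source.bytes.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return _single_embedding({"image": {"format": image_format, "source": {"bytes": encoded}}}, **kwargs)
```

Either string can be passed as the `body` argument of `bedrock_runtime.invoke_model` together with `modelId=MODEL_ID`. Keeping one envelope function means text and image vectors are always requested with the same purpose and dimension, so they land in the same searchable space.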
The BedrockEmbeddings class from langchain_community.embeddings internally uses the Titan API format and is not compatible with Amazon Nova Embeddings. Using it with amazon.nova-2-multimodal-embeddings-v1:0 will result in a ValidationException from Bedrock.
Avoid:

```python
# ❌ Does NOT work with Nova
from langchain_community.embeddings import BedrockEmbeddings

embeddings = BedrockEmbeddings(model_id="amazon.nova-2-multimodal-embeddings-v1:0", client=bedrock_client)
```

Use direct boto3 calls with the Nova request format instead:

```python
# ✅ Correct Nova API format
import json

import boto3

# Uses the credentials and region from your AWS configuration.
bedrock_runtime = boto3.client("bedrock-runtime")
text = "Text to embed"

body = json.dumps({
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 1024,
        "text": {"value": text, "truncationMode": "END"},
    },
})
response = bedrock_runtime.invoke_model(
    body=body,
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    accept="application/json",
    contentType="application/json",
)
embedding = json.loads(response["body"].read())["embedding"]
```

Note: The Serverless Lambda module still uses `BedrockEmbeddings` for legacy PDF/text workflows. If you extend those Lambdas with Nova embeddings, replace `BedrockEmbeddings` with direct boto3 calls using the format above.
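LangChain vector stores such as FAISS still need an embeddings object that provides `embed_documents` and `embed_query`. A minimal sketch, assuming you pass in a boto3 `bedrock-runtime` client: the hypothetical `NovaEmbeddings` class wraps the Nova request format for text, using `GENERIC_INDEX` for stored documents and `GENERIC_RETRIEVAL` for queries (an assumption about which purpose values suit each side). Because the exact response shape should be confirmed against the Bedrock documentation, it accepts either a top-level `embedding` key or an `embeddings` list.

```python
import json
from typing import Any, Dict, List

try:
    from langchain_core.embeddings import Embeddings
except ImportError:  # lets the sketch run without LangChain installed
    Embeddings = object  # type: ignore[misc,assignment]

NOVA_MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"


class NovaEmbeddings(Embeddings):
    """Hypothetical text-only LangChain embeddings class backed by Nova Multimodal Embeddings."""

    def __init__(self, client: Any, model_id: str = NOVA_MODEL_ID, dimension: int = 1024) -> None:
        self.client = client  # e.g. boto3.client("bedrock-runtime")
        self.model_id = model_id
        self.dimension = dimension

    def _embed(self, text: str, purpose: str) -> List[float]:
        body = json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": purpose,
                "embeddingDimension": self.dimension,
                "text": {"value": text, "truncationMode": "END"},
            },
        })
        response = self.client.invoke_model(
            body=body,
            modelId=self.model_id,
            accept="application/json",
            contentType="application/json",
        )
        payload: Dict[str, Any] = json.loads(response["body"].read())
        # Assumed response shapes: {"embedding": [...]} or {"embeddings": [{"embedding": [...]}]}.
        if "embedding" in payload:
            return payload["embedding"]
        return payload["embeddings"][0]["embedding"]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # SINGLE_EMBEDDING takes one input, so each document is a separate request.
        return [self._embed(text, "GENERIC_INDEX") for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text, "GENERIC_RETRIEVAL")
```

With LangChain installed, it plugs in where `BedrockEmbeddings` would have gone, for example `FAISS.from_texts(chunks, NovaEmbeddings(boto3.client("bedrock-runtime")))` from `langchain_community.vectorstores`. Requests run sequentially, so large corpora may need added concurrency and retry handling.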
| 💰 Service | 💵 Approximate Cost | 📊 Usage Pattern | 🔗 Pricing Link |
|---|---|---|---|
| Amazon Bedrock | ~$0.10 per 1K tokens | Text/image embeddings | Bedrock Pricing |
| Aurora PostgreSQL | ~$0.08/hour | Vector database operations | Aurora Pricing |
| AWS Lambda | ~$0.0001/request | API endpoint calls | Included in AWS Free Tier |
| Amazon S3 | ~$0.023/GB/month | Content storage | S3 Pricing |
| Amazon Transcribe | ~$0.024/minute | Audio processing | Transcribe Pricing |
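The rates in the table are approximate. A back-of-the-envelope calculator that applies exactly those figures can help compare scenarios; `estimate_monthly_cost` is a hypothetical helper, the usage numbers below are made up, and real prices depend on region, model, and instance size.

```python
from typing import Dict

# Approximate rates copied from the table above (USD); check the pricing pages for real values.
RATES = {
    "bedrock_per_1k_tokens": 0.10,
    "aurora_per_hour": 0.08,
    "lambda_per_request": 0.0001,
    "s3_per_gb_month": 0.023,
    "transcribe_per_minute": 0.024,
}


def estimate_monthly_cost(tokens: int, aurora_hours: float, lambda_requests: int,
                          s3_gb: float, transcribe_minutes: float) -> Dict[str, float]:
    """Multiply each usage figure by its rate and add a rounded total."""
    costs = {
        "bedrock": tokens / 1000 * RATES["bedrock_per_1k_tokens"],
        "aurora": aurora_hours * RATES["aurora_per_hour"],
        "lambda": lambda_requests * RATES["lambda_per_request"],
        "s3": s3_gb * RATES["s3_per_gb_month"],
        "transcribe": transcribe_minutes * RATES["transcribe_per_minute"],
    }
    costs["total"] = sum(costs.values())  # summed before the "total" key is added
    return {name: round(value, 2) for name, value in costs.items()}
```

For example, `estimate_monthly_cost(500_000, 730, 10_000, 10, 60)` gives a total of about $111, more than half of it from Aurora running all month (730 hours × $0.08). Because the cluster bills by the hour whether or not you query it, running `cdk destroy` in a stack's directory when you finish keeps costs down.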
💡 Start with the notebooks for local development, then explore AWS services within Free Tier limits. The notebooks run on your machine, but the Amazon Bedrock calls they make are still billed by usage.
Before You Begin:
- AWS Account with Amazon Bedrock access enabled
- Python 3.8+ installed locally
- AWS CLI configured with appropriate permissions
- Docker installed (for container-based demos)
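A quick way to confirm the local prerequisites is a small script like the hypothetical one below. It checks the running Python version and whether the `aws`, `docker`, and `cdk` executables are on your `PATH`; it does not verify Bedrock access or IAM permissions. The CDK CLI is included because the deployment steps run `cdk deploy`.

```python
import shutil
import sys
from typing import Dict, Sequence, Tuple


def check_prerequisites(tools: Sequence[str] = ("aws", "docker", "cdk"),
                        min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Return a pass/fail map for the Python version and each required CLI tool."""
    results = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8} {name}")
```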
AWS Credentials Setup: Follow the AWS credentials configuration guide to configure your environment.
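After configuring credentials, you can check that they resolve and that the Nova embeddings model is offered in your region. The hypothetical `check_aws_setup` helper below takes the clients as arguments so it can be exercised with stand-ins; create the real ones with `boto3.client("sts")` and `boto3.client("bedrock")`. Seeing the model in `list_foundation_models` only shows that the region offers it, not that your account has been granted access, and the call needs the `bedrock:ListFoundationModels` permission.

```python
from typing import Any, Dict

NOVA_EMBEDDINGS_ID = "amazon.nova-2-multimodal-embeddings-v1:0"


def check_aws_setup(sts_client: Any, bedrock_client: Any) -> Dict[str, Any]:
    """Confirm credentials resolve and the Nova embeddings model is listed in this region."""
    identity = sts_client.get_caller_identity()  # fails fast if credentials are missing
    models = bedrock_client.list_foundation_models()["modelSummaries"]
    model_ids = {model["modelId"] for model in models}
    return {"account": identity["Account"], "nova_listed": NOVA_EMBEDDINGS_ID in model_ids}
```

Usage: `print(check_aws_setup(boto3.client("sts"), boto3.client("bedrock")))`.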
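```bash
git clone https://github.com/build-on-aws/langchain-embeddings.git
cd langchain-embeddings/notebooks
```

```bash
# Aurora PostgreSQL vector database (run from the repository root)
cd create-aurora-pgvector
cdk deploy
```

```bash
# Serverless Lambda vector database system (run from the repository root)
cd serveless-embeddings
cdk deploy
```

```bash
# Ask Your Video processing pipeline (run from the repository root)
cd container-video-embeddings
cdk deploy
```

- Getting started with Amazon Bedrock, RAG, and Vector database in Python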
- Building with Amazon Bedrock and LangChain Workshop
- How To Choose Your LLM
- Working With Your Live Data Using LangChain
This library is licensed under the MIT-0 License. See the LICENSE file for details.