Fluid Server integrates LanceDB as its primary vector database solution for storing and retrieving high-dimensional embeddings. LanceDB provides a modern, embedded vector database specifically designed for AI applications with native multimodal support and superior performance characteristics.
LanceDB ships client libraries for multiple languages, including .NET, which makes it a good fit for Windows desktop applications built on C# and the .NET framework. Chroma, by contrast, offers only a client-side (HTTP) option for .NET, which still requires a separate server process.
LanceDB supports multimodal embeddings (text, image, audio) natively without requiring additional configuration or separate collections. This allows unified storage and cross-modal search capabilities.
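Cross-modal search works because text, image, and audio embeddings live in one shared vector space, so retrieval reduces to nearest-neighbor ranking. A minimal pure-Python sketch of cosine-similarity ranking, the kind of scoring a vector search performs under the hood (the vectors here are toy values, not real CLIP embeddings):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query, items):
    # Return items sorted by similarity to the query, best match first
    return sorted(items, key=lambda it: cosine(query, it["vector"]), reverse=True)

# Toy "image" vectors and a toy "text" query in the same 3-d space
items = [
    {"id": "cat.jpg", "vector": [0.9, 0.1, 0.0]},
    {"id": "car.jpg", "vector": [0.0, 0.2, 0.9]},
]
text_query = [0.8, 0.2, 0.1]  # a text embedding that lands near "cat.jpg"
print(rank(text_query, items)[0]["id"])  # → cat.jpg
```

The same ranking applies regardless of which modality produced the query vector, which is why a text query can retrieve images once both are embedded by the same model.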
- Embedded Architecture: LanceDB runs in-process, avoiding the network round-trips of client–server vector databases
- Columnar Storage: Uses Apache Arrow and Lance format for efficient storage and retrieval
- Optimized Indexing: Advanced indexing algorithms specifically designed for high-dimensional vectors
As an embedded solution, LanceDB eliminates the need for separate database server infrastructure, making deployment and management significantly simpler.
```
Fluid Server Architecture
├── API Layer (FastAPI)
│   ├── /v1/embeddings              # OpenAI-compatible embeddings
│   ├── /v1/embeddings/multimodal   # Multimodal embedding support
│   └── /v1/vector_store/*          # Vector storage operations
├── Embedding Manager
│   ├── Text Embeddings (OpenVINO)
│   ├── Image Embeddings (CLIP-based)
│   └── Audio Embeddings (Whisper-based)
└── LanceDB Storage Layer
    ├── Collections (Tables)
    ├── Vector Search Engine
    └── Document Storage
```
```
models/
├── embeddings/
│   ├── sentence-transformers_all-MiniLM-L6-v2/   # Text models
│   ├── openai_clip-vit-base-patch32/             # Multimodal models
│   └── openai_whisper-base/                      # Audio models
└── cache/                                        # Compiled model cache
```
LanceDB is automatically installed with Fluid Server:
```toml
# pyproject.toml
dependencies = [
    "lancedb>=0.14.0",
    "sentence-transformers>=2.2.0",
    "pillow>=10.0.0",
]
```

Enable embeddings in your server configuration:
```python
# Server startup
config = ServerConfig(
    enable_embeddings=True,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    multimodal_model="openai/clip-vit-base-patch32",
    embedding_device="CPU",  # or "GPU"
    embeddings_db_path=Path("./data/embeddings"),
    embeddings_db_name="vectors",
)
```

Generate text embeddings through the OpenAI-compatible endpoint:

```bash
curl -X POST "http://localhost:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "Machine learning with Python"],
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
```

Insert documents into a collection; the named model embeds each document's content before storage:

```bash
curl -X POST "http://localhost:8080/v1/vector_store/insert" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "documents",
    "documents": [
      {
        "content": "LanceDB provides efficient vector storage",
        "metadata": {"source": "documentation", "category": "database"}
      },
      {
        "content": "Fluid Server enables AI model deployment on Windows",
        "metadata": {"source": "readme", "category": "deployment"}
      }
    ],
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
```

Generate multimodal embeddings from a file upload:

```bash
curl -X POST "http://localhost:8080/v1/embeddings/multimodal" \
  -F "input_type=image" \
  -F "model=openai/clip-vit-base-patch32" \
  -F "file=@image.jpg"
```

Search a collection with a text query:

```bash
curl -X POST "http://localhost:8080/v1/vector_store/search" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "documents",
    "query": "vector database performance",
    "query_type": "text",
    "limit": 5,
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
```

Search with an image query for cross-modal retrieval:

```bash
curl -X POST "http://localhost:8080/v1/vector_store/search/multimodal" \
  -F "collection=documents" \
  -F "query_type=image" \
  -F "limit=10" \
  -F "model=openai/clip-vit-base-patch32" \
  -F "file_query=@query_image.jpg"
```

Create a collection explicitly:

```bash
curl -X POST "http://localhost:8080/v1/vector_store/collections" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_collection",
    "dimension": 384,
    "content_type": "text",
    "overwrite": false
  }'
```

List collections and inspect per-collection statistics:

```bash
curl -X GET "http://localhost:8080/v1/vector_store/collections"
curl -X GET "http://localhost:8080/v1/vector_store/my_collection/stats"
```

LanceDB supports SQL-like filtering expressions:
```python
# Filter by metadata
results = await lancedb_client.search_vectors(
    collection_name="documents",
    query_vector=query_vector,
    limit=10,
    filter_condition="metadata->>'category' = 'technical'"
)
```

Batch operations amortize per-call overhead:

```python
# Batch insert
documents = [VectorDocument(...) for _ in range(1000)]
await lancedb_client.insert_documents("large_collection", documents)

# Batch embedding generation
texts = ["text " + str(i) for i in range(100)]
embeddings = await embedding_manager.get_text_embeddings(texts)
```

The server automatically manages embedding model memory:
```python
# Models are automatically loaded/unloaded based on usage
config = ServerConfig(
    idle_timeout_minutes=30,  # Unload models after 30 minutes of inactivity
    max_memory_gb=8.0         # Maximum memory usage
)
```

If requests fail, verify that the embedding models and collections are available:

```bash
# Check available models
curl -X GET "http://localhost:8080/v1/embeddings/models"

# Verify collection status
curl -X GET "http://localhost:8080/v1/vector_store/collections"
```
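For scripting against these endpoints, a small Python helper using only the standard library may be handy. This is a sketch: the request body mirrors the OpenAI-compatible schema from the curl examples above, and `base_url` is whatever host/port your server uses:

```python
import json
import urllib.request

def build_embedding_request(texts, model):
    # JSON body matching the OpenAI-compatible /v1/embeddings schema
    return {"input": texts, "model": model}

def post_json(base_url, path, payload):
    # POST a JSON payload and decode the JSON response
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_embedding_request(
    ["Hello world"], "sentence-transformers/all-MiniLM-L6-v2"
)
# With a running server (OpenAI-compatible response shape assumed):
# result = post_json("http://localhost:8080", "/v1/embeddings", payload)
# vectors = [item["embedding"] for item in result["data"]]
print(json.dumps(payload))
```

The same `post_json` helper works for the vector store endpoints, since they all accept JSON bodies over POST.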