LocalRAG talks to Ollama on your machine for local embeddings and chat. Ollama is a separate install—not a Python package. LocalRAG follows the Ollama HTTP API: embeddings use POST /api/embed (not the legacy /api/embeddings endpoint); chat uses POST /api/chat; model discovery uses GET /api/tags; pulls use POST /api/pull. Request and response bodies for those calls are typed in localrag/ollama/schemas.py. Use a reasonably current Ollama release so those routes match.
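For orientation, the request bodies for the two routes LocalRAG calls most look roughly like this. This is a minimal sketch using the public Ollama API field names; the authoritative typed models are in localrag/ollama/schemas.py, and the model names shown are the defaults from `.env.example`:

```python
import json

# Payload sketches for the Ollama routes LocalRAG uses.
# Field names follow the public Ollama HTTP API; the typed
# request/response models live in localrag/ollama/schemas.py.
embed_request = {
    "model": "nomic-embed-text",               # default embed model in .env.example
    "input": ["first chunk", "second chunk"],  # /api/embed takes a string or list
}
chat_request = {
    "model": "llama3.2",                       # default chat model in .env.example
    "messages": [{"role": "user", "content": "Summarize the retrieved context."}],
    "stream": False,                           # one JSON object instead of a stream
}

# Each is POSTed as JSON, e.g. to http://127.0.0.1:11434/api/embed
print(json.dumps(embed_request))
```

The `/api/embed` response carries an `embeddings` list (one vector per input), which is why LocalRAG batches chunks into a single `input` list rather than calling the legacy one-string `/api/embeddings` route.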
The canonical instructions are on the official site:
- Home & docs: ollama.com
- Download / install: ollama.com/download
Follow the steps there for your OS (Windows, macOS, or Linux). The site covers installers, PATH, and optional GPU notes.
- Check the CLI (open a new terminal after install):

  ```
  ollama --version
  ```

- Run the server (LocalRAG expects it reachable at the default `http://127.0.0.1:11434`):

  ```
  ollama serve
  ```

  On many setups the Ollama app starts this for you in the background; if `localrag` or the API cannot reach Ollama, run `ollama serve` explicitly.

- Pull the models LocalRAG uses by default (names match `.env.example`):

  ```
  ollama pull nomic-embed-text
  ollama pull llama3.2
  ```

  You can change models via `OLLAMA_EMBED_MODEL` and `OLLAMA_LLM_MODEL` in `.env`.

- Optional: run LocalRAG's helper to check connectivity and pull the defaults:

  ```
  uv run localrag setup
  ```
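If you'd rather verify connectivity by hand, `GET /api/tags` lists the models the server has installed. A stdlib-only sketch (the function name and return convention are illustrative, not part of LocalRAG's API):

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def installed_models(base="http://127.0.0.1:11434"):
    """Return installed model names via GET /api/tags, or None if unreachable."""
    try:
        with urlopen(f"{base}/api/tags", timeout=2) as resp:
            data = json.load(resp)
    except (URLError, OSError):
        return None  # server not running / wrong host or port
    return [m["name"] for m in data.get("models", [])]

# Example: check that the default models from .env.example were pulled.
names = installed_models()
if names is None:
    print("Ollama is not reachable; run `ollama serve`.")
else:
    print("installed:", names)
```

A `None` result means the same thing as `uv run localrag setup` failing its connectivity check: the server is not listening on the configured address.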
If you use LocalRAG's `docker-compose.yml`, Ollama runs in a container; pull models inside that container (see the README Docker section). You do not need a host install of Ollama for that path, only for native `uv run localrag` / local API usage.
- Library & API details: github.com/ollama/ollama
- Model list: ollama.com/library