
Installing Ollama

LocalRAG talks to Ollama on your machine for local embeddings and chat. Ollama is a separate install, not a Python package. LocalRAG follows the Ollama HTTP API: embeddings use POST /api/embed (not the legacy /api/embeddings endpoint), chat uses POST /api/chat, model discovery uses GET /api/tags, and pulls use POST /api/pull. Request and response bodies for those calls are typed in localrag/ollama/schemas.py. Use a reasonably current Ollama release so those routes match.
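For illustration, those routes can be exercised directly. This is a minimal standard-library sketch, not LocalRAG's own client; the payload field names follow the Ollama API endpoints named above:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default address

def embed_payload(model: str, texts: list[str]) -> dict:
    # /api/embed takes "input" (a string or list of strings);
    # the legacy /api/embeddings endpoint used "prompt" instead.
    return {"model": model, "input": texts}

def chat_payload(model: str, messages: list[dict]) -> dict:
    # /api/chat takes {"role": ..., "content": ...} messages.
    return {"model": model, "messages": messages, "stream": False}

def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a running server:
#   post("/api/embed", embed_payload("nomic-embed-text", ["hello"]))
#   post("/api/chat", chat_payload("llama3.2", [{"role": "user", "content": "hi"}]))
```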

The canonical instructions are on the official site: https://ollama.com/download

Follow the steps there for your OS (Windows, macOS, or Linux). The site covers installers, PATH setup, and optional GPU notes.

After installation

  1. Check the CLI (open a new terminal after installing):

    ollama --version
  2. Run the server (LocalRAG expects it reachable, default http://127.0.0.1:11434):

    ollama serve

    On many setups the Ollama app starts this for you in the background; if localrag or the API cannot reach Ollama, run ollama serve explicitly.

  3. Pull models LocalRAG uses by default (names match .env.example):

    ollama pull nomic-embed-text
    ollama pull llama3.2

    You can change models via OLLAMA_EMBED_MODEL and OLLAMA_LLM_MODEL in .env.

  4. Optional: run LocalRAG’s helper to check connectivity and pull defaults:

    uv run localrag setup
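Steps 2 and 4 both come down to whether the server answers on its port. A minimal standalone check (a hypothetical helper, not part of LocalRAG) probes GET /api/tags, which any running Ollama serves:

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://127.0.0.1:11434",
                     timeout: float = 2.0) -> bool:
    # GET /api/tags lists installed models; any response means the server is up.
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, start the server with `ollama serve` and retry.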

Docker

If you use LocalRAG’s docker-compose.yml, Ollama runs in a container; pull models inside that container (see the README Docker section). You do not need a host install of Ollama for that path; it is only needed for native uv run localrag / local API usage.
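For the Docker path, pulling the default models inside the container looks roughly like this; the service name `ollama` is an assumption, so check docker-compose.yml for the actual name:

```shell
# Run "ollama pull" inside the compose service (service name assumed: "ollama")
docker compose exec ollama ollama pull nomic-embed-text
docker compose exec ollama ollama pull llama3.2
```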

More help