gguf
Here are 421 public repositories matching this topic...
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Updated Mar 26, 2026 - Rust
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Updated Mar 10, 2026 - TypeScript
Learn Ollama hands-on: deploy large language models on a CPU. Read online at https://datawhalechina.github.io/handy-ollama/
Updated Jan 15, 2026 - Jupyter Notebook
LLM Agent Framework in ComfyUI. Includes MCP server, Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; connects to Feishu and Discord; and adapts to any LLM with an OpenAI- or aisuite-style interface, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Kimi, and Doubao. Also supports local LLMs, VLMs, and GGUF models such as Llama-3.3 and Janus-Pro, plus Linkage graphRAG.
Updated Mar 8, 2026 - Python
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated Mar 17, 2026 - TypeScript
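Generation-level schema enforcement works by masking, at each decoding step, every token that would make the partial output invalid, so only schema-conforming text can ever be produced. A minimal hedged sketch of that idea (not the library's actual implementation; the toy "grammar" here is just a fixed set of allowed complete outputs, and greedy token choice stands in for model probabilities):

```python
import json

def valid_prefix(prefix: str, candidates: list[str]) -> bool:
    # A partial output is valid if some complete candidate starts with it.
    return any(c.startswith(prefix) for c in candidates)

def constrained_generate(vocab: list[str], candidates: list[str],
                         max_steps: int = 20) -> str:
    # At each step, keep only tokens whose addition leaves the output
    # a prefix of some valid candidate, then pick one of them.
    out = ""
    for _ in range(max_steps):
        allowed = [t for t in vocab if valid_prefix(out + t, candidates)]
        if not allowed:
            break
        out += allowed[0]  # a real decoder would rank these by model probability
        if out in candidates:
            json.loads(out)  # sanity check: the finished output parses as JSON
            return out
    return out

vocab = ['{', '}', '"ok"', ':', 'true', 'false']
candidates = ['{"ok":true}', '{"ok":false}']
result = constrained_generate(vocab, candidates)
```

The same masking principle scales up to real JSON-schema grammars, where prefix validity is tracked by a parser state machine instead of string matching.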
Interface for OuteTTS models.
Updated Mar 23, 2026 - Python
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Updated Mar 26, 2026 - Go
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, text-to-image
Updated Mar 28, 2026 - TypeScript
SOTA rounding-based quantization for high-accuracy low-bit LLM inference, seamlessly optimized for CPU, Intel GPU, and CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Mar 28, 2026 - Python
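Rounding-based weight quantization, in its simplest round-to-nearest form, maps float weights onto a small signed-integer grid with one scale per tensor. A minimal sketch of that baseline (the project above layers accuracy-preserving optimizations on top of plain rounding; the function names here are illustrative):

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    # Symmetric per-tensor round-to-nearest: q = round(w / scale),
    # clipped to the signed range for the given bit width.
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = float(np.abs(w).max()) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.50, 0.33, 0.07], dtype=np.float32)
q, scale = quantize_rtn(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2
```

The per-element error is at most half a quantization step, which is why more sophisticated rounding schemes focus on choosing scales and rounding directions that minimize the end-to-end model error rather than the per-weight error.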
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
Updated Mar 21, 2026 - Python
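A back-of-the-envelope version of such an estimate is parameter count times bytes per parameter, plus headroom for the KV cache and activations. A hedged sketch under that rule of thumb (the 20% overhead factor is an assumption of this sketch, not the tool's actual formula):

```python
# Approximate storage cost per parameter for common inference datatypes.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "q4": 0.5}

def estimate_inference_memory_gb(n_params_billion: float, dtype: str,
                                 overhead: float = 1.2) -> float:
    # weights = params * bytes/param; the overhead multiplier covers
    # KV cache and activation buffers (assumed, not measured)
    weights_gb = n_params_billion * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead

mem = estimate_inference_memory_gb(7, "fp16")  # a 7B fp16 model: ~16.8 GB
```

Real KV-cache cost grows with context length, batch size, and layer count, so a production estimator needs those as inputs too.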
Practical Llama 3 inference in Java
Updated Feb 8, 2026 - Java
Go library for embedded vector search and semantic embeddings using llama.cpp
Updated Mar 6, 2026 - Go
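Embedded vector search of this kind boils down to embedding each document once, then ranking by cosine similarity against the query embedding at lookup time. A minimal sketch of the retrieval step (the toy 2-D vectors are stand-ins; a real setup would obtain embeddings from a model via llama.cpp):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    # index: list of (doc_id, embedding); return the k most similar doc ids.
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

index = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
result = top_k([1.0, 0.05], index, k=2)
```

Brute-force scoring like this is fine for small embedded indexes; larger corpora typically switch to an approximate nearest-neighbor structure.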
Go with your own intelligence - Go applications that directly integrate llama.cpp for local inference using hardware acceleration.
Updated Mar 24, 2026 - Go