🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
XiaoyaoSearch: understands your words, reads your images, and finds any local file with AI. Making search as easy as chatting.
Resources, examples & tutorials for multimodal AI, RAG, and agents using vector search and LLMs.
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
EVA OS — a real-time multimodal AIOS for next-generation hardware that makes your devices "alive" and as intelligent as a real brain.
This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.
🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
Mano-P: open-source GUI-VLA agent for edge devices, #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook or via an external compute stick, so no data leaves your device. Pure vision-driven, cross-platform GUI automation with fully local data processing and support for planning and executing complex multi-step tasks.
On-device AI for iOS & Android
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
Hub for researchers exploring VLMs and multimodal learning.
Reference-first AI image editing desktop for developers (macOS, Tauri, Rust).
A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.
Server-side video workflows for agents: ingest, understand, search, edit, stream.
AI-powered tool that turns long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection, and 9:16 resizing — ideal for creators and smart automation.
[Nature Machine Intelligence] ImmunoStruct enables multimodal deep learning for immunogenicity prediction
ICML 2025 Papers: Dive into cutting-edge research from the premier machine learning conference. Stay current with breakthroughs in deep learning, generative AI, optimization, reinforcement learning, and beyond. Code implementations included. ⭐ support the future of machine learning research!
Neocortex Unity SDK for Smart NPCs and Virtual Assistants
Learn how multimodal AI merges text, image, and audio for smarter models