OMM v1.0.0 is Live

The GUI AI model manager for your local llama.cpp engine.

A fast, lightweight orchestrator built exclusively for the llama.cpp engine. OMM automates model lifecycles and VRAM unloading, exposing a local, OpenAI-compatible API so your favorite chat GUIs can talk directly to llama.cpp.

Download for Mac, Linux & Windows on GitHub
[Screenshot: OMM Control Center]

Built for performance, not packaging.

Everything you need to manage local inference, without the bloat.

Zero Middleware

Unlike tools that wrap and obscure the engine, OMM orchestrates raw llama.cpp binaries for maximum tokens-per-second throughput.
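OMM's internals aren't shown here, but "orchestrating a raw binary" essentially means spawning llama.cpp's llama-server directly, with no translation layer in between. A minimal Python sketch of the idea, where the binary path, model file, and flag values are placeholder assumptions:

```python
import subprocess

# Placeholder paths: point these at your own llama.cpp build and GGUF model.
LLAMA_SERVER = "./llama.cpp/build/bin/llama-server"
MODEL_PATH = "./models/my-model.gguf"

def launch_model(port: int = 8090) -> subprocess.Popen:
    """Spawn a raw llama-server process; nothing wraps the engine."""
    return subprocess.Popen([
        LLAMA_SERVER,
        "-m", MODEL_PATH,     # GGUF model to load
        "--port", str(port),  # serve llama.cpp's HTTP API on this port
        "-ngl", "99",         # offload all layers to the GPU
    ])

if __name__ == "__main__":
    proc = launch_model()
    proc.wait()
```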

Auto-VRAM Unloading

Stop manually killing processes. OMM monitors idle times and intelligently unloads models from VRAM to free up your GPU for other tasks.
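OMM's exact unloading policy isn't documented here, but idle-based unloading boils down to stamping each request and terminating the engine process once a quiet period elapses, which releases its VRAM. A rough sketch of that technique, with the timeout value assumed:

```python
import subprocess
import threading
import time

IDLE_TIMEOUT = 300.0  # assumed: unload after 5 minutes without a request

class IdleUnloader:
    """Terminate the engine process (freeing VRAM) after a quiet period."""

    def __init__(self, proc: subprocess.Popen, timeout: float = IDLE_TIMEOUT):
        self.proc = proc
        self.timeout = timeout
        self.last_used = time.monotonic()
        threading.Thread(target=self._watch, daemon=True).start()

    def touch(self) -> None:
        """Call on every proxied request to reset the idle clock."""
        self.last_used = time.monotonic()

    def _watch(self) -> None:
        while self.proc.poll() is None:
            if time.monotonic() - self.last_used > self.timeout:
                self.proc.terminate()  # llama.cpp exits; the GPU is freed
                break
            time.sleep(1.0)
```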

Universal Hardware

We don't mess with your drivers. If your system (Windows, Mac, or Linux) can run llama.cpp, OMM can orchestrate it.

Drop-In Replacement

Speaks OpenAI natively.

Just set your base URL to http://localhost:8082/v1. OMM handles the rest, lazy-loading models into VRAM the second you hit send. Connect instantly to your favorite frontends and break free from bloated LLM apps and paid subscriptions.

Open WebUI
SillyTavern
Anything OpenAI Compatible
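For instance, here's what the switch looks like with the official openai Python package. Only the base URL (from above) changes; the model name is a placeholder for whatever you've registered in OMM, and the API key is assumed to be an arbitrary string since the server runs locally:

```python
from openai import OpenAI

# Point the standard OpenAI client at OMM instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8082/v1",
    api_key="not-needed",  # assumed: any placeholder works for a local server
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder: a model name registered in OMM
    messages=[{"role": "user", "content": "Hello from llama.cpp!"}],
)
print(response.choices[0].message.content)
```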