What is cLLMHub?
Overview
cLLMHub turns any local LLM into a production-ready, OpenAI-compatible API. If you have a GPU (or even a CPU) running a model through Ollama, vLLM, llama.cpp, or MLX, cLLMHub lets you publish that model to a shared hub so anyone with an API key can use it — from anywhere. You can also download GGUF models directly from Hugging Face and run them with the built-in daemon.
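OpenAI compatibility means responses come back in the familiar chat-completion shape, so existing SDKs and tooling work unchanged no matter which backend produced the text. A minimal sketch of parsing such a response (the payload below is a hand-written illustration, not captured from a real hub; the model name `my-llama` is a placeholder):

```python
import json

# Illustrative response body in the OpenAI chat-completion format.
# A real hub response would carry actual IDs, timestamps, and usage counts.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "my-llama",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello from a local model!"},
     "finish_reason": "stop"}
  ]
}
"""

resp = json.loads(raw)
# The assistant's reply sits in the same place regardless of backend
# (Ollama, vLLM, llama.cpp, MLX, or the built-in daemon).
answer = resp["choices"][0]["message"]["content"]
print(answer)
```

Because the shape is standard, any client that already speaks the OpenAI API can consume a cLLMHub model by swapping the base URL and key.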
Why cLLMHub?
Most hosted inference platforms charge per token and lock you into their infrastructure. cLLMHub is different: you own the hardware, you own the model, and you decide who gets access. There is no token-based billing — pricing is a flat monthly fee based on how many models and keys you need, not how many tokens you produce.
Who is it for?
Developers who want to self-host models and expose them as APIs. Teams that need a private inference gateway without sending data to third parties. Hobbyists who want to share a model with friends. Anyone who wants OpenAI SDK compatibility without OpenAI.
How it works
1. You run the cllmhub CLI on your machine — either with a downloaded GGUF model (via the built-in daemon) or alongside an external backend (Ollama, vLLM, MLX, etc.).
2. The CLI registers your model with the hub and keeps a live connection.
3. Consumers create API keys scoped to your model and call the standard /v1/chat/completions endpoint.
4. The hub routes the request to your machine, your backend generates the response, and it streams back to the consumer.
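The consumer side of steps 3–4 can be sketched with nothing but the standard library. Everything here except the `/v1/chat/completions` path is an assumption: the hub hostname, API key, and model name are placeholders, not documented values.

```python
import urllib.request

# Hypothetical values -- substitute your hub URL, scoped API key, and model name.
HUB_URL = "https://hub.example.com/v1/chat/completions"
API_KEY = "sk-example"

req = urllib.request.Request(
    HUB_URL,
    data=b'{"model": "my-llama", "messages": [{"role": "user", "content": "Hi"}]}',
    headers={
        # API keys are scoped to a specific published model (step 3).
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# req is ready to send with urllib.request.urlopen(req); the hub would route
# it to the publisher's machine and stream the completion back (step 4).
print(req.get_method(), req.full_url)
```

In practice most consumers would point an existing OpenAI SDK at the hub's base URL instead of hand-building requests; this sketch just shows that the wire format is the plain OpenAI one.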