About cLLMHub

Our Mission

cLLMHub exists to democratize access to large language models. We believe that anyone with a GPU should be able to share their models with the world — without token-based billing, vendor lock-in, or complex infrastructure.

How It Works

cLLMHub turns any locally-hosted LLM into a production-ready, OpenAI-compatible API. Providers run the CLI alongside their model backend (Ollama, vLLM, llama.cpp, MLX, or any OpenAI-compatible server), and the hub handles discovery, routing, authentication, and monitoring. You can run models across multiple machines and access them all through a single API key — like a distributed inference system without the complexity.

Consumers get standard API keys and can use any OpenAI SDK or tool — just point the base URL at the hub and start making requests. No code changes beyond configuration.
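Because the hub speaks the OpenAI wire format, any HTTP client can talk to it directly. A minimal sketch using only the Python standard library — the base URL, API key, and model name below are placeholders for illustration, not real cLLMHub values:

```python
import json
import urllib.request

# Placeholder values: substitute your hub's base URL, an API key
# issued by the hub, and a model name a provider has registered.
BASE_URL = "https://hub.example.com/v1"
API_KEY = "sk-your-hub-key"

# Standard OpenAI-style chat completion request body.
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send once your hub is reachable:
# with urllib.request.urlopen(request) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

The same idea applies to any OpenAI SDK: pass the hub's URL as the base URL and a hub-issued key as the API key, and existing code works unchanged.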

Open Source

The cLLMHub CLI is open source under the Apache 2.0 license. We believe in transparency and community-driven development. You can inspect, audit, and contribute to the code on GitHub.

Simple Pricing

Unlike hosted inference platforms that charge per token, cLLMHub uses flat monthly pricing. You run the hardware, you control the costs.

Free: 3 models, 3 API keys, and 2,000 requests per day.
Pro ($3/month): 10 models, 10 API keys, and 20,000 requests per day.
Max ($20/month): no limits.

Built For

Developers who want to self-host models and expose them as APIs.
Teams that need a private inference gateway without sending data to third parties.
Groups that want to pool hardware across machines or accounts using Hives.
Mac users who want to run models on Apple Silicon via MLX.
Anyone who wants OpenAI SDK compatibility without OpenAI.