Optimized LLMs for
Self-Hosted Agents

Task-specific distilled models that beat generic community quants at tool calling, structured output, and agentic reasoning — built for local inference on consumer GPUs.

Browse Models
See Benchmarks

Agent-Tuned

Fine-tuned on 50K+ examples of tool calling, structured JSON output, and multi-step reasoning — the tasks your agents actually run.
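What "structured JSON output" means in practice: the model emits a single strict-JSON tool call that your agent loop can parse and validate before dispatching. A minimal sketch of that validation step, assuming a hypothetical `search_docs` tool (the tool name and schema here are illustrative, not part of the product):

```python
import json

# Hypothetical raw model output: an agent-tuned model is expected to emit
# one JSON object naming a tool and its arguments -- and nothing else.
raw = '{"tool": "search_docs", "arguments": {"query": "GGUF quantization", "limit": 3}}'

def parse_tool_call(text: str) -> dict:
    """Parse and minimally validate a tool call emitted as strict JSON."""
    call = json.loads(text)  # raises ValueError on malformed output
    if not isinstance(call.get("tool"), str):
        raise ValueError("missing 'tool' name")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("missing 'arguments' object")
    return call

call = parse_tool_call(raw)
print(call["tool"], call["arguments"]["limit"])  # → search_docs 3
```

A model tuned on this format fails the `json.loads` step far less often than a generic quant, which is the difference between a retry loop and a clean dispatch.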

Proven Benchmarks

Published results on standard academic evals AND LeanBench, our custom agent eval suite. No hand-waving — just numbers.

Runs on Your Hardware

Models from 5GB to 15GB VRAM. Scout tier fits on any modern GPU. Deep tier scales across multiple GPUs. GGUF format, Ollama-ready.
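"Ollama-ready" means a downloaded GGUF file plugs straight into a standard Ollama Modelfile. A sketch of that setup, with a hypothetical file name (substitute whichever GGUF you downloaded); this is a config fragment, so paths and the model tag are yours to choose:

```shell
# 1. Point an Ollama Modelfile at the local GGUF weights:
cat > Modelfile <<'EOF'
FROM ./lean-agent-8b.Q4_K_M.gguf
EOF

# 2. Register the model with the Ollama daemon, then run it:
ollama create lean-agent-8b -f Modelfile
ollama run lean-agent-8b "List three facts as a JSON array."
```

No conversion step, no custom runtime: if your GPU fits the file, `ollama run` is the whole deployment.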

Benchmarks

Lean-Agent-8B matches base Qwen3-14B on agent tasks at half the VRAM. See the full LeanBench results.

Pricing

Individual models from $20, bundles from $45, or a $15/mo subscription covering every model plus all new releases.

Ready to run smarter agents?

Distilled from Qwen3. Built for agents. Your hardware.

View Pricing