Task-specific distilled models that beat generic community quants at tool calling, structured output, and agentic reasoning — built for local inference on consumer GPUs.
Fine-tuned on 50K+ examples of tool calling, structured JSON output, and multi-step reasoning — the tasks your agents actually run.
Published results on standard academic evals AND LeanBench, our custom agent eval suite. No hand-waving — just numbers.
Models from 5 GB to 15 GB of VRAM. Scout tier fits on any modern GPU. Deep tier scales across multiple GPUs. GGUF format, Ollama-ready.
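Since the models ship as GGUF and are Ollama-ready, getting started should look roughly like the sketch below. The model tag is a placeholder for illustration, not a published name — substitute the actual tag from the model library:

```shell
# Placeholder tag shown for illustration; use the real model tag.
ollama pull scout-tier-model

# Run a one-off prompt against the local model:
ollama run scout-tier-model "Return this as JSON: name=Ada, role=engineer"
```

On a Scout-tier model this runs entirely on a single consumer GPU; no API keys, no network calls after the initial pull.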