# Lean-Agent-4B
Scout Tier
Lightweight Tool Calling for Self-Hosted Agents
A 4B-parameter distilled model built for fast, reliable tool calling and structured JSON output. The smallest model in the Lean lineup, it fits comfortably on any modern GPU in under 5 GB of VRAM.
## Key Features
- ✓ 4B parameters — runs on any GPU with 5GB+ VRAM
- ✓ Distilled from Qwen3-4B — optimized for agentic tasks via QLoRA fine-tuning
- ✓ LeanBench evaluated — benchmarked on real-world tool calling, JSON output, and multi-step reasoning
- ✓ Ollama-ready — ships with a tuned Modelfile for drop-in use
- ✓ Commercial use — no restrictions on how you deploy it
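Tool calling with a local Ollama model typically goes through the `/api/chat` endpoint, passing OpenAI-style function schemas in a `tools` array. A minimal sketch of the request payload (the `get_weather` tool and its schema are illustrative examples, not something shipped with the model):

```python
import json

# Illustrative tool schema in the OpenAI-style format Ollama's /api/chat accepts.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "lean-agent-4b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather],
    "stream": False,
}

# This JSON body would be POSTed to http://localhost:11434/api/chat.
body = json.dumps(payload)
```

The model then either answers directly or returns a `tool_calls` entry in its reply, which your agent executes and feeds back as a `tool` message.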
## Performance
Benchmarks will be published with the first release.
| Metric | Lean-Agent-4B | Qwen3-4B (baseline) |
|---|---|---|
| Tool calling accuracy | — | — |
| Structured output success | — | — |
| Avg latency | — | — |
| VRAM usage (Q8_0) | ~4-5 GB | ~4-5 GB |
## Pricing
One-time license: $20
Includes:
- Model weights in GGUF format (Q4_K_M, Q5_K_M, Q6_K, and Q8_0 variants)
- Tuned Ollama Modelfile
- Documentation for Ollama and llama.cpp setup
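The tuned Modelfile ships with the license; for orientation, a minimal Modelfile you could adapt looks roughly like this (the filename and parameter values below are placeholders, not the tuned settings):

```
FROM ./lean-agent-4b.Q5_K_M.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM You are a tool-calling agent. Emit structured JSON when a tool schema is provided.
```

Registering it is a single command: `ollama create lean-agent-4b -f Modelfile`.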
## Getting Started
- Download the GGUF quantized weights
- Load with Ollama or llama.cpp
- Configure your agent to use `lean-agent-4b`
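Once the model is loaded, the agent loop is mostly a matter of reading tool calls out of each chat response. A sketch of that parsing step, using a simulated response in the shape Ollama's non-streaming `/api/chat` returns (the response content is fabricated for illustration; a real agent would receive it from an HTTP POST to the local server):

```python
import json

# Simulated /api/chat response in Ollama's non-streaming shape.
# A real agent would receive this from POST http://localhost:11434/api/chat.
response = {
    "model": "lean-agent-4b",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": {"city": "Oslo"}}}
        ],
    },
    "done": True,
}

def extract_tool_calls(resp):
    """Return (name, arguments) pairs from a chat response, or [] if none."""
    calls = resp.get("message", {}).get("tool_calls", [])
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]

for name, args in extract_tool_calls(response):
    print(name, json.dumps(args))
```

Your agent would dispatch each `(name, arguments)` pair to the matching function, then append the result as a `tool` message and call the model again.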
Also in the Scout tier: Lean-Agent-8B, Lean-Coder-8B, Lean-Agent-14B (coming soon)