# Lean-Agent-4B
Scout Tier
Lightweight Tool Calling for Self-Hosted Agents
A 4B-parameter distilled model built for fast, reliable tool calling and structured JSON output. The smallest model in the Lean lineup, it fits comfortably on any modern GPU in under 5 GB of VRAM.
## Key Features
- ✓ 4B parameters — runs on any GPU with 5GB+ VRAM
- ✓ Distilled from Qwen3-4B — optimized for agentic tasks via QLoRA fine-tuning
- ✓ LeanBench evaluated — benchmarked on real-world tool calling, JSON output, and multi-step reasoning
- ✓ Ollama-ready — ships with a tuned Modelfile for drop-in use
- ✓ Commercial use — no restrictions on how you deploy it
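Tool calling with a local Ollama model typically goes through the `/api/chat` endpoint, passing OpenAI-style function schemas in a `tools` array. A minimal sketch of the request payload (the `get_weather` tool and its schema are illustrative examples, not something shipped with the model):

```python
import json

# Illustrative tool schema in the OpenAI-style format Ollama's /api/chat accepts.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "lean-agent-4b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather],
    "stream": False,
}

# This JSON body would be POSTed to http://localhost:11434/api/chat.
body = json.dumps(payload)
```

The model then either answers directly or returns a `tool_calls` entry in its reply, which your agent executes and feeds back as a `tool` message.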
## Performance
Benchmarks will be published with the first release.
| Metric | Lean-Agent-4B | Qwen3-4B (baseline) |
|---|---|---|
| Tool calling accuracy | — | — |
| Structured output success | — | — |
| Avg latency | — | — |
| VRAM usage (Q8_0) | ~4-5 GB | ~4-5 GB |
## Pricing
One-time license: $20
Includes:
- Model weights in GGUF format (Q4_K_M, Q5_K_M, Q6_K, and Q8_0 variants)
- Tuned Ollama Modelfile
- Documentation for Ollama and llama.cpp setup
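The tuned Modelfile ships with the license; for orientation, a minimal Modelfile you could adapt looks roughly like this (the filename and parameter values below are placeholders, not the tuned settings):

```
FROM ./lean-agent-4b.Q5_K_M.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM You are a tool-calling agent. Emit structured JSON when a tool schema is provided.
```

Registering it is a single command: `ollama create lean-agent-4b -f Modelfile`.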
## Getting Started
- Download the GGUF quantized weights
- Load with Ollama or llama.cpp
- Configure your agent to use `lean-agent-4b`
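Once the model is loaded, the agent loop is mostly a matter of reading tool calls out of each chat response. A sketch of that parsing step, using a simulated response in the shape Ollama's non-streaming `/api/chat` returns (the response content is fabricated for illustration; a real agent would receive it from an HTTP POST to the local server):

```python
import json

# Simulated /api/chat response in Ollama's non-streaming shape.
# A real agent would receive this from POST http://localhost:11434/api/chat.
response = {
    "model": "lean-agent-4b",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": {"city": "Oslo"}}}
        ],
    },
    "done": True,
}

def extract_tool_calls(resp):
    """Return (name, arguments) pairs from a chat response, or [] if none."""
    calls = resp.get("message", {}).get("tool_calls", [])
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]

for name, args in extract_tool_calls(response):
    print(name, json.dumps(args))
```

Your agent would dispatch each `(name, arguments)` pair to the matching function, then append the result as a `tool` message and call the model again.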
Also in the Scout tier: Lean-Agent-8B, Lean-Coder-8B, Lean-Agent-14B (coming soon)