Models
Each of these models is too large to fit in a single GPU's VRAM. The lean runtime makes them runnable anyway.
lean-agent-35b
General-purpose agent — tool calling, structured output, multi-step reasoning
Total params
35B
Active per token
3B
Base model
Qwen3.5-35B-A3B
Min hardware
12 GB VRAM + 32 GB RAM
The entry point: 35B total parameters with 3B active per token, optimized for agent tasks, tool calling, and structured output on minimal hardware. The Qwen3.5 GDN hybrid architecture outperforms last-generation models seven times its size.
lean-coder-80b
Code generation — debugging, refactoring, code review
Total params
80B
Active per token
3B
Base model
Qwen3-Coder-Next
Min hardware
12 GB VRAM + 32 GB RAM
Code-specialized. 80B total parameters with 512 experts, only 3B active per token. Tuned for code generation, debugging, and software engineering workflows.
lean-agent-122b
Advanced agent — complex orchestration, long-context workflows
Total params
122B
Active per token
10B
Base model
Qwen3.5-122B-A10B
Min hardware
24 GB VRAM + 64 GB RAM
122B total with 256 experts per layer, 10B active per token. Massive knowledge base with efficient per-token compute — deep enough for complex multi-tool workflows.
lean-reason-397b
Frontier-scale — deep reasoning, complex analysis, research
Total params
397B
Active per token
17B
Base model
Qwen3.5-397B-A17B
Min hardware
48 GB VRAM + 128 GB RAM
Frontier-scale reasoning. 397B total parameters with 17B active per token delivers state-of-the-art capability — running entirely on your hardware. Requires expert offloading across VRAM, RAM, and NVMe.
How offloading works
MoE models only activate a fraction of their parameters per token. The router selects which experts to use, and only those experts need to be in VRAM. The rest stay in RAM or on your NVMe SSD, loaded on demand.
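The per-token expert selection above can be sketched as a top-k router. This is a minimal illustration, not the lean runtime's actual router: the scoring, top-k value, and gating normalization shown here are generic MoE conventions, assumed for the example.

```python
import numpy as np

def route(token_hidden, router_weights, top_k=8):
    """Pick the top-k experts for one token (illustrative sketch;
    real routers also apply load balancing and capacity limits)."""
    logits = router_weights @ token_hidden           # one score per expert
    top = np.argsort(logits)[-top_k:][::-1]          # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())  # softmax over selected experts
    gates /= gates.sum()                             # mixing weights sum to 1
    return top, gates

# Toy example: 512 experts (as in lean-coder-80b), hidden size 64
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 64))
h = rng.standard_normal(64)
experts, gates = route(h, W)
print(len(experts), round(float(gates.sum()), 6))  # 8 experts, gates sum to 1
```

Only the 8 selected experts (out of 512) need their weights resident for this token; everything else can stay offloaded.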
The .lmpack format stores each expert as a separate file. The OS kernel manages caching automatically via mmap: frequently used experts stay in RAM, and cold experts page in from SSD when needed.
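The mmap mechanism can be shown in a few lines. This is a sketch under stated assumptions: the real .lmpack on-disk layout is not documented here, so the example pretends an expert file is raw float16 weights, and the file name `expert_042.bin` is hypothetical.

```python
import mmap
import os
import tempfile

import numpy as np

def load_expert(path):
    """Memory-map one expert's weight file (assumed raw float16 here).
    Bytes are paged in from SSD only when touched, and the OS page
    cache keeps hot experts resident in RAM with no extra bookkeeping."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    mem = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    os.close(fd)  # safe: mmap holds its own reference to the file
    return np.frombuffer(mem, dtype=np.float16)

# Demo with a throwaway file standing in for one expert's weights
demo = os.path.join(tempfile.gettempdir(), "expert_042.bin")
np.arange(16, dtype=np.float16).tofile(demo)
w = load_expert(demo)
print(w.shape)
```

Nothing is copied at load time; the first matmul that reads a cold expert triggers the page-in, and untouched experts cost no RAM at all.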
Core shared layers (attention, embeddings, router weights) stay in VRAM at all times. They run on every token and typically account for 15-25% of total model size.
Get started
$ curl -sSf https://leanmodels.ai/install.sh | sh
$ lean pull lean-agent-35b
$ lean run lean-agent-35b
Download once, run forever. No cloud dependency.