Models
MoE models from 35B to 398B parameters. Run models larger than your VRAM - expert offloading handles the rest.
Featured Model
lean-think-398b
Arcee Trinity-Large-Thinking - 398B parameters, ~13B active per token. Chain-of-thought reasoning with agentic RL post-training. Apache 2.0 license.
Q4_K_M download
241.9 GB
Min VRAM
48 GB
Active per token
~13B
Architecture
afmoe
A 242 GB model running on 48 GB VRAM. 256 experts per MoE layer, 4 active + 1 shared per token. Interleaved sliding window + global attention. The model you can't run without expert offloading.
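The expert counts above explain why this works. With 256 experts per MoE layer and only 4 routed + 1 shared active per token, each step touches a small slice of the expert weights - a minimal back-of-envelope sketch (the 256 and 4+1 figures come from the model card; everything else here is just arithmetic):

```python
# Illustrative sketch: why expert offloading makes lean-think-398b fit.
# Only the 256-expert and 4+1 active-expert figures come from the model
# card; this is arithmetic on those numbers, not the runtime's code.

EXPERTS_PER_LAYER = 256   # experts in each MoE layer
ACTIVE_PER_TOKEN = 4 + 1  # 4 routed + 1 shared expert per token

# Fraction of each layer's expert weights touched on a given token.
active_fraction = ACTIVE_PER_TOKEN / EXPERTS_PER_LAYER
print(f"{active_fraction:.1%} of expert weights active per token")
```

Roughly 2% of the expert weights are hot on any given token; the other ~98% can live in RAM or on NVMe until the router asks for them.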
lean-agent-35b
General-purpose agent - tool calling, structured output, multi-step reasoning
Total params
35B
Active per token
3B
Base model
Qwen3.5-35B-A3B
Architecture
GDN hybrid MoE
Min VRAM
12 GB
GGUF download sizes
The entry point. A 21 GB model (Q4_K_M) that runs on 12 GB VRAM - expert offloading handles the rest. The Qwen3.5 GDN hybrid architecture outperforms last-gen models 7× its size. 6.7-7.6 tok/s decode on an RTX 3090.
$ lean pull lean-agent-35b
lean-coder-80b
Code generation - debugging, refactoring, code review
Total params
80B
Active per token
3B
Base model
Qwen3-Coder-Next
Architecture
MoE (512 experts)
Min VRAM
12 GB
GGUF download sizes
Code-specialized. 80B total with 512 experts, only 3B active per token. A 48.7 GB model (Q4_K_M) that runs on 12 GB VRAM. Tuned for code generation, debugging, and software engineering workflows.
$ lean pull lean-coder-80b
lean-agent-122b
Advanced agent - complex orchestration, long-context workflows
Total params
122B
Active per token
10B
Base model
Qwen3.5-122B-A10B
Architecture
GDN hybrid MoE
Min VRAM
24 GB
GGUF download sizes
A 75 GB model (Q4_K_M) that runs on 24 GB VRAM. 256 experts per layer with 10B active per token - massive knowledge base with efficient per-token compute. 2.3 tok/s decode on an RTX 3090.
$ lean pull lean-agent-122b
lean-reason-397b
Frontier-scale - deep reasoning, complex analysis, research
Total params
397B
Active per token
17B
Base model
Qwen3.5-397B-A17B
Architecture
GDN hybrid MoE
Min VRAM
48 GB
GGUF download sizes
Frontier-scale reasoning. With only 17B active per token, 397B total parameters deliver state-of-the-art capability entirely on your hardware. A 244 GB model (Q4_K_M) that runs on 48 GB VRAM.
$ lean pull lean-reason-397b
lean-think-398b
Extended reasoning - chain-of-thought, agentic tasks, deep analysis
Total params
398B
Active per token
~13B
Base model
Arcee Trinity-Large-Thinking
Architecture
afmoe (SWA + global)
Min VRAM
48 GB
GGUF download sizes
A 242 GB model (Q4_K_M) that runs on 48 GB VRAM. 256 experts per MoE layer with 4 active + 1 shared per token. Interleaved sliding window + global attention architecture. Chain-of-thought reasoning with agentic RL post-training. Apache 2.0 license. The "you can't run this without expert offloading" model.
$ lean pull lean-think-398b
How offloading works
MoE models only activate a fraction of their parameters per token. The lean runtime keeps the hot path in VRAM and transparently pages in the rest from RAM and NVMe as needed.
The .lmpack format is designed for this workload. Combined with speculative prefetching and profile-guided preloading, it delivers interactive speeds on hardware that would otherwise be far too small.
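The hot-path/paging idea above can be sketched as a small cache. This is a minimal illustrative sketch with an LRU eviction policy, not the lean runtime's actual implementation (which layers speculative prefetching and profile-guided preloading on top):

```python
from collections import OrderedDict

# Minimal sketch of MoE expert paging: keep a fixed budget of "hot"
# experts resident in fast memory (VRAM) and evict the least-recently-
# used one when a cold expert must be paged in from RAM/NVMe.
# Illustrative only - not the lean runtime's real policy.

class ExpertCache:
    def __init__(self, vram_slots: int):
        self.vram_slots = vram_slots
        self.resident = OrderedDict()  # expert_id -> weights (stand-in)
        self.page_ins = 0              # loads from the slower tier

    def fetch(self, expert_id: int):
        if expert_id in self.resident:
            # Hit: already in VRAM; mark as most recently used.
            self.resident.move_to_end(expert_id)
        else:
            # Miss: page in, evicting the LRU expert if VRAM is full.
            self.page_ins += 1
            if len(self.resident) >= self.vram_slots:
                self.resident.popitem(last=False)
            self.resident[expert_id] = f"weights-{expert_id}"
        return self.resident[expert_id]

# Routers tend to reuse a small working set of experts, so most
# fetches hit VRAM and only a few trigger a page-in.
cache = ExpertCache(vram_slots=8)
for expert in [1, 2, 3, 1, 2, 3, 4, 1, 2]:
    cache.fetch(expert)
print(cache.page_ins)  # 4 distinct experts -> 4 page-ins for 9 fetches
```

The design point the sketch captures: decode speed depends on the hit rate of the hot set, not on total model size, which is why a 242 GB model can stay interactive on 48 GB of VRAM.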
Get started
$ curl -sSf https://leanmodels.ai/install.sh | sh
$ lean pull lean-agent-35b
$ lean run lean-agent-35b
Single binary, 15 MB. No Python, no Docker, no cloud dependency.