Models

Every model here is too large to fit in a single GPU's VRAM. The lean runtime makes them runnable anyway.

lean-agent-35b

General-purpose agent — tool calling, structured output, multi-step reasoning

Free

Total params: 35B
Active per token: 3B
Base model: Qwen3.5-35B-A3B
Min hardware: 12 GB VRAM + 32 GB RAM

The entry point. 35B total parameters with 3B active per token, optimized for agent tasks, tool calling, and structured output on minimal hardware. The Qwen3.5 GDN hybrid architecture surpasses last-generation models 7x its size.

lean-coder-80b

Code generation — debugging, refactoring, code review

Free

Total params: 80B
Active per token: 3B
Base model: Qwen3-Coder-Next
Min hardware: 12 GB VRAM + 32 GB RAM

Code-specialized. 80B total parameters with 512 experts, only 3B active per token. Tuned for code generation, debugging, and software engineering workflows.

lean-agent-122b

Advanced agent — complex orchestration, long-context workflows

Paid

Total params: 122B
Active per token: 10B
Base model: Qwen3.5-122B-A10B
Min hardware: 24 GB VRAM + 64 GB RAM

122B total with 256 experts per layer, 10B active per token. Massive knowledge base with efficient per-token compute — deep enough for complex multi-tool workflows.

lean-reason-397b

Frontier-scale — deep reasoning, complex analysis, research

Paid

Total params: 397B
Active per token: 17B
Base model: Qwen3.5-397B-A17B
Min hardware: 48 GB VRAM + 128 GB RAM

Frontier-scale reasoning. 397B total parameters with 17B active per token delivers state-of-the-art capability — running entirely on your hardware. Requires expert offloading across VRAM, RAM, and NVMe.

How offloading works

MoE models only activate a fraction of their parameters per token. The router selects which experts to use, and only those experts need to be in VRAM. The rest stay in RAM or on your NVMe SSD, loaded on demand.
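The routing step above can be sketched in a few lines. This is a minimal illustration of generic top-k softmax gating, not the actual lean runtime router; the function name and expert count are made up for the example.

```python
import math

def route(router_logits, k=2):
    """Pick the top-k experts for one token from router logits.

    Only these experts' weights need to be resident in VRAM for this
    token; the rest can stay in RAM or on SSD.
    """
    # Softmax over all expert logits to get gating probabilities.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Toy example: 8 experts, router strongly prefers experts 2 and 5.
logits = [0.1, -1.0, 3.0, 0.2, -0.5, 2.5, 0.0, -2.0]
print(route(logits, k=2))
```

With hundreds of experts per layer and k in the single digits, the fraction of weights touched per token stays tiny, which is what makes offloading the rest practical.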

The .lmpack format stores each expert as a separate file. The OS kernel manages caching automatically via mmap — frequently used experts stay in RAM, cold experts page in from SSD when needed.
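The mmap mechanism is standard OS behavior and can be demonstrated directly. The sketch below memory-maps a stand-in expert file; the filename is hypothetical and the real .lmpack layout is not shown, only the lazy-paging idea.

```python
import mmap
import os

def load_expert(path):
    """Memory-map one expert's weight file without reading it eagerly.

    mmap hands the file to the OS page cache: pages are faulted in
    from SSD only when the expert is actually used, and hot experts
    stay cached in RAM with no bookkeeping on our side.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        return mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)

# Demo with a stand-in "expert" file (4 KiB of zeros).
with open("expert_000.bin", "wb") as f:
    f.write(b"\x00" * 4096)
weights = load_expert("expert_000.bin")
print(len(weights))  # no data is read from disk until a page is touched
```

Because the kernel owns eviction, an expert that stops being routed to simply ages out of the page cache; no user-space cache logic is needed.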

Core shared layers (attention, embeddings, router weights) are always in VRAM. They run every token and typically account for 15-25% of total model size.
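A back-of-envelope estimate shows why the stated minimums are plausible. The sketch below assumes roughly 20% shared layers (the middle of the 15-25% range) and about 1 byte per parameter (e.g. 8-bit quantization), and ignores KV cache and activations; all of these are assumptions for illustration, not the runtime's actual memory model.

```python
def resident_vram_gb(total_b, shared_frac=0.20, active_b=3, bytes_per_param=1.0):
    """Rough VRAM estimate: always-resident shared layers plus one set
    of active experts, at a given quantization width.

    total_b / active_b are in billions of parameters; shared_frac is
    the assumed share of always-in-VRAM core layers.
    """
    shared = total_b * shared_frac          # attention, embeddings, router
    return (shared + active_b) * bytes_per_param

# lean-agent-35b: ~20% of 35B shared plus 3B active experts at ~8-bit
print(round(resident_vram_gb(35), 1))  # → 10.0, under the 12 GB minimum
```

The same arithmetic scales to the larger models, with the remaining experts spilling to RAM and NVMe as described above.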

Get started

$ curl -sSf https://leanmodels.ai/install.sh | sh
$ lean pull lean-agent-35b
$ lean run lean-agent-35b

Download once, run forever. No cloud dependency.