PLENA: Quantization Evaluation Toolkit
PLENA evaluates MX-quantized large language models against a unified benchmark surface. One TOML config defines the quantization recipe; swap the evaluator to score the same model on perplexity, lm-eval harness tasks, code generation, agentic benchmarks, or diffusion-style decoding.
Where to start
- Get started: install PLENA and compute your first perplexity number on a quantized model in about five minutes.
- Quantization configs: every field you can set in a TOML quantization recipe.
- Evaluation commands: auto-generated reference for every CLI module, argument by argument.
Paper-table reproductions
Reproducible recipes for the headline results live under plena_experiments/ in the repo:
- table5: main quantization sweep across Llama-2/3 sizes and three bit configurations.
- table6: ablations isolating each technique's contribution.
- table7: downstream task accuracy using the best Table 5 recipes.
Each subdirectory contains the configs and run scripts for that table.
Installation
See README.md in the repo root for uv setup steps and optional dependency groups (docs, evalplus, serve, bfcl).