PLENA — Quantization Evaluation Toolkit

PLENA evaluates MX-quantized large language models against a unified benchmark surface. One TOML config defines the quantization recipe; swap the evaluator to score the same model on perplexity, lm-eval harness tasks, code generation, agentic benchmarks, or diffusion-style decoding.

Where to start

  • Get started — install PLENA and compute your first quantized perplexity number in five minutes.
  • Quantization configs — every field you can set in a TOML quantization recipe.
  • Evaluation commands — auto-generated reference for every CLI module, arg by arg.

Paper-table reproductions

Reproducible recipes for the headline results live under plena_experiments/ in the repo:

  • table5 — main quantization sweep across Llama-2/3 model sizes and three bit-width configurations.
  • table6 — ablations isolating each technique's contribution.
  • table7 — downstream task accuracy using the best Table 5 recipes.

Each subdirectory contains the configs and run scripts for that table.

Installation

See README.md in the repo root for uv setup steps and optional dependency groups (docs, evalplus, serve, bfcl).