Get started

Five minutes from clone to your first quantized perplexity number.

1. Install

uv venv                    # Python ≥ 3.11.9
source .venv/bin/activate
uv sync                    # core deps — mase, fast-hadamard-transform, lm-eval, ...

The lockfile pins mase to the releases/plena-experiments branch and pulls fast-hadamard-transform transitively. See the repo README.md for optional extras (docs, evalplus, serve, bfcl).

2. Run your first evaluation

Quantize a Llama-3.2-1B decoder to MXFP4 and measure WikiText perplexity. No calibration, no rotation — just the simplest possible end-to-end check that PLENA works on your machine.
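For intuition about what MXFP4 does to a tensor, here is a minimal pure-Python sketch. It assumes the OCP Microscaling definition of the format: blocks of 32 values share one power-of-two scale, and each element is stored as a 4-bit float with 2 exponent bits and 1 fraction bit. It is not the mase implementation, whose rounding and tie-breaking may differ, and the names `quantize_block` and `fake_quantize` are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit float with 2 exponent bits and
# 1 fraction bit (E2M1), excluding the sign bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX_EXPONENT = 2  # largest power of two in the grid is 2**2 = 4


def quantize_block(block):
    """Quantize one block to a shared power-of-two scale plus E2M1 elements,
    then dequantize so the rounding error is visible in ordinary floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Choose the shared exponent so the largest magnitude lands in [4, 8).
    shared_exp = math.floor(math.log2(amax)) - E2M1_MAX_EXPONENT
    scale = 2.0 ** shared_exp
    out = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_GRID[-1])  # clamp to the largest value
        # Round to the nearest grid point; ties go to the smaller magnitude here.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, x))
    return out


def fake_quantize(values, block_size=32):
    """Apply quantize_block to consecutive blocks of a flat list."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block(values[start:start + block_size]))
    return out


print(quantize_block([6.0, 1.0, 0.3, -2.9]))
```

Because the scale is set by the largest value in each block, small values that share a block with a large outlier lose the most precision.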

cat > /tmp/quickstart.toml <<'TOML'
by = "regex_name"

["model\\.layers\\.\\d+\\.self_attn\\.(q|k|v|o)_proj"]
name = "mxfp"
weight_block_size = 32
weight_exponent_width = 2
weight_frac_width = 1
data_in_block_size = 32
data_in_exponent_width = 2
data_in_frac_width = 1

["model\\.layers\\.\\d+\\.mlp\\.(gate|up|down)_proj"]
name = "mxfp"
weight_block_size = 32
weight_exponent_width = 2
weight_frac_width = 1
data_in_block_size = 32
data_in_exponent_width = 2
data_in_frac_width = 1
TOML

python -m quant_eval.cli.eval_ppl \
    --model_name unsloth/Llama-3.2-1B \
    --quant_config /tmp/quickstart.toml \
    --device_id cuda:0

The run takes about a minute on a single GPU. The output ends with ppl: …, and that number is your model's WikiText perplexity under MXFP4 weight and activation quantization.
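As a reminder of what the reported number means, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows only that standard definition; how `eval_ppl` windows the dataset or batches tokens is not covered here, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    ppl = exp(-(1/N) * sum(log p(token_i | context_i)))
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that always spreads probability uniformly over 4 candidates
# assigns log(0.25) to every token.
print(perplexity([math.log(0.25)] * 10))
```

Lower is better: a quantized model whose perplexity stays close to the unquantized baseline has lost little predictive quality.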

3. Next steps

  • Quantization configs — every field you can set in a TOML config: linear quantization, composite attention, [gptq], [rotation_search].
  • Evaluation commands — every CLI module with all its flags.

Paper reproductions

The plena_experiments/ directory in the repo contains config + script bundles that reproduce each headline result table:

  • plena_experiments/table5/ — main quantization sweep (Llama-2/3 × 3 bit configs).
  • plena_experiments/table6/ — component-level ablations.
  • plena_experiments/table7/ — downstream task accuracy.

Each subdirectory has runnable shell scripts that drive the CLIs above.