Random Bitflip
This is a tutorial on how to run post-bitflip evaluation on a pretrained checkpoint and how to run bitflip-aware pretraining from scratch.
Overview
- Post-bitflip evaluation loads a pretrained checkpoint from HuggingFace, applies the bitflip transform to the model, and evaluates the model on downstream tasks.
  - The entry point is at experiments/llm-bitflip/transform/minimal.py.
- Bitflip-aware pretraining creates a randomly initialized model, applies the bitflip transform to the model, and pretrains the model on FineWeb-Edu.
  - The entry point is at experiments/llm-bitflip/pretrain/run.py.
- To accelerate the emulation, we build custom Triton kernels in mase-triton.
  - The random bitflip kernel is wrapped in the function mase_triton.random_bitflip.core.random_bitflip_fn, which supports a separate bitflip probability for the sign-exp bits and the mantissa bits. random_bitflip_fn also supports zeroing out outliers (and "NaN" values) by assigning a threshold.
  - The random bitflip probability only supports powers of 0.5, e.g., 0.5, 0.5^2, 0.5^3, etc. The kernel automatically converts the probability to the nearest power of 0.5 (see the sketch after this list). Due to a limitation of the pseudo-random number generation algorithm (Philox), the kernel only works for a random bitflip probability greater than or equal to 0.5^24 ≈ 5.96e-08.
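To make the probability rounding concrete, here is a minimal sketch of the behavior described above. nearest_power_of_half is a hypothetical helper written purely for illustration; the actual rounding and the Philox-imposed lower bound live inside the mase-triton kernel.

```python
import math

def nearest_power_of_half(p: float) -> float:
    """Round a requested bitflip probability to the nearest power of 0.5,
    clamped to the Philox limit of 0.5^24 (illustration only)."""
    assert 0.0 < p <= 0.5
    exponent = round(math.log(p, 0.5))    # nearest integer exponent
    exponent = max(1, min(exponent, 24))  # keep within (0.5^24, 0.5]
    return 0.5 ** exponent

print(nearest_power_of_half(0.3))   # 0.25 (0.5^2)
print(nearest_power_of_half(1e-9))  # ≈ 5.96e-08 (clamped to 0.5^24)
```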
Evaluation of Post-Training Bitflip Transform
Environment Setup
If you have not set up the environment, please follow the guidelines in Environment Setup.
We offer minimal scripts to apply the post-training bitflip transform to all linear layers (which contribute over 90% of the FLOPs in Transformers) of a HuggingFace pretrained model, and to evaluate the transformed model with lm-eval-harness.
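Conceptually, the transform wraps each nn.Linear so that random bitflips are injected into its weights and activations on the fly. The plain-PyTorch sketch below emulates mantissa-only flips on float32 values for intuition; flip_mantissa and BitflipLinearSim are illustrative names rather than the project's API, which calls the fused mase-triton kernel instead.

```python
import torch
from torch import nn

def flip_mantissa(x: torch.Tensor, p_frac: float) -> torch.Tensor:
    """Flip each of the 23 float32 mantissa bits independently with probability p_frac."""
    bits = x.float().view(torch.int32)
    for i in range(23):
        mask = torch.rand_like(x, dtype=torch.float32) < p_frac
        bits = torch.where(mask, bits ^ (1 << i), bits)
    return bits.view(torch.float32).to(x.dtype)

class BitflipLinearSim(nn.Linear):
    """Toy stand-in for a bitflip-transformed linear layer (illustration only)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = flip_mantissa(self.weight, p_frac=0.5 ** 10)  # emulate weight bitflips
        x = flip_mantissa(x, p_frac=0.5 ** 10)            # emulate activation bitflips
        return nn.functional.linear(x, w, self.bias)
```

In practice, such a module would be swapped in for every nn.Linear of the loaded model; the real transform does this with the Triton-backed kernel and the probabilities/thresholds passed on the command line.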
Transform & Evaluate on Downstream Tasks
cd experiments/llm-bitflip/transform
model_name="unsloth/Meta-Llama-3.1-8B-Instruct" # HuggingFace model name
x_p_exp=null # bitflip probability for the sign-exp bits of the activation. Null means no bitflip.
w_p_exp=null # bitflip probability for the sign-exp bits of the weight. Null means no bitflip.
x_zero_out_t="100" # threshold for zeroing out outliers (and "NaN" values) of the activation
w_zero_out_t="1.25" # threshold for zeroing out outliers (and "NaN" values) of the weight
x_p_frac=$(bc <<< "scale=10; 0.5^10") # bitflip probability for the mantissa bits of the activation
w_p_frac=$(bc <<< "scale=10; 0.5^10") # bitflip probability for the mantissa bits of the weight
python minimal.py eval-bitflip \
--model_name ${model_name} \
--bitflip_config "default" \
--default_bitflip_config.x_p_exp=${x_p_exp} \
--default_bitflip_config.x_p_frac=${x_p_frac} \
--default_bitflip_config.x_zero_out_t=${x_zero_out_t} \
--default_bitflip_config.w_p_exp=${w_p_exp} \
--default_bitflip_config.w_p_frac=${w_p_frac} \
--default_bitflip_config.w_zero_out_t=${w_zero_out_t} \
--tasks ['wikitext']
This eval-bitflip subcommand also uses lm-eval-harness's simple_evaluate function. Please refer to the evaluation section of LLM Pretraining & Evaluation for more details.
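For reference, a bare-bones simple_evaluate call looks roughly like the snippet below; minimal.py passes the already-transformed model object rather than loading the model by name, and its exact arguments may differ.

```python
import lm_eval

# Evaluate a HuggingFace model on WikiText with lm-eval-harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=unsloth/Meta-Llama-3.1-8B-Instruct",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```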
Evaluate the Original Model
You may want to compare the evaluation results of the bitflip model with those of the original model by running the same evaluation on the unmodified checkpoint.
Test the Generation
We also offer a simple generation subcommand, hf-gen.
prompt="London is"
max_new_tokens="100"
do_sample="true"
temperature="0.6"
top_k="50"
top_p="0.9"
python minimal.py hf-gen \
AICrossSim/clm-60m \
${prompt} \
--max_new_tokens ${max_new_tokens} \
--do_sample ${do_sample} \
--temperature ${temperature} \
--top_k ${top_k} \
--top_p ${top_p} \
--bitflip_config "default" \
--default_bitflip_config.x_p_exp=${x_p_exp} \
--default_bitflip_config.x_p_frac=${x_p_frac} \
--default_bitflip_config.x_zero_out_t=${x_zero_out_t} \
--default_bitflip_config.w_p_exp=${w_p_exp} \
--default_bitflip_config.w_p_frac=${w_p_frac} \
--default_bitflip_config.w_zero_out_t=${w_zero_out_t}
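For intuition, hf-gen corresponds roughly to a standard HuggingFace generate call with sampling enabled, as in the bitflip-free sketch below; the actual subcommand additionally applies the bitflip transform before generating.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough, bitflip-free equivalent of the hf-gen sampling parameters above.
tokenizer = AutoTokenizer.from_pretrained("AICrossSim/clm-60m")
model = AutoModelForCausalLM.from_pretrained("AICrossSim/clm-60m", torch_dtype=torch.bfloat16)

inputs = tokenizer("London is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.6,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```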
Our Initial Experiments
For our AICrossSim/clm-1.1b, we sweep x_p_frac and w_p_frac and observe how the perplexity and the generated texts change.
Here are some samples of the generated texts: link
Notably, when the perplexity increases by 1%, the generated texts remain consistent with the original texts.
Bitflip-Aware Pretraining
Similar to the AICrossSim-CLM pretraining script, we offer experiments/llm-bitflip/pretrain/run.py to run bitflip-aware pretraining from scratch. Its subcommands accept the same arguments as the original pretraining script, but with an additional argument for the bitflip transform.
For example, we can run the following commands to run bitflip-aware pretraining for AICrossSim-CLM-60M on 2 H100 96GB GPUs.
- Generate a training config with the bitflip transform config.
cd experiments/llm-bitflip/pretrain
bitflip_transform_config="./configs/meta/fc-only-w-a-exp-frac.yaml"
python run.py generate-cfg \
    ${bitflip_transform_config} \
    --model_arch "aixsim" \
    --model_flavor "60M" \
    --batch_size 48 \
    --data_parallel_replicate_degree 2 \
    --data_parallel_shard_degree -1 \
    --token_num_scale 22 \
    --compile "false" \
    --save_path "./configs/tutorial-60m.yaml"
- Run the pretraining with the generated config.
num_gpus="2" PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" STREAM_HF_DATA="1" \ torchrun --nproc_per_node=${num_gpus} --rdzv_backend c10d \ --rdzv_endpoint="localhost:0" --local-ranks-filter 0 \ --role rank --tee 3 \ run.py pretrain \ --config configs/tutorial-60m.yaml \ --metrics_args.enable_wandb false
- Convert the checkpoint to HuggingFace format.
Our Bitflip-Aware Training Results of AICrossSim-CLM-60M
We performed bitflip-aware pretraining of AICrossSim-CLM-60M on 2 H100 96GB GPUs for 2.5 hours.
- Train config: experiments/llm-bitflip/pretrain/configs/aixsim-60M.yaml
- Wandb logs: link
- HuggingFace checkpoint: AICrossSim/bitflip-fc-clm-60m
Similarly, one can run bitflip-aware pretraining for other AICrossSim-CLM model sizes. Here we summarize our current bitflip-aware pretraining progress.

| Model | Environment | Pretraining Time | Training Config | Wandb Logs | HuggingFace Checkpoint |
|-------|-------------|------------------|-----------------|------------|------------------------|
| 60M   | 2x H100 96GB  | 2.5 hours  | configs/aixsim-60M.yaml  | link | AICrossSim/bitflip-fc-clm-60m  |
| 200M  | 2x H100 96GB  | 14.3 hours | configs/aixsim-200M.yaml | link | AICrossSim/bitflip-fc-clm-200m |
| 400M  | 6x A6000 48GB | 33 hours   | configs/aixsim-400M.yaml | link | AICrossSim/bitflip-fc-clm-400m |
| 1.1B  | 8x H200 141GB | 51 hours   | configs/aixsim-1.1B.yaml | link | AICrossSim/bitflip-fc-clm-1.1b |