Processing-in-Memory on RoBERTa#
This tutorial demonstrates PIM-aware fine-tuning of RoBERTa for sequence classification. The simulation supports SRAM (digital), RRAM (analogue), and PCM (analogue) technologies.
Note
If you have not set up the environment yet, follow Installation first.
Overview#
PIM-aware fine-tuning — applies PIM transformation (noise injection and quantization) to RoBERTa and fine-tunes on GLUE benchmark tasks.
Entry point: `experiments/roberta-pim/run_glue.py`
Supported technologies:
SRAM (Digital) — simulates digital CIM with FP8 quantization.
RRAM (Analogue) — models analogue non-idealities such as device variation and noise.
PCM (Analogue) — models programming noise, read noise, IR drop, and output noise.
Note
The PIM simulation uses quantization primitives from the
MASE submodule
(e.g., `scale_integer_quantizer` for integer quantization with dynamic scaling).
Ensure MASE is installed: `pip install -e ./submodules/mase`.
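To make the idea concrete, here is a minimal sketch of integer quantization with dynamic scaling in the spirit of `scale_integer_quantizer`. The function name and signature below are illustrative only, not MASE's actual API:

```python
import numpy as np

def scale_integer_quantize(x, num_bits=8):
    """Fake-quantize x to signed integers using a dynamic per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0  # scale derived from the data
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                           # dequantize back to float
```

"Dynamic scaling" here means the scale is recomputed from the current tensor's range rather than calibrated offline, so the integer grid always covers the live activation range.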
PIM Configuration#
PIM behaviour is controlled through YAML configuration files.
Example configs are in experiments/llm-pim/configs/:
- `sram.yaml` — Digital PIM (FP8)
- `reram.yaml` — Analogue RRAM
- `pcm.yaml` — Analogue PCM
SRAM (digital, FP8) — example#
```yaml
# experiments/llm-pim/configs/sram.yaml
by: "type"
linear:
  config:
    tile_type: "digital"
    core_size: 64
    rescale_dim: "vector"
    x_quant_type: "e4m3"
    weight_quant_type: "e4m3"
```
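The config above suggests that `by: "type"` keys entries by layer type, so every `linear` layer receives the same `config` block. A small sketch of that lookup, with the parsed dict mirroring the YAML (the helper `config_for` is hypothetical, not part of the repo):

```python
# Parsed form of sram.yaml above (what yaml.safe_load would return).
sram_cfg = {
    "by": "type",
    "linear": {
        "config": {
            "tile_type": "digital",
            "core_size": 64,
            "rescale_dim": "vector",
            "x_quant_type": "e4m3",
            "weight_quant_type": "e4m3",
        }
    },
}

def config_for(layer_type, cfg):
    """Return the PIM config for a layer type, or None if it is untouched."""
    entry = cfg.get(layer_type)
    return entry["config"] if entry is not None else None
```

Layer types without an entry (e.g. embeddings) would simply be left unmodified.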
Configuration parameters#
Common:
- `tile_type` — compute tile type: `"digital"` (SRAM), `"reram"`, or `"pcm"`
- `core_size` — size of the simulation tile core (default: 64)
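Because a physical tile can only hold a `core_size`-wide slice of a weight matrix, a matmul is decomposed into partial sums over tiles. A minimal numpy sketch of that decomposition (the helper `tiled_matmul` is illustrative; the real simulator would also apply quantization and noise per tile):

```python
import numpy as np

def tiled_matmul(x, w, core_size=64):
    """Accumulate partial sums over core_size-wide slices of the
    reduction dimension, mimicking a crossbar tiling of W."""
    k = w.shape[0]
    y = np.zeros((x.shape[0], w.shape[1]))
    for start in range(0, k, core_size):
        stop = min(start + core_size, k)
        y += x[:, start:stop] @ w[start:stop, :]  # one tile's partial sum
    return y
```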
SRAM-specific:
- `rescale_dim` — granularity of the pre-alignment mechanism: `"vector"` or `"matrix"`
- `x_quant_type` — quantization format for input activations (e.g., `"e4m3"`, `"e5m2"`)
- `weight_quant_type` — quantization format for weights (same convention as `x_quant_type`)
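For intuition, E4M3 (4 exponent bits, 3 mantissa bits, maximum normal value 448) can be emulated in plain numpy by rounding to the nearest representable value and clamping. This is a standalone sketch, not the simulator's implementation:

```python
import numpy as np

def fake_quantize_e4m3(x):
    """Round values to the FP8 E4M3 grid and clamp to ±448."""
    x = np.asarray(x, dtype=np.float64)
    out = np.clip(x, -448.0, 448.0)
    absx = np.abs(out)
    nonzero = absx > 0
    # binary exponent of each value; E4M3's minimum normal exponent is -6,
    # so flooring e at -6 also yields the correct subnormal spacing (2^-9)
    e = np.floor(np.log2(np.where(nonzero, absx, 1.0)))
    e = np.maximum(e, -6.0)
    step = 2.0 ** (e - 3)                     # 3 mantissa bits per exponent
    q = np.round(out / step) * step
    return np.where(nonzero, np.clip(q, -448.0, 448.0), 0.0)
```

E5M2 would follow the same pattern with a wider exponent range and only 2 mantissa bits.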
ReRAM-specific:
- `noise_magnitude` — magnitude of the Gaussian distribution modelling analogue non-idealities
- `num_bits` — weight resolution (number of bits) in ReRAM cells
PCM-specific:
- `programming_noise` — simulates programming errors and device-to-device variability
- `read_noise` — models temporal read noise and 1/f noise during inference
- `ir_drop` — simulates voltage drop across the crossbar interconnects
- `out_noise` — models additive system noise in the peripheral circuits
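The four sources act at different points of an analogue matmul: programming noise is frozen into the stored weights, read noise is re-sampled each pass, IR drop attenuates the analogue output, and output noise is added at the periphery. A deliberately crude sketch of that pipeline (all scalings and the uniform-attenuation IR-drop model are assumptions, not the repo's exact model):

```python
import numpy as np

def pcm_matmul(x, w, programming_noise=0.02, read_noise=0.01,
               ir_drop=0.02, out_noise=0.01, rng=None):
    """One noisy analogue matmul combining the four PCM noise sources."""
    rng = np.random.default_rng(0) if rng is None else rng
    w_ref = np.max(np.abs(w))
    # programming noise: per-device error frozen when W is written
    w_prog = w + rng.normal(0.0, programming_noise * w_ref, w.shape)
    # read noise: fresh sample drawn on every inference pass
    w_read = w_prog + rng.normal(0.0, read_noise * w_ref, w.shape)
    y = x @ w_read
    y = y * (1.0 - ir_drop)  # IR drop modelled as uniform attenuation
    # output noise: additive noise in the ADC / peripheral circuits
    return y + rng.normal(0.0, out_noise * np.max(np.abs(y)), y.shape)
```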
Fine-Tuning RoBERTa with PIM#
```bash
TASK_NAME="mrpc"
MODEL_NAME="FacebookAI/roberta-base"
PIM_CONFIG="./experiments/llm-pim/configs/sram.yaml"

python experiments/roberta-pim/run_glue.py \
    --model_name_or_path ${MODEL_NAME} \
    --task_name ${TASK_NAME} \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --output_dir ./output/${TASK_NAME}_pim \
    --pim_config_path ${PIM_CONFIG} \
    --pim
```
Results#
Post-training transform (PIM applied to a trained RoBERTa, no fine-tuning)#
| Model | MNLI | QNLI | RTE | SST | MRPC | CoLA | QQP | STSB | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Original | 0.8728 | 0.9244 | 0.7978 | 0.9357 | 0.9019 | 0.6232 | 0.9153 | 0.9089 | 0.8600 |
| Random | 0.3266 | 0.4946 | 0.5271 | 0.4908 | 0.3162 | 0.0000 | 0.6318 | 0.0332 | 0.3525 |
| ReRAM (analogue) | 0.3239 | 0.4860 | 0.5090 | 0.4839 | 0.4338 | −0.0363 | 0.5796 | −0.0011 | 0.3475 |
| PCM (analogue) | 0.3211 | 0.5123 | 0.5090 | 0.5068 | 0.5098 | 0.0443 | 0.6162 | 0.0745 | 0.3872 |
| SRAM (digital, FP8) | 0.7825 | 0.4939 | 0.5271 | 0.5092 | 0.3186 | 0.0000 | 0.6318 | 0.0198 | 0.3523 |
PIM-aware fine-tuning#
| Model | MNLI | QNLI | RTE | SST | MRPC | CoLA | QQP | STSB | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Original | 0.8728 | 0.9244 | 0.7978 | 0.9357 | 0.9019 | 0.6232 | 0.9153 | 0.9089 | 0.8600 |
| Random | 0.3266 | 0.4946 | 0.5271 | 0.4908 | 0.3162 | 0.0000 | 0.6318 | 0.0332 | 0.3525 |
| RRAM (analogue) | 0.8416 | 0.8669 | 0.5451 | 0.8761 | 0.6961 | 0.0000 | 0.9052 | 0.3611 | 0.7228 |
| PCM (analogue) | 0.6850 | 0.7681 | 0.4729 | 0.8383 | 0.6838 | 0.0000 | 0.8585 | −0.1037 | 0.6108 |
| SRAM (digital, FP8) | 0.8589 | 0.9129 | 0.6643 | 0.9220 | 0.8505 | 0.5113 | 0.9128 | 0.8815 | 0.8143 |